In this vignette, we describe the general workflow for
sufficient-component cause (SCC) model creation with epicmodel.
But first of all, if you haven’t done so, read
Rothman’s paper (Rothman 1976)!
The steplist is the input from which the SCC model
is created. It contains many steps that together form the
mechanisms of outcome occurrence. These steps are essentially
IF/THEN statements and can be chained together to form mechanisms, e.g.,
IF cell A produces cytokine X THEN cell B produces cytokine Y =>
IF cell B produces cytokine Y THEN cell C produces cytokine Z. To
learn why SCC models are based on these steps and what their exact
structure looks like, see vignette("steplist").
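To make the idea of chaining concrete, here is a minimal base R sketch. It is only an illustration and not how epicmodel stores or processes steps: the data frame steps, its columns if_part and then_part, and the helper follow_chain() are hypothetical names introduced for this example.

```r
# Hypothetical representation: one row per step, with an IF and a THEN part.
steps <- data.frame(
  if_part   = c("cell A produces cytokine X", "cell B produces cytokine Y"),
  then_part = c("cell B produces cytokine Y", "cell C produces cytokine Z"),
  stringsAsFactors = FALSE
)

# Follow the chain: the THEN part of one step is looked up as the IF part
# of the next step. Assumes the steps contain no cycles.
follow_chain <- function(start, steps) {
  path <- start
  current <- start
  while (current %in% steps$if_part) {
    current <- steps$then_part[match(current, steps$if_part)]
    path <- c(path, current)
  }
  path
}

chain <- follow_chain("cell A produces cytokine X", steps)
print(chain)
```

The two steps link up because the THEN part of the first is identical to the IF part of the second, which is why steps need a strict, machine-comparable structure.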
For the purpose of this introduction, it is enough to know that steps
need a very specific structure in order to enable automatic SCC model
creation. epicmodel therefore provides the Steplist Creator Shiny
App, which helps you create a steplist
that has the necessary structure. You can launch the Shiny
app with launch_steplist_creator().
Throughout this introduction, we will use the built-in
steplist_rain, referred to as steplist in the code.
steplist_rain is a dummy example and describes
weather-induced ways of getting wet. Let’s first inspect the steplist
using print().
print(steplist)
#> ✖ unchecked (please run `check_steplist()` before continuing)
#> WHAT: 7 WHAT segments
#> DOES: 6 DOES segments
#> WHERE: 3 WHERE segments
#> MODULE: 3 modules
#> STEP: 10 STEPs
#> ICC: 0 incompatible component-cause pairs
#> OUTCOME: 1 outcome definition
print() tells us what the steplist contains. Again, see
vignette("steplist") to find out what all these different
elements are.
You can see two important things. First, the steplist contains only
10 steps, which shows that it’s probably not a realistic example.
Second, the first line tells you that the steplist is “unchecked” and
asks you to run check_steplist(). check_steplist() ensures that the
steplist’s structure is suitable for SCC model creation.
steplist_checked <- check_steplist(steplist)
#>
#> ── Checking epicmodel_steplist steplist ────────────────────────────────────────
#> ✔ Checking WHAT IDs was successful.
#> ✔ Checking DOES IDs was successful.
#> ✔ Checking WHERE IDs was successful.
#> ✔ Checking Module IDs was successful.
#> ✔ Checking ICC IDs was successful.
#> ✔ Checking WHAT keywords was successful.
#> ✔ Checking DOES keywords was successful.
#> ✔ Checking WHERE keywords was successful.
#> ✔ Checking Module keywords was successful.
#> ✔ Checking Modules was successful.
#> ✔ Checking ICC entries was successful.
#> ✔ Checking WHAT segments was successful.
#> ✔ Checking DOES segments was successful.
#> ✔ Checking WHERE segments was successful.
#> ✔ Checking references was successful.
#> ✔ Checking start/end steps was successful.
#> ✔ Checking THEN statements was successful.
#> ✔ Checking THEN/IF/IFNOT equality was successful.
#> ✔ Checking outcome definitions was successful.
#> ── Summary ─────────────────────────────────────────────────────────────────────
#> ✔ Checking successful!
You can see from the output that check_steplist()
conducts many checks. See the function documentation for a detailed
description. In our case, all checks have been successful and therefore
our steplist is now “checked”. When running print() again,
you will see a confirmation.
print(steplist_checked)
#> ✔ checked successfully
#> WHAT: 7 WHAT segments
#> DOES: 6 DOES segments
#> WHERE: 3 WHERE segments
#> MODULE: 3 modules
#> STEP: 10 STEPs
#> ICC: 0 incompatible component-cause pairs
#> OUTCOME: 1 outcome definition
With a checked steplist, you can also use summary() and
plot(). Let’s start with summary():
summary(steplist_checked)
#>
#> ── Outcome Definitions ──
#>
#> • you get wet
#>
#> ── Component causes ──
#>
#> • no vacation
#> • weekday
#> • rain
#> • get groceries
#>
#> ── Interventions ──
#>
#> • take vacation
#> • take umbrella
#> • work from home
#>
#> ── End steps ──
#>
#> • you get wet
#>
#> ── Other steps ──
#>
#> • walk to work
#> • go outside
The output contains the outcome definition as well as all the steps
split into different types: Component causes, Interventions, End steps,
and Other steps. vignette("steplist") will tell you more
about these different types of steps. Now, let’s try plot().
#>    Label Module   Step
#> 1  CC1   activity no vacation
#> 2  CC2   fate     weekday
#> 3  CC3   weather  rain
#> 4  CC4   activity get groceries
#> 5  I1    activity take vacation
#> 6  I2    activity take umbrella
#> 7  I3    activity work from home
#> 8  S1    activity walk to work
#> 9  S2    activity go outside
#> 10 E1    weather  you get wet
plot() shows you how your steps are chained together.
The graph is created using the DiagrammeR package. The
legend shown below the graph links the node labels to your steps and
tells you to which module a step belongs. Steps that
belong together can be grouped into modules when creating the steplist.
For example, the node colors in the graph depend on the step’s module. See
vignette("modules") to learn more. Later, we will see
similar mechanisms for every sufficient cause.
But first, we need to actually create the SCC model. This is done by
the function create_scc(). See the function documentation to
learn more about the applied algorithm. It only requires a single input:
the checked steplist.
scc_model <- create_scc(steplist_checked)
#>
#> ── Create SCC Model ──
#>
#> ✔ 15/15 | Check if set of component causes is sufficient
#> ✔ 5/5 | Check if sufficiency dependends on IFNOT conditions
#> ✔ 5/5 | Check if sufficient cause is minimal
#> ℹ 2/5 sufficient causes are minimal
As our steplist is very small, SCC model creation is fast. However,
with realistic steplists, create_scc() can take some time
because it evaluates every possible combination of component causes.
From the output we see that 15 sets of component causes have been
checked. When running summary(steplist_checked) before, we
saw that there are 4 component causes. Because every one of them can be
present or absent, we have 2 ^ 4 = 16 combinations. Of
course, there’s no need to evaluate the set where all component causes
are absent, which leaves us with 15 combinations. We can further see
that 5 of these 15 combinations were sufficient, i.e., fulfilled the
outcome definition. We ignore the “Check if sufficiency depends on IFNOT
conditions” step for this introduction, but you can learn more from
vignette("scc"). Finally, we see that 2 out of the 5
sufficient causes are minimal. For a set of component causes to form a
sufficient cause, it needs to be minimally sufficient, i.e., if you
removed any of its component causes, the remaining ones would no longer
be sufficient. In our example, you will see shortly that
component causes “rain” and “get groceries” are sufficient to “get wet”.
If it were not raining or if you did not go outside to get
groceries, you would not get wet. Therefore, this set is minimally sufficient.
If you add another component cause, such as “weekday”, to the set, it
would still lead to the outcome; however, it would no longer be minimally
sufficient. Now, let’s inspect our SCC model using
print() or summary(), which are identical for
SCC models.
print(scc_model)
#>
#> ── Outcome Definitions ──
#>
#> • you get wet
#>
#> ── SC 1 ──
#>
#> ✔ Always sufficient
#> Component causes:
#> • rain
#> • get groceries
#>
#> Modules
#> • activity: 50% (2/4)
#> • weather: 50% (2/4)
#>
#> ── SC 2 ──
#>
#> ✔ Always sufficient
#> Component causes:
#> • no vacation
#> • weekday
#> • rain
#>
#> Modules
#> • activity: 50% (3/6)
#> • weather: 33% (2/6)
#> • fate: 17% (1/6)
#>
In the output, we are first reminded of our outcome definition before
the sufficient causes (SC) are listed. As we saw before, there are two
of them: SC 1 & SC 2. Both sufficient causes are reported to be
“Always sufficient”. This refers to their sufficiency
status. Due to the structure of steplists, it is possible that
sets of component causes are sufficient, i.e., lead to the outcome, only
under some specific circumstances. To learn more, see
vignette("scc"). This is, however, not the case here. Based
on our steplist, both sets of component causes, i.e., “rain” & “get
groceries”, as well as “no vacation” & “weekday” & “rain”, are
always sufficient to cause outcome “you get wet”. Modules also make an
appearance here. Again, see vignette("modules") to learn
more.
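The counts reported by create_scc() (15 sets checked, 5 sufficient, 2 minimal) can be reproduced by hand from the two sufficient causes listed above. The following base R sketch does not use epicmodel and is not its algorithm. The helper is_sufficient() is a hypothetical stand-in that hard-codes the two sufficient causes and ignores interventions and IFNOT conditions.

```r
cc <- c("no vacation", "weekday", "rain", "get groceries")

# Hypothetical stand-in for the steplist logic: a set leads to "you get wet"
# if it contains all members of SC 1 or all members of SC 2.
is_sufficient <- function(set) {
  all(c("rain", "get groceries") %in% set) ||
    all(c("no vacation", "weekday", "rain") %in% set)
}

# All non-empty subsets of the 4 component causes: 2^4 - 1 = 15.
sets <- unlist(
  lapply(seq_along(cc), function(k) combn(cc, k, simplify = FALSE)),
  recursive = FALSE
)
sufficient <- Filter(is_sufficient, sets)

# A sufficient set is minimal if removing any single component cause
# makes it insufficient.
is_minimal <- function(set) {
  !any(vapply(set, function(x) is_sufficient(setdiff(set, x)), logical(1)))
}
minimal <- Filter(is_minimal, sufficient)

cat(length(sets), "sets,", length(sufficient), "sufficient,",
    length(minimal), "minimal\n")
```

Checking only single removals is enough here because adding a component cause to a sufficient set never makes it insufficient in this simplified rule.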
Learning about sufficient causes is interesting, but we can do a lot more with our SCC model.
First, of course, we want to see some causal pies, for which we can
simply call plot(scc_model, unknown = FALSE). The additional
argument unknown controls if
unknown causes are included in the plot. To learn more about unknown
causes, see vignette("scc").
In epidemiology, we usually calculate risk ratios or odds ratios to
estimate the strength of the effect of the exposure on the outcome. We
learn from SCC models, however, that these effect sizes are not natural
constants but that their value depends on the population under study.
epicmodel offers a function to derive a
standardized effect estimate from the SCC model. Let’s
look at the output to explore how it works.
effect_size(scc_model)
#> ✔ 4/4 | Check impact of every component cause
#> Component Cause Impact
#> 1 rain necessary [5/8 vs. 0/8]
#> 2 get groceries 4.00 [4/8 vs. 1/8]
#> 3 IFNOT take vacation THEN no vacation 1.50 [3/8 vs. 2/8]
#> 4 weekday 1.50 [3/8 vs. 2/8]
From the output, we see that epicmodel calculates a
value for each component cause. To do this, it lists all possible sets
of component causes (there are 16 of them, as we already calculated),
splits them into the half where the component cause of interest is
present and the half where the component cause of interest is absent,
and records the sets that would cause the outcome for each half. Let’s
look, e.g., at the square brackets [] for “get groceries”:
The first part records sets with “get groceries” present, and 4 of the 8
lead to the outcome. The second part records sets without “get
groceries”, and only 1 of the 8 leads to the outcome. The number in front
is simply the ratio of these two fractions. You can see that the
standardized effect size for “get groceries”, 4.00, is higher than
for “no vacation” or “weekday”, 1.50, which makes sense because “get
groceries” only needs “rain” to be sufficient, while “no vacation” and
“weekday” need “rain” as well as each other. “rain” is marked as a
necessary cause because none of the 8 sets of component causes that do
not include “rain” lead to outcome occurrence.
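The arithmetic behind these values can be checked with a short base R sketch. It is an illustration under simplifying assumptions, not the implementation of effect_size(): the objects cc, sc, m, outcome, and impact are names chosen for this example, and a set is taken to cause the outcome whenever it contains one of the two sufficient causes reported by print(scc_model).

```r
cc <- c("no vacation", "weekday", "rain", "get groceries")
sc <- list(c("rain", "get groceries"),
           c("no vacation", "weekday", "rain"))

# All 2^4 = 16 presence/absence combinations, one row per set.
grid <- expand.grid(rep(list(c(FALSE, TRUE)), length(cc)))
names(grid) <- cc
m <- as.matrix(grid)

# A set causes the outcome if it contains every member of at least one
# sufficient cause.
outcome <- Reduce(`|`, lapply(sc, function(s) {
  rowSums(m[, s, drop = FALSE]) == length(s)
}))

# Ratio of outcome fractions: cause present (8 sets) vs. absent (8 sets).
impact <- sapply(cc, function(x) {
  present <- mean(outcome[m[, x]])
  absent  <- mean(outcome[!m[, x]])
  present / absent
})
print(impact)
```

In this sketch, the ratio for “rain” is Inf because no set without “rain” causes the outcome, which corresponds to the label “necessary” in the output above.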
When printing our SCC model, we saw that “no vacation”, “weekday”,
and “rain” together formed a sufficient cause. Maybe you wondered how
the three of them would cause “you get wet”. We can explore further by
inspecting the mechanisms. We already saw the complete
one when plotting our steplist. Using mechanism(), we can
split it up by sufficient cause. Using plot() produces one
graph for every sufficient cause. Using print() provides us
with the legend we already know from earlier. Graphs can be downloaded
with export_mechanism().
#> Label Module   Step
#> CC1   activity no vacation
#> CC2   fate     weekday
#> CC3   weather  rain
#> CC4   activity get groceries
#> I1    activity take vacation
#> I2    activity take umbrella
#> I3    activity work from home
#> S1    activity walk to work
#> S2    activity go outside
#> E1    weather  you get wet
Let’s take another look at our steplist summary:
summary(steplist_checked)
#>
#> ── Outcome Definitions ──
#>
#> • you get wet
#>
#> ── Component causes ──
#>
#> • no vacation
#> • weekday
#> • rain
#> • get groceries
#>
#> ── Interventions ──
#>
#> • take vacation
#> • take umbrella
#> • work from home
#>
#> ── End steps ──
#>
#> • you get wet
#>
#> ── Other steps ──
#>
#> • walk to work
#> • go outside
You can see that three steps are listed as
interventions: “take vacation”, “take umbrella”, and
“work from home”. For now, we don’t need to know how interventions are
defined exactly (again, see vignette("steplist") for the
details), but from their names we can see that they are actions which
might prevent the outcome. epicmodel can investigate their
impact based on your SCC model. In general, such an investigation can
take two directions: asking which sufficient causes a given intervention
can prevent, or asking which interventions can prevent the outcome for a
given set of component causes.
Let’s first investigate which sufficient causes can be prevented by
intervention “work from home”. We specify "all" for
argument causes and "THENd4e1", i.e., the step
ID for intervention “work from home”, for argument
intervention.
intervene(scc_model, causes = "all", intervention = "THENd4e1")
#>
#> ── Intervention ────────────────────────────────────────────────────────────────
#>
#> ── Cause Set 1 ──
#>
#> • rain
#> • get groceries
#> Status without intervention
#> ✔ Always sufficient
#> Status with intervention
#> ✖ No considered intervention is able to prevent the outcome
#>
#> ── Cause Set 2 ──
#>
#> • no vacation
#> • weekday
#> • rain
#> Status without intervention
#> ✔ Always sufficient
#> Status with intervention
#> ✔ Complete prevention by the following minimal intervention sets
#>
#> ── Intervention Set 1
#> • work from home
In the output, we get a comparison of the status without and with intervention for each sufficient cause. For Cause Set 1, it is reported that “No considered intervention is able to prevent the outcome”, but for Cause Set 2, there is complete prevention by intervention set “work from home”.
Now, let’s imagine you are in the following situation: it’s raining,
it’s a weekday, you are on vacation but want to get some groceries, i.e.,
c("THENa1","THENa5","THENd2a3") in step IDs. Since you are
open to all suggestions, let’s specify "all" for argument
intervention.
intervene(scc_model, causes = c("THENa1","THENa5","THENd2a3"), intervention = "all")
#>
#> ── Intervention ────────────────────────────────────────────────────────────────
#>
#> ── Cause Set 1 ──
#>
#> • weekday
#> • rain
#> • get groceries
#> Status without intervention
#> ✔ Always sufficient
#> Status with intervention
#> ✔ Complete prevention by the following minimal intervention sets
#>
#> ── Intervention Set 1
#> • take umbrella
Surprise, you can completely prevent getting wet by taking an
umbrella! Please note that epicmodel reports only minimal
intervention sets. You could, e.g., additionally work from home (even
though it doesn’t really make sense in this example) and still prevent
getting wet. However, you only need to take the umbrella. For a better
example, consider the following situation: it’s raining, it’s a weekday,
you don’t have vacation, and you don’t have an umbrella. We can look at
this scenario with:
intervene(scc_model, causes = c("IFNOTd6a6THENd5a6","THENa5","THENa1"), intervention = c("THENd6a6","THENd4e1"))
#>
#> ── Intervention ────────────────────────────────────────────────────────────────
#>
#> ── Cause Set 1 ──
#>
#> • no vacation
#> • weekday
#> • rain
#> Status without intervention
#> ✔ Always sufficient
#> Status with intervention
#> ✔ Complete prevention by the following minimal intervention sets
#>
#> ── Intervention Set 1
#> • work from home
#>
#> ── Intervention Set 2
#> • take vacation
The output shows that, to prevent the outcome, you can either work from home or take vacation. Both together would work as well, of course, but again, this intervention set is not minimal.
In addition to interventions, we can focus only on the component
causes. It might be of interest which component causes need to be
prevented in order to avoid outcome occurrence. prevent()
answers this question for a certain set of component causes,
representing a certain group of individuals.
prevent(scc_model, causes = c("IFNOTd6a6THENd5a6","THENa5","THENa1","THENd2a3"))
#> ✔ 11/11 | Check if prevention set is minimal
#> ℹ 3/11 prevention sets are minimal
#>
#> ── Prevention ──────────────────────────────────────────────────────────────────
#>
#> ── Cause set ──
#>
#> Individuals with exactly the following component causes:
#> • no vacation
#> • weekday
#> • rain
#> • get groceries
#>
#> ── Prevention sets ──
#>
#> Prevent the outcome by preventing any of the following sets:
#>
#> ── Set 1
#> • rain
#>
#> ── Set 2
#> • weekday
#> • get groceries
#>
#> ── Set 3
#> • no vacation
#> • get groceries
#>
prevent() tells us that there are three ways to
prevent the outcome for these individuals. First, prevent “rain”. Second,
prevent both “weekday” and “get groceries”. Third, prevent both
“no vacation” and “get groceries”. Of course, additionally
preventing, e.g., “rain” would also work, but prevent() only
reports minimal prevention sets.
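The numbers in this output (11 prevention sets, 3 of them minimal) can be reproduced with a base R sketch. It is not the algorithm used by prevent(): the helper still_sufficient() and the objects candidates, prevents, and minimal are illustrative names, and sufficiency is again reduced to containing one of the two sufficient causes of the model.

```r
cc <- c("no vacation", "weekday", "rain", "get groceries")
sc <- list(c("rain", "get groceries"),
           c("no vacation", "weekday", "rain"))

# The outcome still occurs if the remaining component causes contain a
# complete sufficient cause.
still_sufficient <- function(remaining) {
  any(vapply(sc, function(s) all(s %in% remaining), logical(1)))
}

# Candidate prevention sets: all non-empty subsets of the cause set.
candidates <- unlist(
  lapply(seq_along(cc), function(k) combn(cc, k, simplify = FALSE)),
  recursive = FALSE
)

# A candidate prevents the outcome if removing it leaves no complete
# sufficient cause.
prevents <- Filter(function(r) !still_sufficient(setdiff(cc, r)), candidates)

# A prevention set is minimal if dropping any one of its members means the
# outcome is no longer prevented.
is_minimal <- function(r) {
  all(vapply(r, function(x) {
    still_sufficient(setdiff(cc, setdiff(r, x)))
  }, logical(1)))
}
minimal <- Filter(is_minimal, prevents)
print(minimal)
```

Put differently, a prevention set has to remove at least one member of every sufficient cause contained in the cause set, which is why “rain”, a member of both, works on its own.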
As we saw earlier, “take vacation” is implemented as an
intervention. But it would also be the best way to prevent component
cause “no vacation”. Therefore, the results of prevent()
and intervene() can overlap depending on the way these
factors are implemented in the model.
Finally, epicmodel can transform SCC models to directed
acyclic graphs (DAGs) following VanderWeele & Robins (2007).
dag <- scc_to_dag(scc_model)
dag$dag
#> dag {
#> CC1 [pos="1.000,0.250"]
#> CC2 [pos="1.000,0.750"]
#> CC3 [pos="1.000,1.250"]
#> CC4 [pos="1.000,1.750"]
#> O [outcome,pos="4.000,1.000"]
#> SC1 [pos="3.000,0.000"]
#> SC2 [pos="3.000,1.000"]
#> USC [pos="3.000,2.000"]
#> U_SC1 [pos="2.000,0.000"]
#> U_SC2 [pos="2.000,1.000"]
#> U_USC [pos="2.000,2.000"]
#> CC1 -> SC1
#> CC1 -> SC2
#> CC2 -> SC1
#> CC3 -> SC2
#> CC4 -> SC2
#> SC1 -> O
#> SC2 -> O
#> USC -> O
#> U_SC1 -> SC1
#> U_SC2 -> SC2
#> U_USC -> USC
#> }
Element dag of the output contains a dagitty object
thanks to the dagitty package.
Printing it prints its dagitty model code, which, e.g., can
be pasted to DAGitty online on https://www.dagitty.net for further processing. You can
also plot dagitty objects with epicmodel’s
plot_dag() function, which tries to mimic DAGitty’s online
layout in a ggplot object.