Structural Causal Modeling

In this second post, I want to describe a method developed by Judea Pearl called Structural Causal Modeling (SCM). Based on his previous work on Bayesian Networks and machine learning, since the 1980s, Professor Pearl developed the SCM framework using elements of graph theory to elaborate causal models. Moreover, this method integrates the Potential Outcome Model (POM) and Structural Equation Modeling (SEM) into a single framework.

SCM was developed using graph theory; thus, its visual representation is a graph where causal hypotheses are presented as directed arrows between observed or latent variables. Because the possibilities are exponentially unwieldy when we elaborate on directed graphs, this method has to deal with different structures of intermediate variables. The role of mediators, confounders, and colliders on causal paths must be treated carefully because they might affect or even eliminate the causal effect under analysis. Finally, it is important that SCM requires researchers to elaborate their models based on theoretical grounds; each directed link that establishes a causal relationship between variables must be justified under logical assumptions.

Directed Acyclic Graphs

Before I move forward with the explanation of SCM, I would like to take a quick look at the concept of directed acyclic graphs (DAG). In DGAs, the variables are referred to as nodes or vertices. These variables are connected by edges or links that establish dependencies between them. We say that a pair of nodes is related (adjacent) if a link connects them. The direction of the link represented by an arrow (\( \longrightarrow \)) determines what node is the cause and what is the effect; the direct causes of a variable are called parents, while all variables caused by another one are its children. The absence of an arrow between a pair of variables reflects the hypothesis that no causal effect exists between them. DAGs do not have paths that start and end in the same node (cycles). In Figure 1, you can see an example of a DAG.

Tred-G

Figure 1

SCM

The Structural Causal Model developed by Pearl combines elements of structural equation models, the potential outcome framework, and graphical models developed for probabilistic reasoning (Bayesian networks) and causal analysis (Pearl, 2009). The framework addresses fundamental challenges in causal inference due to the following list of features (Kline, 2015):

      1. Causal hypotheses are represented both graphically and in expressions that are a mathematical language subject to theorems, lemmas, and proofs.
      2. The SCM provides a precise language for communicating the assumptions behind causal questions to be answered.
      3. The SCM explicitly distinguishes between questions that can be empirically tested versus those that are unanswerable, given the model. It also provides ways to determine what new measurements would be needed to address an “unanswerable” question.
      4. Finally, the SCM subsumes other useful theories or methods for causal inference, including the potential outcomes model and SEM.

    As SCM represents a causal network graph, there are three basic building blocks that characterize all possible patterns of arrows in the network:

      1. Chain: \((X \rightarrow W \rightarrow Y)\) where represents a mediator.
      2. Fork: \((X \leftarrow W \rightarrow Y)\) where \(W\) represents a common cause or confounder.
      3. Collider \((X \rightarrow W \leftarrow Y)\)

      These blocks have implications for the covariate selection in regression analysis. Given a causal model, it is appropriate to control for the confounders to avoid confounder bias; inadvertently controlling for a mediator might eliminate some or all the causal effects in the chain, and controlling for a collider can lead to collider bias, which induces spurious association.

      I want to illustrate an example to clarify the elements I have introduced so far. You might remember the example in the Granger Causality post trying to predict if Xavier’s emotional expressions help predict Yvonne’s emotions. Let’s make some changes to the same example to use it with a SCM. Now, we are interested in the causal effect of the emotions expressed by Xavier on the emotions posted by Yvonne later, represented in Figure 2. You might think that it is not possible to make that argument because other variables could affect Xavier and Yvonne’s emotions, such as the weather, but that is why this diagram includes the variable \(U\). This new variable represents all the Unobserved factors that generate confounding between the emotional expressions of Xavier and Yvonne. Assuming that, theoretically, this model is correct (which is a powerful claim), the representation establishes a causal mechanism that is different from the prediction I presented in the example of the Granger causality post. It is relevant to notice that the causal claim of the model should be based on theoretical grounds.

      Figure 2

      According to the model proposed in Figure 2, there are three variables: \( X \), which represents Xavier’s emotional expression, \( U \), showing the possible unobserved confounders such as the weather, the trending topics on the social media platform, etc., and the emotional expression of Yvonne \( Y \). Also, we can observe three causal relationships \( X \rightarrow Y \), \( U \rightarrow X \), and \( U \rightarrow Y \). This model tells us that Xavier’s emotions and unobserved variables ’cause’ the emotional expression of Yvonne, but also that the unobserved variables have a causal effect on Xavier’s emotions. The structure \( X \leftarrow U \rightarrow Y \) is a confounder between the variables Xavier and Yvonne, so if we want to obtain a correct causal effect of Xavier on Yvonne, it is needed to control for the unobserved variables. This example also helps to visualize that if we are interested in the causal effect of \( U \) on Yvonne, the variable Xavier acts as a mediator between them.

      Analysis

      Causal diagrams present the problem that noncausal paths can generate confounding. Therefore, causal claims would be wrong in that scenario. To correct this problem and decounfound two given variables of interest such as \( X \) and \( Y \), it is necessary to block every noncausal path between them without blocking or perturbing any causal paths. Because the previous task can be complicated, there are three main techniques to find the adequate set of controls to make the proper adjustments and calculate causal effects (Shalizi, forthcoming).

      1. Back-door criterion

      In an SCM, if we can condition on an intelligently chosen set of covariates \( S \), which block all the indirect paths from \( X \) to \( Y \) but leave all the direct paths open it is possible to compute the causal effect from \( X \) to \( Y \). We apply the back-door criterion to see whether a candidate set of controls \( S\) is adequate. A back-door path is a path between \( X \) and \( Y \) that starts with an arrow into \( X \); they are also called indirect paths. These paths generate confounding effects by creating a non-causal channel along which information flows. A set of control variables \( S \) satisfies the back-door criterion if (1) \( S \) blocks all back-door paths between \( X \) and \( Y \), and (2) no node in \( S \) is caused by \( X \).

      The example in Figure 3 shows four different back-door paths between \( X \) and \( Y \). (1) \( X \leftarrow V \rightarrow Y \), (2) \( X \leftarrow V \leftarrow Z \rightarrow W \rightarrow Y \), (3) \( X \leftarrow Z \rightarrow V \rightarrow Y \), and (4) \( X \leftarrow Z \rightarrow W \rightarrow Y \).

      Figure 3

      2. Front-door criterion

      For this criterion, we can find a set of variables \( M \) that mediate all causal influence of \( X \) on \( Y \), which means that all of the direct paths from \( X \) to \( Y \) pass through \( M \). If we can identify the effect of \( M \) on \( Y \) and of \( X \) on \( M \), then we can combine these to get the effect of \( X \) on \( Y \). The test for whether we can do this combination is the front-door criterion. We say that a set of variables \( M \) satisfies the front-door criterion if (1) \( M \) blocks all direct paths from \( X \) to \( Y \), (2) there are no unblocked back-door paths from \( X \) to \( Y \), and (3) \( X \) blocks all back-door paths from \( M \) to \( Y \). Figure 4 presents an SCM in which all the effect of \( X \) on \( Y \) is mediated by the effect of \( X \) on \( M \). With this configuration, we can obtain the effect of \( X \) on \( M \) the back-door is blocked by the collider \( Y \), and the effect of \( M \) on \( Y \) because we can block the back door controlling by \( X \) with these results, finally we can compute the effect of \( X \) on \( Y \)

      Figure 4

      3. Instrumental Variables

      This last technique will be analyzed in more detail in the next post, but here is the gist of it. The idea is to find a variable \( I \) which affects \( X \) and which only affects \( Y \) by influencing \( X \). If we can identify the effect of \( I \) on \( Y \) and of \( I \) on \( X \), then we can “factor” them to get the effect of \( X \) on \( Y \). (That is, I gives us variation in \( X \), which is independent of the common causes of \( X \) and \( Y \).) \( I \) is then an instrumental variable for the effect of \( X \) on \( Y \).

      In Figure 5, we can see that the instrument \( I \) allows us to obtain the effect of \( I \) in \( X \) directly; then, we also can compute the effect of \( I \) on \( Y \) through \( X \) because the path \( I \rightarrow X \leftarrow U \rightarrow Y \) is blocked by the collider \( X \). With these results, it should be possible to “factor” the effect of \( X \) on \( Y \)

      Figure 5

      Limitations

      The main assumption of an SCM is that models must be created based on theoretical grounds. By combining models, structural equations, and observational data, researchers should be able to draw causal conclusions as long as they defend the logic of their assumptions. However, causal claims seem to be avoided by researchers since structural equations in SEM became an issue of the model adjustment than the theoretical implications behind it. Data can give researchers estimates but it cannot tell the reason why for those measures.

      A Back-door Example

      Here is an example of the use of back-door criterion with simulated data for the SCM defined in figure 3.

      The process has the following steps

         

          1. Plot of SCM
          2. Review all the possible paths between the variables of interest.
          3. Definition of set of variables we need to control for.
          4. Data simulation
          5. Compute the causal effects.

        Back-door criterion example

        Step 1

        First we load the packages needed to create the Structural Causal Modeland plot the DAG.

        #We load the package dagitty to create the SCMlibrary(dagitty)#Definition of the DAGg <- dagitty("dag {             Z [pos="1,1"]             V [pos="2,1"]             W [pos="2,0"]             X [pos="3,2"]             Y [pos="4,1"]  Z -> X -> Y  Z -> V -> Y  Z -> W -> Y  V -> X -> Y}")#We load the package ggdag to plot the DAGlibrary(ggdag)#Plot the DAGggdag(g) + theme_dag_blank()

        unnamed-chunk-1-1

        Step 2

        We ask to the software to get all the paths from X to Y

        paths(g, "X", "Y")
        ## $paths## [1] "X -> Y"                "X <- V -> Y"           "X <- V <- Z -> W -> Y"## [4] "X <- Z -> V -> Y"      "X <- Z -> W -> Y"     ## ## $open## [1] TRUE TRUE TRUE TRUE TRUE

        Step 3

        From the previous result we can observe that there are four back-doorpaths from X to Y. Now we need to find the set of controls S which allowus to block all the back-door paths so we can obtain the real causaleffect of X on Y.

        #Set SadjustmentSets(g, "X", "Y", type = "all")
        ## { V, W }## { V, Z }## { V, W, Z }

        Here we got three different sets S_1 = {V,W}, S_2 = {V,Z}, and S_3 ={V,W,Z}. It is important to notice here that we don’t need to controlall the variables.

        Step 4

        We are going to illustrate the previous result by generating simulateddata for the model. Then using that data we are going to fit a model toobtain the estimates for the SCM.

        #We load the package lavaan to simulate the data library(lavaan)#Simulated model (effects of exogenous variables are set to be non-zero)lavaan_model <- "Y ~ .7*X + .5*V + .3*W                 X ~ .5*Z + .6*V                 V ~ .4*Z                 W ~ .2*Z"#Data consistent with the simulated model with 1000 observationsset.seed(1234)g_tbl <- simulateData(lavaan_model, sample.nobs=1000)

        The previous code specifies a traditional SEM, which means that all thefunctions in the SCM are linear. Now that we have the data we are goingto fit a model without the coefficients and then show the parametersestimates.

        #Model without coefficientslavaan_model1 <- "Y ~ X + V + W                  X ~ Z + V                  V ~ Z                  W ~ Z"#Fitting the model with simulated datalavaan_fit <- sem(lavaan_model1, data = g_tbl)#Show model parameter estimatesparameterEstimates(lavaan_fit)
        ##    lhs op rhs   est    se      z pvalue ci.lower ci.upper## 1    Y  ~   X 0.722 0.029 25.278      0    0.666    0.778## 2    Y  ~   V 0.539 0.036 15.010      0    0.469    0.609## 3    Y  ~   W 0.289 0.031  9.435      0    0.229    0.348## 4    X  ~   Z 0.494 0.033 15.034      0    0.430    0.558## 5    X  ~   V 0.568 0.031 18.135      0    0.507    0.630## 6    V  ~   Z 0.376 0.031 12.166      0    0.316    0.437## 7    W  ~   Z 0.234 0.031  7.559      0    0.173    0.295## 8    Y ~~   Y 0.955 0.043 22.361      0    0.871    1.039## 9    X ~~   X 0.963 0.043 22.361      0    0.878    1.047## 10   V ~~   V 0.980 0.044 22.361      0    0.894    1.066## 11   W ~~   W 0.981 0.044 22.361      0    0.895    1.067## 12   Z ~~   Z 1.024 0.000     NA     NA    1.024    1.024

        From the results we can see that the path X to Y has a coefficient of0.7 approx.

        Step 5

        Here we compute the effect of X on Y to show that either using S_1,S_2 or S_3 we obtain the true causal effect.

        #Effect of X on Y with no controlssummary(lm(Y ~ X, data = g_tbl))
        ## ## Call:## lm(formula = Y ~ X, data = g_tbl)## ## Residuals:##     Min      1Q  Median      3Q     Max ## -3.8523 -0.7878  0.0195  0.7524  4.3661 ## ## Coefficients:##             Estimate Std. Error t value Pr(>|t|)    ## (Intercept) -0.04286    0.03555  -1.206    0.228    ## X            1.00665    0.02652  37.956   <2e-16 ***## ---## Signif. codes:  0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1## ## Residual standard error: 1.123 on 998 degrees of freedom## Multiple R-squared:  0.5908, Adjusted R-squared:  0.5904 ## F-statistic:  1441 on 1 and 998 DF,  p-value: < 2.2e-16

        Here we can see that the estimate of 1.0 is wrong because the effectshould be 0.7 so let’s see what happens when we control using S_1, S_2or S_3.

        #Control using S1summary(lm(Y ~ X + V + W, data = g_tbl))
        ## ## Call:## lm(formula = Y ~ X + V + W, data = g_tbl)## ## Residuals:##     Min      1Q  Median      3Q     Max ## -3.1273 -0.6813  0.0219  0.7072  2.9794 ## ## Coefficients:##             Estimate Std. Error t value Pr(>|t|)    ## (Intercept) -0.03559    0.03100  -1.148    0.251    ## X            0.72220    0.02872  25.150   <2e-16 ***## V            0.53890    0.03598  14.980   <2e-16 ***## W            0.28855    0.03080   9.369   <2e-16 ***## ---## Signif. codes:  0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1## ## Residual standard error: 0.9792 on 996 degrees of freedom## Multiple R-squared:  0.6894, Adjusted R-squared:  0.6884 ## F-statistic: 736.8 on 3 and 996 DF,  p-value: < 2.2e-16
        #Control using S2summary(lm(Y ~ X + V + Z, data = g_tbl))
        ## ## Call:## lm(formula = Y ~ X + V + Z, data = g_tbl)## ## Residuals:##     Min      1Q  Median      3Q     Max ## -3.1735 -0.7448  0.0033  0.6882  3.3814 ## ## Coefficients:##             Estimate Std. Error t value Pr(>|t|)    ## (Intercept) -0.03755    0.03232  -1.162    0.246    ## X            0.73993    0.03290  22.489   <2e-16 ***## V            0.54094    0.03759  14.391   <2e-16 ***## Z            0.04016    0.03785   1.061    0.289    ## ---## Signif. codes:  0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1## ## Residual standard error: 1.021 on 996 degrees of freedom## Multiple R-squared:  0.6624, Adjusted R-squared:  0.6614 ## F-statistic: 651.3 on 3 and 996 DF,  p-value: < 2.2e-16
        #Control using S3summary(lm(Y ~ X + V + W + Z, data = g_tbl))
        ## ## Call:## lm(formula = Y ~ X + V + W + Z, data = g_tbl)## ## Residuals:##     Min      1Q  Median      3Q     Max ## -3.1157 -0.6831  0.0223  0.7138  2.9946 ## ## Coefficients:##             Estimate Std. Error t value Pr(>|t|)    ## (Intercept) -0.03561    0.03101  -1.148    0.251    ## X            0.72946    0.03159  23.091   <2e-16 ***## V            0.54022    0.03607  14.978   <2e-16 ***## W            0.29160    0.03130   9.316   <2e-16 ***## Z           -0.02040    0.03690  -0.553    0.581    ## ---## Signif. codes:  0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1## ## Residual standard error: 0.9795 on 995 degrees of freedom## Multiple R-squared:  0.6895, Adjusted R-squared:  0.6882 ## F-statistic: 552.3 on 4 and 995 DF,  p-value: < 2.2e-16

        Results

        Finally we can appreciate that controling for the right variablesdefined in S_1, S_2 or S_3 the effect obtained is closer to the realcausal effect simulated.

        view raw SCMforWeb.md hosted with ❤ by GitHub

        References

        Kline, R. B. (2015). Principles and Practice of Structural Equation Modeling, Fourth Edition. Guilford Publications.

        Pearl, J. (2009). Causal inference in statistics: An overview. Statistics Surveys, 3, 96–146.

        Shalizi, C. R. (n.d.). Advanced Data Analysis from an Elementary Point of View