Structural Causal Modeling
In this second post, I want to describe a method developed by Judea Pearl called Structural Causal Modeling (SCM). Building on his earlier work on Bayesian networks and machine learning, Professor Pearl has developed the SCM framework since the 1980s, using elements of graph theory to build causal models. Moreover, this method integrates the Potential Outcome Model (POM) and Structural Equation Modeling (SEM) into a single framework.
Because SCM was developed using graph theory, its visual representation is a graph in which causal hypotheses are presented as directed arrows between observed or latent variables. Because the number of possible structures becomes unwieldy as directed graphs grow, this method has to deal with different structures of intermediate variables. The role of mediators, confounders, and colliders on causal paths needs to be treated carefully because they might affect or even eliminate the causal effect under analysis. Finally, it is important to point out that SCM requires researchers to build their models on theoretical grounds: each directed link that establishes a causal relationship between variables must be justified by logical assumptions.
Directed Acyclic Graphs
Before I move forward with the explanation of SCM, I would like to take a quick look at the concept of directed acyclic graphs (DAGs). In DAGs the variables are referred to as nodes or vertices. These variables are connected by edges or links that establish dependencies between them. We say that a pair of nodes is related (adjacent) if they are connected by a link. The direction of the link, represented by an arrow (→), determines which node is the cause and which is the effect. The direct causes of a variable are called its parents, while the variables it directly causes are its children. The absence of an arrow between a pair of variables reflects the hypothesis that there is no direct causal effect between them. DAGs do not have paths that start and end in the same node (cycles). In figure 1 you can see an example of a DAG.
The Structural Causal Model developed by Pearl combines elements of structural equation models, the potential outcome framework, and graphical models developed for probabilistic reasoning (Bayesian networks) and causal analysis (Pearl, 2009). The framework addresses fundamental challenges in causal inference through the following features (Kline, 2015):
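To make the vocabulary concrete, here is a minimal sketch in plain Python. The three-node graph and the names `edges`, `parents`, `children`, and `is_acyclic` are assumptions made for illustration; they are not taken from figure 1. The sketch stores each arrow as a (cause, effect) pair, derives parents and children, and checks that no directed path returns to its starting node.

```python
from collections import defaultdict

# Hypothetical DAG for illustration: each pair (cause, effect) is one arrow.
edges = [("U", "X"), ("U", "Y"), ("X", "Y")]

children = defaultdict(set)  # node -> nodes it directly causes
parents = defaultdict(set)   # node -> its direct causes
nodes = set()
for cause, effect in edges:
    children[cause].add(effect)
    parents[effect].add(cause)
    nodes.update((cause, effect))


def is_acyclic(nodes, children):
    """Return True if no directed path starts and ends at the same node."""
    UNSEEN, ON_PATH, DONE = 0, 1, 2
    state = {n: UNSEEN for n in nodes}

    def visit(node):
        state[node] = ON_PATH
        for child in children.get(node, ()):
            if state[child] == ON_PATH:  # we came back to the current path: a cycle
                return False
            if state[child] == UNSEEN and not visit(child):
                return False
        state[node] = DONE
        return True

    return all(visit(n) for n in sorted(nodes) if state[n] == UNSEEN)


print("parents of Y:", sorted(parents["Y"]))
print("children of U:", sorted(children["U"]))
print("acyclic:", is_acyclic(nodes, children))
```

Adding an arrow from Y back to U would create a cycle, and the check would then return False.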
- Causal hypotheses are represented both graphically and in expressions that are a kind of mathematical language subject to theorems, lemmas, and proofs.
- The SCM provides a precise language for communicating the assumptions behind causal questions to be answered.
- The SCM explicitly distinguishes between questions that can be empirically tested versus those that are unanswerable, given the model. It also provides ways to determine what new measurements would be needed to address an “unanswerable” question.
- Finally, the SCM subsumes other useful theories or methods for causal inference, including the potential outcomes model and SEM.
As an SCM is represented by a causal network graph, there are three basic building blocks that characterize all possible patterns of arrows in the network:
- Chain: A → B → C, where B represents a mediator.
- Fork: A ← B → C, where B represents a common cause or confounder.
- Collider: A → B ← C, where B represents a common effect or collider.
These blocks have implications for covariate selection in regression analysis. Given a causal model, it is appropriate to control for confounders to avoid confounding bias; inadvertently controlling for a mediator might eliminate some or all of the causal effect in the chain; and controlling for a collider can lead to collider bias, which induces a spurious association.
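The consequences of controlling for each kind of variable can be checked on simulated data. The following is a minimal sketch under assumptions of my own: linear relationships, standard normal noise, unit coefficients, and the helper names `cov`, `slope`, and `slope_adjusted`. It compares the regression coefficient with and without the extra control in a fork, a chain, and a collider.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)


def slope(y, x):
    """Coefficient of x in the regression of y on x alone."""
    return cov(x, y) / cov(x, x)


def slope_adjusted(y, x, z):
    """Coefficient of x in the regression of y on x and z (closed-form OLS)."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    return (szz * cov(x, y) - sxz * cov(z, y)) / (sxx * szz - sxz ** 2)


rng = random.Random(0)
n = 20_000


def noise():
    return [rng.gauss(0, 1) for _ in range(n)]


# Fork: X <- Z -> Y, with no arrow from X to Y (true effect of X on Y is 0).
z = noise()
x_fork = [zi + e for zi, e in zip(z, noise())]
y_fork = [zi + e for zi, e in zip(z, noise())]
fork_raw = slope(y_fork, x_fork)
fork_adj = slope_adjusted(y_fork, x_fork, z)

# Chain: X -> M -> Y (true total effect of X on Y is 1).
x_chain = noise()
m = [xi + e for xi, e in zip(x_chain, noise())]
y_chain = [mi + e for mi, e in zip(m, noise())]
chain_raw = slope(y_chain, x_chain)
chain_adj = slope_adjusted(y_chain, x_chain, m)

# Collider: X -> C <- Y, with X and Y independent (true effect is 0).
x_col, y_col = noise(), noise()
c = [xi + yi + e for xi, yi, e in zip(x_col, y_col, noise())]
col_raw = slope(y_col, x_col)
col_adj = slope_adjusted(y_col, x_col, c)

print(f"fork:     raw {fork_raw:.2f}, controlling for Z {fork_adj:.2f}")
print(f"chain:    raw {chain_raw:.2f}, controlling for M {chain_adj:.2f}")
print(f"collider: raw {col_raw:.2f}, controlling for C {col_adj:.2f}")
```

Under these assumptions, controlling for the confounder removes a spurious coefficient of roughly 0.5, controlling for the mediator wipes out a real effect of roughly 1, and controlling for the collider creates an association of roughly -0.5 where none exists.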
To clarify the elements I have introduced so far, I want to illustrate them with an example. You might remember the example in the Granger Causality post, which asked whether the emotional expressions of Xavier help to predict the emotions of Yvonne. Let’s make some changes to the same example to use it with an SCM. Now we are interested in the causal effect of the emotions expressed by Xavier on the emotions posted by Yvonne later, represented in figure 2. You might think that it is not possible to make that argument because there could be other variables that affect both Xavier’s and Yvonne’s emotions, such as the weather, but that is why this diagram includes the variable U. This new variable represents all the unobserved factors that generate confounding between the emotional expressions of Xavier and Yvonne. Assuming that this model is theoretically correct (which is a strong claim), the representation establishes a causal mechanism, which is different from the prediction I presented in the example of the Granger causality post. It is important to notice that the causal claim of the model should be based on theoretical grounds.
According to the model proposed in figure 2, there are three variables: X, which represents Xavier’s emotional expression; U, which stands for the possible unobserved confounders such as the weather, the trending topics on the social media platform, etc.; and Y, the emotional expression of Yvonne. We can also observe three causal relationships: U → X, U → Y, and X → Y. This model tells us that Xavier’s emotions and the unobserved variables ‘cause’ the emotional expression of Yvonne, but also that the unobserved variables have a causal effect on the emotions of Xavier. The structure X ← U → Y makes U a confounder between Xavier and Yvonne, so if we want to obtain a correct causal effect of Xavier on Yvonne we need to control for the unobserved variables. This example also helps to visualize that if we are interested in the causal effect of U on Yvonne, the variable Xavier acts as a mediator between them.
Noncausal paths in a causal diagram can generate confounding, and causal claims would be wrong in that scenario. To correct this problem and deconfound two variables of interest such as X and Y, it is necessary to block every noncausal path between them without blocking or perturbing any causal paths. Because this task can be complicated, there are three main techniques to find an adequate set of controls, make the proper adjustments, and calculate causal effects (Shalizi, forthcoming).
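The same diagram can be written as a pair of structural equations. This is only a sketch: I write X for Xavier’s emotional expression, Y for Yvonne’s, and U for the unobserved factors, and the functions f_X and f_Y and the noise terms ε_X and ε_Y are generic placeholders that I am assuming, not quantities defined by the model in figure 2.

```latex
\begin{aligned}
X &= f_X(U, \varepsilon_X) \\
Y &= f_Y(X, U, \varepsilon_Y)
\end{aligned}
```

Each equation lists on its right-hand side exactly the parents of the variable on its left-hand side, so the arrows into X and into Y in the diagram can be read off directly.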
1. Back-door criterion
In an SCM, if we can condition on an intelligently chosen set of covariates S, which blocks all the indirect paths from X to Y but leaves all the direct paths open, it is possible to compute the causal effect of X on Y. To see whether a candidate set of controls is adequate, we apply the back-door criterion. A back-door path is a path between X and Y that starts with an arrow into X; such paths are also called indirect paths. This kind of path generates confounding by creating a non-causal channel along which information flows. A set of control variables S satisfies the back-door criterion if (1) S blocks all back-door paths between X and Y, and (2) no node in S is caused by X, directly or indirectly.
The example in figure 3 shows four different back-door paths between X and Y: (1), (2), (3), and (4).
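When a set S satisfies the back-door criterion, the causal effect can be computed from observational quantities with the standard adjustment formula. The notation below (the do-operator and a discrete S, so that the adjustment is a sum) is my own choice for this sketch:

```latex
P\bigl(Y = y \mid \mathrm{do}(X = x)\bigr) = \sum_{s} P(Y = y \mid X = x, S = s)\, P(S = s)
```

In words: estimate the relationship between X and Y within each level of S, then average those estimates using the overall distribution of S rather than its distribution among units with X = x.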
2. Front-door criterion
For this criterion we look for a set of variables M which mediates all causal influence of X on Y, which means that all of the direct paths from X to Y pass through M. If we can identify the effect of M on Y, and of X on M, then we can combine these to get the effect of X on Y. The test for whether we can do this combination is the front-door criterion. We say that a set of variables M satisfies the front-door criterion if (1) M blocks all direct paths from X to Y, (2) there are no unblocked back-door paths from X to M, and (3) X blocks all back-door paths from M to Y. Figure 4 presents an SCM in which all the effect of X on Y is mediated by the effect of X on M. With this configuration we can obtain the effect of X on M because the back-door path is blocked by a collider, and the effect of M on Y because we can block the back-door path by controlling for X. With these results we can finally compute the effect of X on Y.
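Here is a minimal simulation sketch of that two-step logic. It rests on assumptions of my own: the textbook front-door graph U → X, U → Y, X → M, M → Y (which may not match figure 4 exactly), linear relationships, standard normal noise, and made-up coefficients `a` and `b`. In a linear model the two identified effects combine by multiplication.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)


def slope(y, x):
    """Coefficient of x in the regression of y on x alone."""
    return cov(x, y) / cov(x, x)


def slope_adjusted(y, x, z):
    """Coefficient of x in the regression of y on x and z (closed-form OLS)."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    return (szz * cov(x, y) - sxz * cov(z, y)) / (sxx * szz - sxz ** 2)


rng = random.Random(1)
n = 20_000
a, b = 0.8, 0.5  # hypothetical effects of X on M and of M on Y; true X -> Y effect is a * b

u = [rng.gauss(0, 1) for _ in range(n)]            # unobserved confounder
x = [ui + rng.gauss(0, 1) for ui in u]             # U -> X
m = [a * xi + rng.gauss(0, 1) for xi in x]         # X -> M
y = [b * mi + ui + rng.gauss(0, 1) for mi, ui in zip(m, u)]  # M -> Y and U -> Y

naive = slope(y, x)                     # biased: the path X <- U -> Y is open
effect_x_on_m = slope(m, x)             # no unblocked back-door path from X to M
effect_m_on_y = slope_adjusted(y, m, x) # controlling for X blocks M <- X <- U -> Y
front_door = effect_x_on_m * effect_m_on_y

print(f"true effect {a * b:.2f}, naive {naive:.2f}, front-door {front_door:.2f}")
```

Note that U is never used in the estimation, only in generating the data, which is the point of the front-door criterion.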
3. Instrumental Variables
This last technique will be analyzed in more detail in the next post, but here is the gist of it. The idea is to find a variable I which affects X, and which only affects Y by influencing X. If we can identify the effect of I on Y, and of I on X, then we can “factor” them to get the effect of X on Y. (That is, I gives us variation in X which is independent of the common causes of X and Y.) I is then an instrumental variable for the effect of X on Y.
In figure 5 we can see that the instrument I allows us to obtain the effect of I on X directly. We are also able to compute the effect of I on Y through X, because the remaining path is blocked by a collider. With these results it should be possible to “factor” the effect of X on Y.
The main assumption of an SCM is that models must be built on theoretical grounds. By combining models, structural equations, and observational data, researchers should be able to draw causal conclusions as long as they defend the logic of their assumptions. However, researchers seem to avoid causal claims, since structural equations in SEM have become more a matter of model fit than of the theoretical implications behind the model. Data can give researchers estimates, but it cannot tell them the reasons behind those estimates.
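The “factoring” step can be sketched with simulated data. The assumptions here are mine: the graph I → X, U → X, U → Y, X → Y (which may differ from figure 5), linear relationships, standard normal noise, and a made-up true effect `beta`. In this linear setting, factoring amounts to dividing the effect of I on Y by the effect of I on X.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)


def slope(y, x):
    """Coefficient of x in the regression of y on x alone."""
    return cov(x, y) / cov(x, x)


rng = random.Random(2)
n = 20_000
beta = 0.5  # hypothetical true effect of X on Y

instrument = [rng.gauss(0, 1) for _ in range(n)]   # I: affects Y only through X
u = [rng.gauss(0, 1) for _ in range(n)]            # unobserved common cause of X and Y
x = [i + ui + rng.gauss(0, 1) for i, ui in zip(instrument, u)]
y = [beta * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

naive = slope(y, x)                       # biased by the open path X <- U -> Y
effect_i_on_x = slope(x, instrument)      # identified: I has no common cause with X
effect_i_on_y = slope(y, instrument)      # identified: I reaches Y only via X
iv_estimate = effect_i_on_y / effect_i_on_x

print(f"true effect {beta:.2f}, naive {naive:.2f}, IV {iv_estimate:.2f}")
```

The ratio only works when the instrument actually moves X; if the effect of I on X is close to zero, the division becomes unstable.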
A Back-door Example
Here is an example of the use of the back-door criterion with simulated data for the SCM defined in figure 3.
The process has the following steps:
- Plot of SCM
- Review all the possible paths between the variables of interest.
- Definition of set of variables we need to control for.
- Data simulation
- Compute the causal effects.
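The steps above can be sketched in plain Python. To be clear about the assumptions: the graph below is a small hypothetical one (Z → X, Z → W → Y, X → Y), not the SCM of figure 3; the coefficients are made up; the plot is replaced by printing the arrows; and the adjustment set is chosen by inspection rather than by an algorithm.

```python
import random

# Step 1: hypothetical SCM (NOT figure 3), written as (cause, effect) arrows.
edges = [("Z", "X"), ("Z", "W"), ("W", "Y"), ("X", "Y")]
print("arrows:", ", ".join(f"{a} -> {b}" for a, b in edges))


# Step 2: list every path between X and Y, ignoring arrow direction.
def all_paths(edges, start, end):
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    paths, stack = [], [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            paths.append(path)
            continue
        for nxt in sorted(neighbours[path[-1]]):
            if nxt not in path:  # keep paths simple (no repeated nodes)
                stack.append(path + [nxt])
    return paths


def is_back_door(path, edges):
    """A back-door path starts with an arrow pointing into the first node."""
    return (path[1], path[0]) in edges


paths = all_paths(edges, "X", "Y")
back_door_paths = [p for p in paths if is_back_door(p, edges)]
print("back-door paths:", back_door_paths)

# Step 3: Z sits on the only back-door path as a fork and is not caused by X,
# so {Z} is taken as the adjustment set (chosen by inspection).
adjustment_set = {"Z"}


# Step 4: simulate linear data consistent with the arrows (true effect 0.5).
def cov(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)


rng = random.Random(3)
n = 20_000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + rng.gauss(0, 1) for zi in z]
w = [0.7 * zi + rng.gauss(0, 1) for zi in z]
y = [0.5 * xi + 0.6 * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

# Step 5: compare the unadjusted and the back-door-adjusted estimates.
naive = cov(x, y) / cov(x, x)
sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
adjusted = (szz * cov(x, y) - sxz * cov(z, y)) / (sxx * szz - sxz ** 2)
print(f"true effect 0.50, naive {naive:.2f}, adjusted for Z {adjusted:.2f}")
```

In this hypothetical graph, controlling for W instead of Z would block the same back-door path, so the adjustment set is not unique.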
References
Kline, R. B. (2015). Principles and Practice of Structural Equation Modeling, Fourth Edition. Guilford Publications.
Pearl, J. (2009). Causal inference in statistics: An overview. Statistics Surveys, 3, 96–146.
Shalizi, C. R. (n.d.). Advanced Data Analysis from an Elementary Point of View.