Granger Causality

This is the first in a series of posts introducing methods that can be useful for research questions about cause and effect in the social sciences. The method I picked to begin with is Granger causality, which might be the best-known (so-called) ‘causal method’ in the social sciences, not least because it is crowned by a Nobel Prize.

Causality and Granger

In the 1960s, Professor Clive Granger was interested in pairs of interrelated stochastic processes. In particular, he wanted to know whether the relationship between two processes could be broken down into a pair of one-way relationships. Building on the idea of Norbert Wiener (1956), who proposed that prediction theory could be used to define causality between time series, Granger developed a practical adaptation of this concept, which was eventually coined Granger causality. As we shall see, the term is something of a misnomer; “Granger predictability” might be more accurate, and even Granger himself acknowledged during his Nobel lecture (2003) that he had not realized how many people had fixed ideas about causation. His pragmatic definition earned him plenty of citations, but, in his own words, “many ridiculous papers appeared” as well. As Figure 1 illustrates, Granger causality is basically prediction: the information contained in one time series allows the observer to predict, with a time-lag, the behavior of another time series of interest. The time-lag is what provides theoretical justification for directionality in the correlation between both variables. However, note that this does not rule out that some other (unobserved) variable is the true “cause” of what is observed; in that case, the apparent relationship is spurious, with the unobserved variable acting as a confounder.

Figure 1

Before going into the formal details of the method, I would like to present the logic of Granger causality with an example. Let’s say we are interested in predicting Yvonne’s emotional expression on social media tomorrow. An initial approach would be to assume that Yvonne’s emotions tomorrow will be based on how she is feeling today, yesterday, and the day before yesterday; in other words, we look at the historical information of Yvonne posting online, that is, at the time series of Yvonne’s emotions. We can also think of another approach. It might be that Yvonne’s emotions are not only based on her own history; the content she posts online may also depend on the online emotional expression of her best friend, Xavier. In this case, the new explanation examines the time series of both Yvonne’s and Xavier’s emotions. If, after comparing these two approaches, we conclude that the latter is better because it gives us a more accurate prediction of Yvonne’s emotional expression, then we conclude that the emotions of Xavier “Granger cause” the emotions of Yvonne. Of course, from this toy example you can already see all the problems associated with the term “causality,” as nothing guarantees that either her own past expressions or Xavier’s emotions “caused” anything in Yvonne’s future posts. But semantics aside, let’s take a closer look at what Granger actually showed about the relationship between two time series (his Nobel Prize was awarded for related work on the “co-integration” of time series). In Figure 2 (top), you can see a representation in which only Yvonne’s past affects Yvonne’s future, whereas the bottom part shows the alternative option, in which Xavier’s past also affects Yvonne’s future.

Figure 2

The mathematical formulation of Granger causality can be represented as follows. Suppose we have two time series \( X_t \) and \( Y_t \). If we are trying to predict \( Y_{t+1} \), we can do so using only the past terms of \( Y_t \), so that its future depends only on its own past, or we can use both time series \( X_t \) and \( Y_t \) to make the prediction. If the latter prediction, which considers a relationship between the two time series, is significantly better, then it is possible to argue that the past of \( X_t \) contains valuable information for predicting \( Y_{t+1} \); therefore, there is a one-way relationship from \( X_t \) to \( Y_t \). In this case, the conclusion is that \( X_t \) “Granger causes” \( Y_t \). This conclusion rests on two founding axioms proposed by Granger (1980):

  1. The cause happens before its effect.
  2. The cause has unique information about the future values of its effect.

It is important to point out that the term causality can be a bit misleading. Granger causality shows that a change in the temporal evolution of one variable precedes a change in the values of another. Since time proceeds in only one direction (as far as humans can perceive), the time delay establishes directionality, or better, time-bound predictability. Still, it is silent on the deeper question of the mechanisms of causality. This debate is usually settled by using the term “Granger causality” to distinguish this practical definition from causality proper.


To perform a Granger causality analysis, we need to be familiar with linear autoregressive models, a fancy name for linear models that predict a variable’s future behavior from its own past values. A typical mathematical representation of an autoregressive model with one time-lag is:

\[ Y(t) = \alpha_1 Y(t-1) + error_1 (t) \tag{1} \] 

Where \( \alpha_1 \) represents the variable’s coefficient with one time-lag and \( error_1 \) has a distribution with mean \( 0 \) and variance \( \sigma_1^2 \).

As I mentioned earlier, the application of Granger causality compares two different models, one containing information from only one variable and the other from two. The model in (1) includes information on \( Y \) only, so we need a model with information about \( Y \) and another variable, which I will call \( X \), to proceed with the Granger causality analysis.

\[ Y(t) = \alpha_1 Y(t-1) + \beta_1 X(t-1) + error_2 (t) \tag{2} \]

Where \( \alpha_1 \) represents the coefficient of the variable \( Y \) with one time-lag, \( \beta_1 \) is the coefficient of the variable \( X \) with one time-lag, and \( error_2 \) has a distribution with mean \( 0 \) and variance \( \sigma_2^2 \).

Finally, once we have the models described in (1) and (2), an F-test is performed with the null hypothesis that \( \beta_1 = 0 \), that is, that model (1) is sufficient to describe \( Y(t) \), against the alternative hypothesis that model (2) describes \( Y(t) \). The result allows us to evaluate whether \( X \) Granger causes \( Y \): we say that \( X \) Granger causes \( Y \) if we reject the null hypothesis.

The previous formulation covers the case where we include only one time-lag of the variables under study. You might be asking what the formulation is for the general case. Using summation notation, we can extend models (1) and (2) as follows:

 \[ Y(t) = \sum_{j=1}^p \alpha_j Y(t-j) + error_1 (t) \tag{3} \]

\[ Y(t) = \sum_{j=1}^p \alpha_j Y(t-j) + \sum_{j=1}^p \beta_j X(t-j) + error_2 (t) \tag{4} \]

As in the previous models, in (3) and (4) \( \alpha_j \) and \( \beta_j \) are the coefficients of the model, and \( error_1 \) and \( error_2 \) have distributions with mean \( 0 \) and variances \( \sigma_1^2 \) and \( \sigma_2^2 \), but there is a new element, \( p \). Here \( p \) represents the maximum number of lagged observations included in the model. The selection of an appropriate number of lags \( p \) is a crucial decision to make before performing the analysis; we can base this choice on information criteria such as FPE, AIC, HQIC, and BIC. These measures weigh the inherent trade-off between model complexity and predictive power, following Occam’s insight that models should only be as complex as necessary. However, theoretical considerations are also important when selecting the final number of lags to include. Once the maximum number of lagged observations is chosen, we proceed in the same fashion described for the case \( p=1 \) to conclude whether \( X \) Granger causes \( Y \).


Granger causality was initially designed for linear models. Extensions to nonlinear cases exist nowadays, but they can be more challenging to understand and use, and their statistical properties are less well understood. Among these extensions, we find approaches that divide globally nonlinear data into smaller, locally linear neighborhoods (Freiwald et al., 1999) and the use of a radial basis function method to perform global nonlinear regression (Ancona et al., 2004).

Another limitation is that Granger causality assumes the time series are stationary, meaning that their mean and variance remain constant over time. Non-stationary time series can often be transformed into stationary ones by differencing. Additionally, assuming that short windows of non-stationary data produce locally stationary signals, it is possible to perform the analysis within such windows (Hesse et al., 2003).

A Granger Causality Application

As a student in the field of Communication, one of my interests is the study of the information dynamics that people create on social media. In this section, I want to share an application of Granger causality I have worked on.

For this example, the objective is to analyze whether there is a relationship between emotions communicated online in the aftermath of a natural disaster. The approach seeks to understand whether the emotions expressed after a tragic event follow a pattern in which it is possible to predict the appearance of some emotions based on the previous expression of others. For example, after an earthquake, people initially express fear, but as time passes, the expression shifts to sadness, so we would like to know whether the initial information about fear helps to predict the later appearance of sadness. In this case, we collected data from Twitter in the aftermath of the earthquake in Southern California on July 5, 2019. To assess emotion, each tweet was processed with the Natural Language Understanding tool provided by IBM Watson, which analyzes the text and assigns a value from 0 to 1 for the presence of each of five emotions: sadness, anger, fear, disgust, and joy.

The procedure has the following steps:

  1. Check stationarity: We upload the data and check the stationarity of the time series.
  2. Stationarity correction: If required, we transform the time series to make them stationary.
  3. Lag selection: We select the maximum lag for the analysis based on statistical and theoretical criteria.
  4. VAR analysis: We conduct an initial multivariate analysis to identify predictive relationships between the five emotions studied.
  5. Granger causality: Once we identify the predictive relationship between pairs of emotions, we conduct the analysis to determine if those relationships are Granger causal.
You can access the dataset used in the example here.


References

Ancona, N., Marinazzo, D., & Stramaglia, S. (2004). Radial basis function approach to nonlinear Granger causality of time series. Physical Review E, 70(5), 056221.

Brady, H. E. (2011). Causation and Explanation in Social Science. Oxford University Press.

Freiwald, W. A., Valdes, P., Bosch, J., Biscay, R., Jimenez, J. C., Rodriguez, L. M., Rodriguez, V., Kreiter, A. K., & Singer, W. (1999). Testing non-linearity and directedness of interactions between neural groups in the macaque inferotemporal cortex. Journal of Neuroscience Methods, 94(1), 105–119.

Granger, C. W. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica: Journal of the Econometric Society, 424–438.

Hesse, W., Möller, E., Arnold, M., & Schack, B. (2003). The use of time-variant EEG Granger causality for inspecting directed interdependencies of neural assemblies. Journal of Neuroscience Methods, 124(1), 27–44.

Wiener, N. (1956). The theory of prediction. In Modern Mathematics for Engineers. New York, 165–190.