Introduction
In this blog we will explore how to set up and interpret cointegration results using a real-world time series example. We will cover the case with no structural breaks as well as the case with one unknown structural break using tools from the GAUSS tspdlib library.
Dataset
In this blog, we will use the famous Nelson-Plosser time series data. The dataset contains macroeconomic fundamentals for the United States.
We will be using three of these fundamentals:
 M2 money stock.
 Bond yield (measured by the basic yields of 30-year corporate bonds).
 S&P 500 index stock prices.
The time series data is annual, covering 1900 to 1970.
Preparing for Cointegration
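In order to prepare for cointegration testing, we will take some preliminary time series modeling steps. We will establish an underlying theory, visualize the time series, and test for unit roots, both with and without a structural break.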
Establishing an Underlying Theory
In this example, we will examine the macroeconomic question of whether stock prices are linked to macroeconomic indicators. In particular, we will examine if there is a cointegrated, long-run relationship between the S&P 500 price index and monetary policy indicators of the M2 money stock and the bond yields.
Mathematically we will consider the cointegrated relationship:
$$y_{sp, t} = c + \beta_1 y_{money, t} + \beta_2y_{bond, t} + u_t$$
Time Series Visualization
When visualizing time series data, we look for visual evidence of:
 The co-movements between our variables.
 The presence of deterministic components such as constants and time trends.
 Potential structural breaks.
Our time series plots give us some important considerations for our testing, providing visual evidence to support:
 Co-movements between the variables.
 At least one structural break in the time series dynamics of all three of our variables.
 A potential time trend in the datasets, especially in the later years of the sample.
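Plots like the ones discussed here can be drawn with GAUSS's built-in time series plotting functions. The following is only a sketch: it assumes the file and variable names used later in this post (nelsonplosser.dta, sp500, m, bnd) and that the sample with no missing values starts in 1900. The panel titles are illustrative.

```gauss
// Load the three series and drop rows with missing values
// (file and variable names assumed to match those used later in this post)
fname = "nelsonplosser.dta";
coint_data = packr(loadd(fname, "sp500 + m + bnd"));

// Start from the default settings for an XY-type plot
struct plotControl myPlot;
myPlot = plotGetDefaults("xy");

// Panel 1 of a 3 x 1 grid: S&P 500
// Annual data (frequency = 1), assumed to start in 1900
plotLayout(3, 1, 1);
plotSetTitle(&myPlot, "S&P 500");
plotTS(myPlot, 1900, 1, coint_data[., 1]);

// Panel 2: M2 money stock
plotLayout(3, 1, 2);
plotSetTitle(&myPlot, "M2 money stock");
plotTS(myPlot, 1900, 1, coint_data[., 2]);

// Panel 3: bond yield
plotLayout(3, 1, 3);
plotSetTitle(&myPlot, "Bond yield");
plotTS(myPlot, 1900, 1, coint_data[., 3]);

// Reset the layout so later plots use the full window
plotClearLayout();
```

Stacking the panels on a shared time axis makes it easier to compare the timing of level changes across the three series.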
Unit Root Testing
Prior to testing for cointegration between our time series data, we should check for unit roots in the data. We will do this using the adf procedure in the tspdlib library to conduct the Augmented Dickey-Fuller unit root test.
| Variable | Test Statistic | 1% Critical Value | 5% Critical Value | 10% Critical Value | Conclusion |
|---|---|---|---|---|---|
| Money | -1.621 | -4.04 | -3.45 | -3.15 | Cannot reject the null |
| Bond yield | -1.360 | -4.04 | -3.45 | -3.15 | Cannot reject the null |
| S&P 500 | -0.3842 | -4.04 | -3.45 | -3.15 | Cannot reject the null |
Our ADF test statistics are greater than the 10% critical value for all of our time series. This implies that we cannot reject the null hypothesis of a unit root for any of our time series data.
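As a rough illustration, the unit root tests could be run along the following lines. This is a sketch under assumptions: the argument order and return values of adf follow the tspdlib documentation as best recalled and should be checked against the installed version, and the settings (constant and trend, 12 maximum lags, t-stat lag selection) are illustrative rather than confirmed choices for the table above.

```gauss
library tspdlib;

// Load the three series and drop rows with missing values
fname = "nelsonplosser.dta";
coint_data = packr(loadd(fname, "sp500 + m + bnd"));

// Constant and trend in the test regression (assumed setting)
model = 2;

// Maximum number of lags and lag selection rule (assumed settings)
pmax = 12;
ic = 3;

// Run the ADF test on each series in turn
for i(1, cols(coint_data), 1);
    { ADF_tau, ADF_p, cvADF } = adf(coint_data[., i], model, pmax, ic);

    print "Series (column):" i;
    print "ADF statistic:" ADF_tau;
    print "Critical values:";
    print cvADF;
endfor;
```

The unit root null is rejected only when the statistic falls below (is more negative than) the critical value.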
Unit Root Testing with Structural Breaks
What about the potential structural break that we see in our time series data? Does this have an impact on our unit root testing?
Using the adf_1break procedure in the tspdlib library to test for unit roots with a single structural break in the trend and constant, we get the following results.
| Variable | Test Statistic | Break Date | 1% Critical Value | 5% Critical Value | 10% Critical Value | Conclusion |
|---|---|---|---|---|---|---|
| Money | -4.844 | 1948 | -5.57 | -5.08 | -4.82 | Cannot reject the null at the 5% level |
| Bond yield | -3.226 | 1963 | -5.57 | -5.08 | -4.82 | Cannot reject the null |
| S&P 500 | -4.639 | 1945 | -5.57 | -5.08 | -4.82 | Cannot reject the null |
Our ADF test statistics again suggest that, even when accounting for the structural break, we cannot reject the null hypothesis of a unit root at the 5% level for any of our time series data.
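A call to adf_1break might look like the sketch below. The argument order, the return values, and the mapping of model = 2 to a break in both level and trend are assumptions based on the tspdlib documentation as best recalled. The lag and trimming settings are illustrative.

```gauss
library tspdlib;

// Load the three series and drop rows with missing values
fname = "nelsonplosser.dta";
coint_data = packr(loadd(fname, "sp500 + m + bnd"));

// Break in level and trend (assumed to correspond to model = 2)
model = 2;

// Lag selection and trimming settings (assumed values)
pmax = 12;
ic = 3;
trimm = 0.1;

// Test the first series (S&P 500); repeat for columns 2 and 3
{ ADF_min, tb1, p, cv } = adf_1break(coint_data[., 1], model, pmax, ic, trimm);

print "Minimum ADF statistic:" ADF_min;
print "Estimated break location:" tb1;
print "Critical values:";
print cv;
```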
Conducting our Cointegration Tests
Having concluded that there is evidence for unit roots in our data, we can now run our cointegration tests.
When setting up cointegration tests, there are a number of assumptions that we must specify:
 Which normalization we want to use.
 The deterministic components to include in our model.
 The maximum number of lags to allow in our test.
 The information criterion to use to select the optimal number of lags.
To better understand these general assumptions, let’s look at the simplest of our tests, the Engle-Granger cointegration test.
Normalization
In the two-stage, residual-based cointegration tests which we will consider today, normalization amounts to deciding which variable is our dependent variable and which variables are our independent variables in the cointegration regression.
We will choose our normalization to reflect our theoretical question of whether the S&P 500 index is cointegrated with the money stock and the bond yield. As we mentioned earlier, this means we will consider the cointegrated relationship:
$$y_{sp, t} = c + \beta_1 y_{money, t} + \beta_2 y_{bond, t} + u_t$$
// Set fname to name of dataset
fname = "nelsonplosser.dta";
// Load three variables from the dataset
// and remove rows with missing values
coint_data = packr(loadd(fname, "sp500 + m + bnd"));
// Define y and x matrix
y = coint_data[., 1];
x = coint_data[., 2 3];
The Deterministic Component
The second assumption we must make about our Engle-Granger test is which model we wish to use. To understand how to make this decision, let's look closer at what this input means.
The Engle-Granger test is a two-step test:
 Estimate the cointegration regression.
 Test for stationarity in the residuals using the ADF unit root test.
When we specify which model to use we impact two things:
 The deterministic components which are used in the first-stage cointegration regression.
 The distribution of the test statistic.
There are three options to choose from:

 No constant or trend (model = 0):
$$y_{sp, t} = \beta_1 y_{money, t} + \beta_2 y_{bond, t} + u_t$$
 Constant (model = 1):
$$y_{sp, t} = \alpha + \beta_1 y_{money, t} + \beta_2 y_{bond, t} + u_t$$
 Constant and trend (model = 2):
$$y_{sp, t} = \alpha + \delta t + \beta_1 y_{money, t} + \beta_2 y_{bond, t} + u_t$$
For our example, we will include a constant and trend in our first-stage cointegration regression by setting:
// Select model with constant and trend
model = 2;
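To make the role of the model setting concrete, here is a minimal sketch of a first-stage regression written by hand. It is not the code inside coint_egranger. It uses simulated stand-ins for y and x so that it runs on its own, and the names z, b, and u are hypothetical.

```gauss
// Simulated stand-ins for y and x so the sketch runs on its own
rndseed 2357;
n = 71;
x = cumsumc(rndn(n, 2));
beta = { 0.5, -0.3 };
y = 1 + 0.05 * seqa(1, 1, n) + x * beta + rndn(n, 1);

// Deterministic components to include
model = 2;

// Build the regressor matrix implied by each model setting
if model == 0;
    // No constant or trend
    z = x;
elseif model == 1;
    // Constant
    z = ones(n, 1) ~ x;
else;
    // Constant and trend
    z = ones(n, 1) ~ seqa(1, 1, n) ~ x;
endif;

// First-stage OLS estimates and residuals
b = olsqr(y, z);
u = y - z * b;

print "First-stage coefficients:";
print b;
```

The residuals u are what the second-stage ADF test examines, so the choice of deterministic terms here feeds directly into the test statistic.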
The Lag Specifications
In the second-stage ADF residual unit root test, the error terms should be serially independent. To account for possible autocorrelation, lags of the first differences of the residual can be included in the ADF test regression.
The GAUSS coint_egranger procedure automatically determines the optimal number of lags to include in the second-stage regression based on two user inputs:
 The maximum number of lags to allow.
 The criterion to use to determine the optimal number of lags:
  The Akaike information criterion (AIC) [ic = 1]
  The Schwarz information criterion (SIC) [ic = 2]
  The t-stat criterion [ic = 3]
/*
** Information Criterion:
** 1=Akaike;
** 2=Schwarz;
** 3=t-stat sign.
*/
ic = 2;
// Maximum number of lags
pmax = 12;
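To show what including lagged differences means in practice, the sketch below builds a second-stage regression by hand for a fixed number of lags. It is an illustration only: the residual series is simulated, the lag count p is a hypothetical choice, and the automatic lag selection performed by coint_egranger is not reproduced.

```gauss
// Simulated residual series (a random walk), for illustration only
rndseed 2357;
n = 71;
u = cumsumc(rndn(n, 1));

// Number of lagged differences to include (hypothetical choice)
p = 2;

// First differences of the residuals: du[k] = u[k+1] - u[k]
du = u[2:n] - u[1:n-1];
m = rows(du);

// Dependent variable and lagged level of the residuals
dep = du[p+1:m];
reg = u[p+1:m];

// Append p lags of the differenced residuals
for j(1, p, 1);
    reg = reg ~ du[p+1-j:m-j];
endfor;

// OLS and the t-statistic on the lagged level
b = olsqr(dep, reg);
e = dep - reg * b;
s2 = (e'e) / (rows(dep) - cols(reg));
se = sqrt(diag(s2 * invpd(reg'reg)));
tau = b[1] / se[1];

print "t-statistic on the lagged residual:" tau;
```

Each additional lag costs one usable observation, which is one reason to cap the search with a maximum lag length.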
Calling our Cointegration Test
Now that we have loaded our data and chosen the test settings, we can call the coint_egranger procedure:
// Perform Engle-Granger Cointegration Test
{ tau_eg, cvADF_eg } = coint_egranger(y, x, model, pmax, ic);
Interpreting Our Cointegration Results
In order to interpret our cointegration results, let's revisit the two steps of the Engle-Granger test:
 Estimate the cointegration regression.
 Test the residuals from the cointegration regression for unit roots.
The Engle-Granger test statistic for cointegration reduces to an ADF unit root test of the residuals of the cointegration regression:
 If the residuals contain a unit root, then there is no cointegration.
 The null hypothesis of the ADF test is that the residuals have a unit root. Therefore, the Engle-Granger test considers the null hypothesis that there is no cointegration.
 As the Engle-Granger test statistic decreases:
 We are more likely to reject the null hypothesis of no cointegration.
 We have stronger evidence that the variables are cointegrated.
After running our cointegration test we obtain the following results:
Engle-Granger Test
Constant and Trend
H0: no cointegration (EG, 1987 & P0, 1990)

          Test Statistic    CV(1%, 5%, 10%)
EG_ADF    -2.105            -4.645  -4.157  -3.843
We can see that:
 Our test statistic of -2.105 is larger (less negative) than the critical values at the 1%, 5%, and 10% levels.
 We cannot reject the null hypothesis of no cointegration.
 We do not find evidence in support of the cointegration of the S&P 500 with the U.S. money stock and bond yield.
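The decision rule above can be written out in a few lines. This sketch hard-codes the statistic and critical values reported in the results, entered as negative numbers as ADF-type statistics are. In practice they would come from the tau_eg and cvADF_eg outputs, and the levels labels are hypothetical helper names.

```gauss
// Statistic and critical values copied from the reported results
tau_eg = -2.105;
cvADF_eg = { -4.645, -4.157, -3.843 };

// Labels for the 1%, 5%, and 10% critical values
levels = "1%" $| "5%" $| "10%";

// Reject the null only if the statistic is below the critical value
for i(1, 3, 1);
    if tau_eg < cvADF_eg[i];
        print "Reject the null of no cointegration at the " $+ levels[i] $+ " level";
    else;
        print "Cannot reject the null of no cointegration at the " $+ levels[i] $+ " level";
    endif;
endfor;
```

With these values, none of the three comparisons leads to a rejection.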
Conducting our Cointegration Tests with One Structural Break
Earlier we saw that the potential structural break in our data did not change our unit root test conclusion. We should also see if the structural break has an impact on our cointegration testing.
To do this we will use the Gregory-Hansen cointegration test, which can be implemented using the coint_ghansen procedure in the tspdlib library.
We can carry over all of our coint_egranger testing specifications, except our model specification.
The Model Specification
When implementing the Gregory-Hansen test, we must decide on a model which specifies:
 Which deterministic components are present in the cointegration regression.
 How the structural break affects the cointegration regression.
There are four modeling options to choose from:
 The level shift [model = 1]
$$y_{sp, t} = \mu_1(1 - d_{\tau}) + \mu_{1,\tau} d_{\tau} + \beta_1 y_{money, t} + \beta_2 y_{bond, t} + u_t$$
In this model, there is a structural break at time $\tau$ and $d_{\tau}$ is an indicator variable equal to 1 when $t \geq \tau$ and 0 otherwise. The constant before the structural break is $\mu_1$ and the constant after the structural break is $\mu_{1,\tau}$.
 The level shift with trend [model = 2]
$$y_{sp, t} = \mu_1(1 - d_{\tau}) + \mu_{1,\tau} d_{\tau} + \delta t + \beta_1 y_{money, t} + \beta_2 y_{bond, t} + u_t$$
In this model, the structural break again affects the constant. However, there is also a time trend included in the model.
 The regime shift [model = 3]
$$y_{sp, t} = \mu_1(1 - d_{\tau}) + \mu_{1,\tau} d_{\tau} + \beta_1(1 - d_{\tau})y_{money, t} +$$
$$\beta_{1,\tau}d_{\tau}y_{money, t} + \beta_2(1 - d_{\tau}) y_{bond, t} + \beta_{2,\tau}d_{\tau}y_{bond, t} + u_t$$
In this model, the structural break affects the constant and regression coefficients.
 The regime and trend shift [model = 4]
$$y_{sp, t} = \mu_1(1 - d_{\tau}) + \mu_{1,\tau} d_{\tau} + \delta_1(1 - d_{\tau}) t + \delta_{1,\tau}d_{\tau}t + \beta_1(1 - d_{\tau})y_{money, t} +$$
$$\beta_{1,\tau}d_{\tau}y_{money, t} + \beta_2(1 - d_{\tau}) y_{bond, t} + \beta_{2,\tau}d_{\tau}y_{bond, t} + u_t$$
In this model, the structural break again affects the constant, the regression coefficients, and the trend.
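To see how the break indicator enters each specification, the sketch below builds the regressor matrix for all four models at a single candidate break. It is an illustration under assumptions: the regressors are simulated, the break position tb is hypothetical, and the columns are grouped for readability rather than in the order of the equations. The test itself does not fix the break in advance; coint_ghansen searches for it and reports an estimated date.

```gauss
// Simulated stand-in for the two regressors, for illustration only
rndseed 2357;
n = 71;
x = cumsumc(rndn(n, 2));

// Hypothetical break position (observation index)
tb = 40;

// Time index and break indicator: d = 1 when t >= tb, 0 otherwise
t = seqa(1, 1, n);
d = t .>= tb;

// model = 1: level shift
z1 = (1 - d) ~ d ~ x;

// model = 2: level shift with trend
z2 = (1 - d) ~ d ~ t ~ x;

// model = 3: regime shift (constant and slopes change)
z3 = (1 - d) ~ d ~ ((1 - d) .* x) ~ (d .* x);

// model = 4: regime and trend shift (constant, trend, and slopes change)
z4 = (1 - d) ~ d ~ ((1 - d) .* t) ~ (d .* t) ~ ((1 - d) .* x) ~ (d .* x);

print "Number of regressors in models 1 to 4:";
print cols(z1) cols(z2) cols(z3) cols(z4);
```

Moving from model 1 to model 4 doubles the number of regressors, which helps explain why the critical values become more negative for the richer specifications.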
For example, let's consider the last case, where the constant, coefficients, and trend are all impacted by the structural break:
// Set fname to name of dataset
fname = "nelsonplosser.dta";
// Load three variables from the dataset
// and remove rows with missing values
coint_data = packr(loadd(fname, "sp500 + m + bnd"));
// Define y and x matrix
y = coint_data[., 1];
x = coint_data[., 2 3];
// Regime and trend shift
model = 4;
/*
** Information Criterion:
** 1=Akaike;
** 2=Schwarz;
** 3=t-stat sign.
*/
ic = 2;
// Maximum number of lags
pmax = 12;
/*
** Long run variance computation
** 1 = iid
** 2 = Bartlett
** 3 = Quadratic Spectral (QS);
** 4 = SPC with Bartlett /see (Sul, Phillips & Choi, 2005)
** 5 = SPC with QS;
** 6 = Kurozumi with Bartlett
** 7 = Kurozumi with QS
*/
varm = 1;
// Bandwidth for variance
bwl=1;
// Data trimming
trimm=0.1;
// Perform cointegration test
{ ADF_min_gh, TBadf_gh, Zt_min_gh, TBzt_gh, Za_min_gh, TBza_gh, cvADFZt_gh, cvZa_gh } =
coint_ghansen(y, x, model, bwl, ic, pmax, varm, trimm);
Interpreting Our Cointegration Results with One Structural Break
The coint_ghansen procedure provides more extensive results than the coint_egranger test. In particular, the Gregory-Hansen test:
 Performs Augmented Dickey-Fuller testing on the residuals from the cointegration regression.
 Performs Phillips-Perron testing on the residuals from the cointegration regression.
 Identifies structural breaks.
Cointegration results with one structural break
Cointegration test results
After calling the coint_ghansen procedure and testing all possible models, we obtain the following test statistic results:
| Test | $ADF$ Test Statistic | $Z_t$ Test Statistic | $Z_{\alpha}$ Test Statistic | 10% Critical Value $ADF$, $Z_t$ | 10% Critical Value $Z_{\alpha}$ | Conclusion |
|---|---|---|---|---|---|---|
| Gregory-Hansen, Level shift | -4.004 | -3.819 | -27.858 | -4.690 | -42.490 | Cannot reject the null of no cointegration for $ADF$, $Z_t$, or $Z_{\alpha}$. |
| Gregory-Hansen, Level shift with trend | -3.889 | -3.751 | -27.618 | -5.030 | -48.94 | Cannot reject the null of no cointegration for $ADF$, $Z_t$, or $Z_{\alpha}$. |
| Gregory-Hansen, Regime change | -4.658 | -4.539 | -32.766 | -5.23 | -52.85 | Cannot reject the null of no cointegration for $ADF$, $Z_t$, or $Z_{\alpha}$. |
| Gregory-Hansen, Regime change with trend | -5.834 | -4.484 | -32.411 | -5.72 | -63.10 | $ADF$ falls below its 10% critical value; cannot reject the null of no cointegration for $Z_t$ or $Z_{\alpha}$. |
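As we can see from these results, there is little evidence that our S&P 500 Index is cointegrated with the money stock and bond yield. The only statistic that exceeds its 10% critical value in absolute value is the $ADF$ statistic for the regime change with trend model, and the $Z_t$ and $Z_{\alpha}$ statistics for that same model do not.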
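Results for all four models can be collected by looping over the model setting. The sketch below reuses the data loading and settings from the example call to coint_ghansen earlier in this post. Only the loop and the print statements are new.

```gauss
library tspdlib;

// Load the three series and drop rows with missing values
fname = "nelsonplosser.dta";
coint_data = packr(loadd(fname, "sp500 + m + bnd"));

// Define y and x matrix
y = coint_data[., 1];
x = coint_data[., 2 3];

// Same settings as in the earlier example
ic = 2;
pmax = 12;
varm = 1;
bwl = 1;
trimm = 0.1;

// Run the test for each of the four model specifications
for i(1, 4, 1);
    model = i;

    { ADF_min_gh, TBadf_gh, Zt_min_gh, TBzt_gh, Za_min_gh, TBza_gh, cvADFZt_gh, cvZa_gh } =
        coint_ghansen(y, x, model, bwl, ic, pmax, varm, trimm);

    print "Model:" model;
    print "ADF, Zt, and Za statistics:" ADF_min_gh Zt_min_gh Za_min_gh;
    print "Break estimates (ADF, Zt, Za):" TBadf_gh TBzt_gh TBza_gh;
endfor;
```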
Structural break results
The coint_ghansen procedure also returns estimates for break dates based on the $ADF$, $Z_t$, and $Z_{\alpha}$ tests:
| Test | $ADF$ Break Date | $Z_t$ Break Date | $Z_{\alpha}$ Break Date |
|---|---|---|---|
| Gregory-Hansen, Level shift | 1958 | 1956 | 1956 |
| Gregory-Hansen, Level shift with trend | 1958 | 1956 | 1956 |
| Gregory-Hansen, Regime change | 1955 | 1955 | 1955 |
| Gregory-Hansen, Regime change with trend | 1951 | 1953 | 1947 |
What can we Conclude from the Gregory-Hansen Cointegration Test?
The results from our Gregory-Hansen cointegration test provide some important conclusions:
 There is no robust support for cointegration.
 Incorporating a structural break does NOT change our conclusion that there is no cointegration.
Note that while the Gregory-Hansen test does estimate break dates, it does not tell us whether those breaks are statistically significant.
Conclusion
Today's blog looks closer at the Engle-Granger and Gregory-Hansen residual-based cointegration tests. By building a better understanding of how the tests work and what assumptions we make when running the tests, you will be better equipped to interpret the test results.
In particular, today we learned:
 How to prepare for cointegration testing.
 How to set up the specifications for cointegration tests.
 How to interpret the results from the Engle-Granger and Gregory-Hansen cointegration tests.
Eric has been working to build, distribute, and strengthen the GAUSS universe since 2012. He is an economist skilled in data analysis and software development. He has earned a B.A. and MSc in economics and engineering and has over 15 years of combined industry and academic experience in data analysis and research.
Nice post, very pedagogical, these three parameters need to be specified:
// To be specified
bwl=1;
trimm=0.1;
varm=1;
Best,
JS
Hello Jamel,
Thank you for your comment! I've updated the blog to reflect this.
Also, it should be noted that since the last update of TSPDLIB, the bwl, ic, pmax, varm, and trimm arguments are all optional arguments. This allows you to call coint_ghansen using internal defaults for these parameters.
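A call that relies on those defaults might look like the following sketch. It assumes the same data loading as in the post and passes only the required inputs. The default values themselves are not shown here and should be taken from the TSPDLIB documentation.

```gauss
library tspdlib;

// Load the three series and drop rows with missing values
fname = "nelsonplosser.dta";
coint_data = packr(loadd(fname, "sp500 + m + bnd"));

// Define y and x matrix
y = coint_data[., 1];
x = coint_data[., 2 3];

// Regime and trend shift
model = 4;

// Optional arguments omitted, so the internal defaults are used
{ ADF_min_gh, TBadf_gh, Zt_min_gh, TBzt_gh, Za_min_gh, TBza_gh, cvADFZt_gh, cvZa_gh } =
    coint_ghansen(y, x, model);
```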
More information about the default values can be found in the TSPDLIB documentation.
Best,
Erica