Cointegration vs Spurious Correlation: Understand the Difference for Accurate Analysis | by Egor Howell | Jul, 2023

Why correlation doesn’t equal causation for time series

Photograph by Wance Paleri on Unsplash

In time series analysis, it’s useful to understand whether one series influences another. For example, commodity traders want to know whether an increase in commodity A leads to an increase in commodity B. Originally, this relationship was measured using linear regression; however, in 1974 Clive Granger and Paul Newbold showed that this approach yields incorrect results, particularly for non-stationary time series. Granger went on to develop the concept of cointegration in the 1980s, work that earned him a Nobel prize. In this post, I want to discuss the need for cointegration, how it is applied, and why it is an important concept for Data Scientists to understand.


Before we discuss cointegration, let’s discuss the need for it. Historically, statisticians and economists used linear regression to determine the relationship between different time series. However, Granger and Newbold showed that this approach is incorrect and leads to something called spurious correlation.

A spurious correlation is where two time series may look correlated but actually lack a causal relationship. It’s the classic ‘correlation doesn’t mean causation’ statement. It’s dangerous because even statistical tests may well say that there is a causal relationship.
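To make the danger concrete, here is a minimal sketch (not the author's code) that regresses pairs of completely independent series on each other and counts how often the slope looks "significant" at the usual 5% level. The helper names `slope_t_stat` and `rejection_rate`, the sample sizes, and the seed are all assumptions chosen for illustration. For stationary white noise the false-positive rate should sit near 5%, while for independent random walks (non-stationary) it tends to be far higher.

```python
import numpy as np


def slope_t_stat(x, y):
    """t-statistic of the slope in an OLS fit of y on x (with intercept)."""
    n = len(x)
    x_c = x - x.mean()
    y_c = y - y.mean()
    s_xx = x_c @ x_c
    beta = (x_c @ y_c) / s_xx
    resid = y_c - beta * x_c
    sigma2 = (resid @ resid) / (n - 2)  # residual variance estimate
    return beta / np.sqrt(sigma2 / s_xx)


def rejection_rate(integrate, n_sims=500, n_obs=200, seed=0):
    """Share of simulations where |t| > 1.96 for two independent series.

    integrate=True  -> cumulative sums of noise (random walks, non-stationary)
    integrate=False -> plain white noise (stationary)
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        a = rng.standard_normal(n_obs)
        b = rng.standard_normal(n_obs)
        if integrate:
            a, b = np.cumsum(a), np.cumsum(b)
        if abs(slope_t_stat(a, b)) > 1.96:
            hits += 1
    return hits / n_sims


rate_noise = rejection_rate(integrate=False)
rate_walks = rejection_rate(integrate=True)
print(f"white noise:  {rate_noise:.2f} of regressions look significant")
print(f"random walks: {rate_walks:.2f} of regressions look significant")
```

The two series in each simulation are unrelated by construction, so every "significant" slope is a false positive. The gap between the two rates is the spurious regression problem in miniature.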


An example of a spurious relationship is shown in the plots below:

Plot generated by author in Python.

Here we have two time series A(t) and B(t) plotted as a function of time (left) and plotted against each other (right). Notice from the plot on the right that there is some correlation between the series, as shown by the regression line. However, by looking at the left plot, we see this correlation is spurious because B(t) consistently increases while A(t) fluctuates erratically. Furthermore, the average distance between the two time series is also increasing…
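The observation about the growing distance can be checked numerically. The sketch below does not use the author's data, which is not available here. It builds two hypothetical series in the same spirit (a noisy `a` with only a faint drift and a steadily rising `b`; every coefficient is an assumption), then compares the correlation with the average gap in the first and second halves of the sample.

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.arange(200)

# Hypothetical stand-ins for A(t) and B(t); all coefficients are assumptions.
a = 0.02 * t + rng.normal(scale=2.0, size=t.size)  # erratic, faint drift
b = 0.10 * t + rng.normal(scale=0.5, size=t.size)  # steady upward trend

# Pearson correlation between the two series.
corr = np.corrcoef(a, b)[0, 1]

# Average absolute distance between the series, early vs late in the sample.
gap = np.abs(b - a)
half = t.size // 2
early_gap = gap[:half].mean()
late_gap = gap[half:].mean()

print(f"correlation:     {corr:.2f}")
print(f"mean gap, early: {early_gap:.2f}")
print(f"mean gap, late:  {late_gap:.2f}")
```

A positive correlation alongside a gap that keeps widening is the pattern described above: the series move in the same general direction, yet nothing ties them together over time.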
