Monday, 26 December 2011

Modelling returns using PCA : Evidence from Indian equity market

For my finance term paper, I investigated an interesting question: can we identify macroeconomic variables that explain the returns on equities? Much debate has already taken place on this topic, giving rise to two competing theories of asset pricing, viz. the CAPM (capital asset pricing model, or single-factor model) and the APT (arbitrage pricing theory, or multi-factor model). There is a brief discussion of the two in my previous post. In this post I would like to discuss my approach to answering this question in the context of the Indian stock market.

Methodology:
  • Companies that have been actively traded on the NSE stock exchange for the past 10 years (218 companies) were selected, and their daily stock returns for these 10 years were taken from PROWESS. 
  • Using PCA, the first 10 components were extracted from the returns data of the 218 companies. More on PCA in my previous post, here
  • These components were first regressed on NIFTY returns alone (first regression) 
  • The components were then regressed on NIFTY returns, MIBOR rate changes, and INR/USD exchange rate changes (second regression).
  • The explanatory powers of the two regressions were compared using an F-statistic. (refer to pg. 10 in the paper attached at the end of the post)

Findings and R codes:
We start by calculating the principal components of the daily returns of the 218 companies, then run the two regressions, and finally compare them using an F-statistic. The F-statistic tells us whether any additional explanation is offered when we include the macroeconomic variables (viz. MIBOR, INR/USD) in our equation.
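Here is a minimal sketch of these steps in R. The file names and the helper choices (prcomp() for the PCA, anova() for the nested-model F-test) are my assumptions for illustration, not the exact code from the paper; the column names for MIBOR, the exchange rate and NIFTY follow the data files used later on this blog.

## Sketch: PCA on the company returns, the two regressions, and the F-test ##

returns <- read.csv("Returns_CNX_500.csv") # daily returns, one column per company (assumed layout, no date column)
nifty <- read.csv("Nifty_returns.csv") # NIFTY daily returns
mibor <- read.csv("MIBOR.csv") # MIBOR rates (the %age-change column, Change1, is computed as in the next post)
exchange <- read.csv("Exchange_rates.csv") # INR/USD rates (likewise for the Change column)

pc <- prcomp(returns, scale. = TRUE) # PCA on the returns matrix
scores <- pc$x[, 1:10] # scores of the first 10 principal components

for (i in 1:10) { # assumes all series are aligned by date and of equal length
  m1 <- lm(scores[, i] ~ nifty$S...P.Cnx.Nifty) # first regression: NIFTY returns only
  m2 <- lm(scores[, i] ~ nifty$S...P.Cnx.Nifty + mibor$Change1 + exchange$Change) # second: NIFTY + macro variables
  print(anova(m1, m2)) # F-test for the additional explanatory power of the macro variables
}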



The results that I obtained pose an interesting observation. The F-statistic is significant at 5% for 7 out of the 10 regressions; that is, in 7 of the 10 regressions (each with a separate component as the dependent variable) there is a statistically significant addition to the explanatory power of the model after adding the macroeconomic variables. Therefore, on statistical grounds I can argue that a multi-factor model (APT) is preferable to a single-factor model (CAPM) for modelling stock returns in the Indian equity market. This assertion, if it holds true, can have far-reaching implications for the pricing of Indian securities. Let me explain why. The principal components (the dependent variables in the model) are essentially the common factors across all the companies' stock returns, with the idiosyncratic effects discounted, so any variable that explains this common component is capturing systematic risk (think why!). Now we can relate this to the debate between the CAPM and APT camps. If the CAPM camp were correct, I would obtain no additional explanation in my model after adding the macroeconomic variables, i.e. their assertion that the market factor (market beta) captures the entire systematic risk would hold true.

The results, however, suggest that in 7 out of 10 regressions there is statistically significant additional explanation offered by the macroeconomic variables. So can we outright reject the applicability of the (much prevalent) CAPM in the case of Indian equities? Or is there something amiss? If I look closely at the absolute increase in explanatory power, comparing the adjusted R-squared values before and after the addition of the macroeconomic variables, the increase in every case is < 1% (refer to pg. 11 in the paper at the end of this post). Therefore, although the addition of the variables is statistically significant, its economic significance (intuition) is called into question. Is it worthwhile to complicate our model with additional macroeconomic variables when we can simply use the market rate as a reasonable proxy for all of them? And all this just to prove that we have macro variables that provide 0.5% additional explanation in our model? This takes us back to the eternal debate of statistical versus economic significance: which is more important? Is the above result robust enough (on economic intuition) to question the much used, simple and powerful CAPM? Is there a threshold of statistical significance that ensures economic significance? These are questions that still linger in my mind.
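Incidentally, the adjusted R-squared comparison above can be read straight off the two fitted models, continuing the (assumed) names from the sketch in the methodology section:

summary(m2)$adj.r.squared - summary(m1)$adj.r.squared # absolute increase after adding the macro variables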

If we view the above result with this caveat of economic significance, there is reason to believe that a single-factor model is the preferable way to model stock returns. There is, however, evidence in the literature suggesting that a multi-factor model (APT) is a superior way of modelling returns, but the identification of these "multiple factors" remains a contentious issue among researchers. In some desperate attempts to refute the CAPM, researchers have extracted principal components from a number of macroeconomic variables fed as inputs to the PCA. This resulted in factors with no economic intuition at all, which were then used as independent variables in explaining the returns. The APT (arbitrage pricing theory) is a 'theory', whereas the CAPM is a 'model' that approximates reality. So even if in reality there are multiple factors that give rise to the return signals we observe, the identification of these factors is not a trivial exercise, as we have seen above. Statistically we managed to overturn the CAPM in the context of the Indian equity market, but in terms of economic intuition the results do not seem that promising. The above exercise, therefore, tells us exactly why people still stick to the evergreen CAPM as an asset pricing model.

In case you wish to replicate the exercise, the data can be obtained from here: Returns_CNX_500, Nifty_returns, MIBOR, Exchange_rates.

Here is the full text of my paper. Feedback is welcome. 

Thursday, 8 December 2011

Movement around the mean: "Stationary" or "Unit root"?


The idea of modelling the time series of US GNP, and other macroeconomic variables, as a trend stationary (TS) process was brought into question by Nelson and Plosser in their groundbreaking 1982 research paper. Their paper marked a paradigm shift in the way time-series econometrics was done after the 1980s. The profound idea that prompted them to look for an alternative to the prevalent TS process was that the GNP series shows no tendency to return to a time trend following a shock. This means that following a shock (for example, a technological innovation), the series keeps moving away from the time trend rather than returning to it. If the series keeps moving away from the time trend, its movements will not be captured by a trend-stationary model.

This marked a radical change, which extended the idea of stationarity to include another class of processes, difference stationary (DS) processes. More on this in my previous post. But as a student of basic time series, the phenomenon of non-stationarity was not very easy for me to digest. Does it mean that if a series fluctuates around a mean, it is necessarily stationary? The answer happens to be no (now that I have completed the course I can proudly and confidently answer that question). According to the definition of stationarity, a series is stationary only if, among other things, any group of consecutive data points in the series has the same mean. Sounds confusing? Let me illustrate this using two Indian macro series and some R code: the daily 3-month MIBOR rates and the daily INR/USD exchange rates for the past 10 years.
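Before turning to the real data, here is a toy check of that definition (a sketch with simulated series): the window means of a random walk drift apart, while those of a stationary white-noise series stay close.

set.seed(42)
rw <- cumsum(rnorm(1000)) # a random walk: fluctuates, but has no fixed mean to return to
mean(rw[1:200]); mean(rw[801:1000]) # means over two distant windows differ markedly
wn <- rnorm(1000) # white noise: stationary
mean(wn[1:200]); mean(wn[801:1000]) # window means are close to each other (and to 0)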

###############################
# Access the relevant files ###
###############################
library(tseries) # needed later for adf.test()

mibor <- read.csv("MIBOR.csv", na.strings="#N/A")
exchange <- read.csv("Exchange_rates.csv", na.strings="#N/A")
nifty <- read.csv("Nifty_returns.csv")

#################################
## Dealing with missing values ##
#################################

## Dealing with blanks in the MIBOR rates ##

# Fill the missing rates by linear interpolation over the dates
mibor[, 2] <- approx(as.Date(mibor$Dates, '%d-%b-%y'), mibor[ ,2], as.Date(mibor$Dates, '%d-%b-%y'))$y

# Calculating the %age change (the first observation has no previous value, hence NA)
mibor$Change1 <- c(NA, diff(mibor$MIBOR) / mibor$MIBOR[-length(mibor$MIBOR)])

## Dealing with blanks in the exchange rates ##

# Fill the missing rates by linear interpolation over the dates
exchange[, 2] <- approx(as.Date(exchange$Year,'%d-%b-%y'), exchange[ ,2], as.Date(exchange$Year, '%d-%b-%y'))$y

# Calculating the %age change (the first observation has no previous value, hence NA)
exchange$Change <- c(NA, diff(exchange$Exchange.rates) / exchange$Exchange.rates[-length(exchange$Exchange.rates)])

## Plotting the variables ##

png("indep_var_ns.png", width = 480, height = 480)
par(mfrow = c(2, 1))
plot(as.Date(mibor$Dates,'%d-%b-%y'), mibor$MIBOR, xlab= "Date", 
     ylab= "3-month MIBOR rates (%age)", type='l', col='red', 
     main="3-month MIBOR rates")
abline(h = 0, lty = 8, col = "gray")
plot(as.Date(exchange$Year, '%d-%b-%y'), exchange$Exchange.rates, xlab= "Date", 
     ylab= "IND/USD Exchange rates", type='l', col='red', 
     main="IND/USD Exchange rate")
abline(h = 0, lty = 8, col = "gray")
dev.off()

Eyeballing the above plots, one can see that the series do not have any trend in them; they move more or less about a mean. But if we look at the MIBOR, for example, the mean of the series over 2000-02 differs from the mean over 2003-04. This is the catch, and I think it is easy to overlook. A unit root would also cause long forays away from the mean, so to test for non-stationarity we check whether the series has a unit root in the autoregressive (AR) polynomial, using the ADF test. And since the mean is changing substantially over the time horizon, we would expect there to be a unit root in the series. Let us see what the results show.

> adf.test(exchange$Exchange.rates)
Dickey-Fuller = -1.9266, Lag order = 13, p-value = 0.6094 ## Cannot reject the null of non-stationarity
alternative hypothesis: stationary
> adf.test(mibor$MIBOR)
Dickey-Fuller = -2.1925, Lag order = 13, p-value = 0.4968 ## Cannot reject the null of non-stationarity
alternative hypothesis: stationary
> adf.test(nifty$S...P.Cnx.Nifty)
Dickey-Fuller = -11.8633, Lag order = 13, p-value = 0.00 ## Can reject the null of non-stationarity
alternative hypothesis: stationary

So we see that the null of a unit root cannot be rejected for MIBOR and INR/USD, but it is rejected for NIFTY returns. It is rejected for NIFTY because the fluctuations around the mean are of very high frequency, so even if we took two different time periods the statistical difference between their means would be negligible. Thus the NIFTY returns give us a stationary series. The MIBOR and INR/USD series can also be made stationary by taking the first difference of each series. The stationary plots look like this:

## Plot for the %age changes of the variables:
png("indep_var.png", width = 480, height = 480)
par(mfrow = c(3, 1))
plot(as.Date(mibor$Dates,'%d-%b-%y'), mibor$Change1, xlab= "Date",
     ylab= "Change in 3-month MIBOR(%age)", type='l', col='royalblue',
     main="%age change in MIBOR rates")
abline(h = 0, lty = 8, col = "gray")
plot(as.Date(nifty$Date,'%d-%b-%y'), nifty$S...P.Cnx.Nifty, xlab= "Date",
     ylab= "NIFTY returns(%age)", type='l', col='royalblue',
     main="NIFTY returns")
abline(h = 0, lty = 8, col = "gray")
plot(as.Date(exchange$Year, '%d-%b-%y'), exchange$Change, xlab= "Date",
     ylab= "IND/USD Exchange rates change(%age)", type='l', col='royalblue',
     main="IND/USD Exchange rate changes(%age)")
abline(h = 0, lty = 8, col = "gray")
dev.off()
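As a quick sanity check (a sketch reusing the %age-change columns computed above; adf.test() cannot handle the leading NA, hence na.omit()), the ADF test on the differenced series should now reject the unit-root null:

adf.test(na.omit(mibor$Change1))
adf.test(na.omit(exchange$Change))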

So there are two takeaways from the exercise above: (1) a series fluctuating about a mean need not be stationary (shown empirically); (2) the 3-month MIBOR and INR/USD rates exhibit unit roots in the given sample (10 years of daily data) for India. The first point might be a trivial statement for advanced econometricians, but for novices and amateurs I think this serves as a good basic exercise.

In case you wish to replicate the exercise, the data can be obtained from here: MIBOR, INR/USD, NIFTY. 


Monday, 14 November 2011

Create your own Beamer template

For the past couple of days, I had been searching for a tutorial that would show how to create a custom Beamer template. I found some great resources and some really great customized templates (I have listed the ones that I referred to below), but none indicated how I should go about it. There were just lines and lines of code, and for someone like me, who is programmatically challenged, the task of creating my own template seemed very daunting. So I slowly started reading some of the code that I found and tried to make sense of it. I did eventually succeed, though not entirely.

I did manage to create my own template, but it is much simpler than the ones that I saw over the internet. Still, this did get me started, so I am hoping that anybody who is looking for a decently documented procedure on the topic will find this post helpful. The code is relatively simple, both to understand and to execute (I hope).

One of the best documents for understanding Beamer is the Beamer user guide. It explains the Beamer mechanisms very well, and I used it as one of the major references while preparing this template. According to the guide, Beamer presentations have five flavours of themes:

  1. Presentation themes - Every aspect of the presentation is detailed here: the colours, the fonts, the way the bullets look, the way the enumeration goes, the size and position of the logo, etc.
  2. Colour themes - Just the colour details. This can be created as a file separate from the presentation theme and later called alongside other presentation themes. I will explain this below. In fact, let's just enumerate the other three and jump to the explanation of how all this connects.
  3. Font themes
  4. Inner themes - Design the elements that are "inside" the frame, like the environments, theorems, blocks, etc.
  5. Outer themes - Design the outer space of the frame, like the headline, footline, sidebar, etc. 
To create a Beamer theme we need to specify four types of details: colour, font, inner, and outer. These details can be given in the presentation theme itself, or created as separate files and then called in the TeX document. It is actually advisable to create them as separate files, since this allows us to use these specific themes with other presentation themes as well, simply by calling them in the TeX document; it is also a more efficient way of working. For example, say we like the overall feel of the Pittsburgh theme but just want to change the colours from blue to black. Then we need only create the colour theme file and call it in the TeX document. This wouldn't be possible if we had created one file with all the details.
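To make this concrete, here is a minimal sketch of how a TeX document might mix theme files (assuming the colour details live in a file named beamercolorthemeMoo.sty, which \usecolortheme{Moo} picks up; "Moo" is the theme we build below):

\documentclass{beamer}
\usetheme{Pittsburgh}   % overall look from an existing presentation theme
\usecolortheme{Moo}     % our own colours, read from beamercolorthemeMoo.sty
\begin{document}
\begin{frame}{A test frame}
Hello, Beamer!
\end{frame}
\end{document}

Let's now try to create a simple theme and see how it goes.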

%%%%%%%%%%-------------------------------------------%%%%%%%%%%%%
%                             File created: 9 Nov, 2011
%
% This beamer style file was created just for experimentation and learning.
% The file is pretty much self-explanatory and, as of 10 Nov, 2011, pretty much bug free (this is because I really don't know what "bugs" are and/or how to find them).
% However, in case you find any bugs or issues, or have any suggestions/comments, please feel free to contact me at programming-r-pro-bro.blogspot.com
%
%
%
%%%%%%%%%%-------------------------------------------%%%%%%%%%%%%


% This style file is a combination of the four files required to create the Moo beamer theme.

% 1. Inner
% 2. Outer
% 3. Color
% 4. Font
% You can use this file directly, instead of using the other theme file "beamerthemeMoo-whole.sty" and calling the 4 style files in the TeX document.




%%% Defining the preamble
\mode<presentation>
\usepackage{pgfcomp-version-0-65}
\usepackage{color}


%%%%%%%%%%%%%%
%%
%% Color theme
%%
%%%%%%%%%%%%%%




%%%%%%
% We need to define a set of colours that will be assigned to various parts of the presentation. I personally believe that Beamer already has some great themes, so if we really want to build our own, we will have to go into the tiniest of details and tweak them.
%%%%%%





%%%%%%
% LaTeX, like R, has a huge set of colours to choose from, and there are many ways we can access them.
% Colours in LaTeX are provided by the xcolor package, which is loaded by default. However, the xcolor package by itself does not provide enough breadth of colours.


% More colours can be made available by passing the "dvipsnames" option to xcolor while calling beamer in the preamble of the document, e.g., \documentclass[xcolor = dvipsnames]{beamer}



% Other than that, we can also blend different colours to achieve the desired colour mix, or simply use RGB codes to call a particular colour.


% For blending two colours, we need to specify the two colours (Duh-uh!) and the percentage share of the two colours in the following format: <color1>!<percentage of color1>!<color2>
% If the percentage of color1 is specified as x%, the percentage of color2 is automatically taken as (100 - x)%.




% Why don't we try out a few options and see for ourselves.




% Here we are defining only two colours. We are primarily only going to use the first one.
\definecolor{Ftitle}{rgb}{0, 0, 0} % (rgb - 0, 0, 0) is nothing but black
% Here, we are using "rgb" in small case and this notation for decimal values of "rgb" ranging from 0 to 1

% To specify integer values of "RGB", ranging from 0 to 255, we need to use "RGB" in CAPS or UPPERCASE
% Also, the first curly braces include the name that we assign to the colour combination
\definecolor{Descitem}{RGB}{0, 0, 139}

\definecolor{StdTitle}{RGB}{26, 33, 141}
\definecolor{StdBody}{RGB}{213,24,0}

\definecolor{AlTitle}{RGB}{255, 190, 190}
\definecolor{AlBody}{RGB}{213,24,0}

\definecolor{ExTitle}{RGB}{201, 217, 217}
\definecolor{ExBody}{RGB}{213,24,0}




% Another color for the background canvas using the blending option
%\definecolor{BgShade}{red!30!white}
%%% Important note: While trying this, I found out that a colour cannot be defined this way; it can only be set, or called for a particular feature, using the \setbeamercolor{}{} command. We will use this below to show it.


%%%%%%
% Assign colours to the different constituents of the presentation as per the requirements.
%%%%%%



% This sets the colour of the title of the presentation and titles of all the slides in the presentation to black.
\setbeamercolor{frametitle}{fg = Ftitle}
\setbeamercolor{title}{fg = Ftitle}

% In case you choose to display the Table of Contents, or the Outline slide.
\setbeamercolor{section in toc}{fg = Ftitle}
\setbeamercolor{section in toc shaded}{fg = Ftitle}

% The colour of all the items, subitems and subsubitems is set to black.
\setbeamercolor{item}{fg = Ftitle}
\setbeamercolor{subitem}{fg = Ftitle}
\setbeamercolor{subsubitem}{fg = Ftitle}

% This sets the color for each item heading of the description environment.
\setbeamercolor{description item}{fg = Descitem}

% NOTE: Setting the colour black for all the items also sets it black for other environments like enumerate.

% We also need to fix the colours for captions for figures and tables.
\setbeamercolor{caption}{fg = Ftitle}
\setbeamercolor{caption name}{fg = Ftitle}

% In addition, we can also change the background colour of the slides depending on our requirement.
% \setbeamercolor{background canvas}{bg = blue!5}
% We have commented out this command because it is just for illustrative purposes and has not been used to define the background colour of the slides.




%%%%%%
% Now, there are three types of boxes in beamer:
% 1. Simple, or standard block, which can be invoked using definition or theorem
% 2. Alert block
% 3. Example block
% We will customize all these blocks based on our requirements


% Standard block
\setbeamercolor{block title}{fg = Descitem, bg = StdTitle!15!white}
\setbeamercolor{block body}{bg = StdBody!5!white}

% Alert block
\setbeamercolor{block title alerted}{bg = AlTitle}
\setbeamercolor{block body alerted}{bg = AlBody!5!white}

% Example block
\setbeamercolor{block title example}{bg = ExTitle}
\setbeamercolor{block body example}{bg = ExBody!5!white}



%%%%%%
% And one final thing, the colour of the text
\setbeamercolor{normal text}{fg = Ftitle}






%%%%%%%%%%%%%%
%%
%% Font theme
%%
%%%%%%%%%%%%%%


%%%%%%
% Here we are using default fonts
\usefonttheme{professionalfonts}

% Font for the presentation title
\setbeamerfont{title}{size = \huge}

% Font of the frame titles
\setbeamerfont{frametitle}{size = \Large}


%%%%%%%%%%%%%%
%%
%% Inner theme
%%
%%%%%%%%%%%%%%


%%%%%%
% Here we are using the rounded theme for the overall "feel" of the presentation. You can change specific details by editing that particular option, as we have done below.
\useinnertheme{rounded}

% Instead of rounded circles, we will use triangles as the indicator for items.
\setbeamertemplate{itemize items}[triangle]

% The default option for enumerate environment removes the circles around the numbers provided by the "rounded" inner theme. Just simple numbers remain.
%\setbeamertemplate{enumerate items}[default]


%%%%%%%%%%%%%%
%%
%% Outer theme
%%
%%%%%%%%%%%%%%


%%%%%%
% The outer theme takes the most amount of effort and time to customize.


%%%%%%
% In the outer theme, we will try to do the following:
% 1. Change the headline by putting a logo and a horizontal line
% 2. Change the footline and include custom information depending on our requirements
% 3. Organize the presentation title and the frame titles


%%%%%%
% Let's start with the headline. The approach that we plan to take for the headline and footline is similar.
% We will first define a new command and then include the command in the \setbeamertemplate{} option.
% I tried a couple of approaches that would make the method simpler but could not come up with one. In case you do find a more aesthetic approach, please do send it across. The contact information is at the top of the page.
% First the horizontal line on the top portion of the slides
% Add a horizontal line that runs from left of the slide to the right, just below the logo.
\newcommand{\LogoLine}{%
\raisebox{-12mm}[0pt][0pt]{%
\begin{pgfpicture}{0mm}{0mm}{0mm}{0mm}
\pgfsetlinewidth{0.28mm}
\color{gray}
\pgfline{\pgfpoint{-3mm}{1mm}}{\pgfpoint{10.8cm}{1mm}}
\end{pgfpicture}}}


% Include the line that we just created in the headline
\setbeamertemplate{headline}[text line]{\LogoLine}


% Now the logo. As it turns out, I could not include the logo in the headline; if I tried to, the headline kept shifting downwards.
% Acting a little smart and lazy, I just included the logo in the right sidebar and shifted it up.
\setbeamertemplate{sidebar canvas right}{
\vspace*{3pt}\hspace*{-25pt}%
{\includegraphics[height=28pt]{moo.png}}}





%%%%%
% Now that we have changed the headline, we need to orient the frame titles so that they come to rest at the right spot, just above the horizontal line.
\setbeamertemplate{frametitle}{
\vspace*{4mm}\hspace*{-2mm}\insertframetitle}


%%%%%
% As mentioned above, we will take a similar approach to customize the footline as well and include FAA in it.
\newcommand{\Ffootline}{%
\insertsection % The left end of the footline
\hfill
\textit{Moo} % The center
\hfill
\insertframenumber/\inserttotalframenumber} % And the right end



\setbeamertemplate{footline}{%
\usebeamerfont{structure}
\begin{beamercolorbox}[wd=\paperwidth,ht=2.25ex,dp=1ex]{title in head/foot}%
\Tiny\hspace*{4mm} \Ffootline \hspace{4mm}
\end{beamercolorbox}}
%%%%%%
% We will also remove the navigation symbols, which I personally don't find very useful
\setbeamertemplate{navigation symbols}{}
%%%%%%
% Now the toughest part--at least for me--customizing the title page.
% Putting a logo on the title page with text beside it was quite a difficult task and, to be fairly honest, my code is not efficient at all. At the very least, it is plain clumsy.
% Though it does solve the purpose (I hate to use this phrase), it is neither "neat" nor "cool".
% Anyway, let's see how we went about it.
% For the title page, we need a logo on the left, a vertical separator line, and finally a place for the title, author, date, etc.



% First, let's create the line
\newcommand{\TitleLine}{%
\raisebox{-12mm}[0pt][0pt]{%
\begin{pgfpicture}{0mm}{0mm}{0mm}{0mm}
\pgfsetlinewidth{0.10mm}
\color{gray}
\pgfline{\pgfpoint{55mm}{0mm}}{\pgfpoint{55mm}{50mm}}
\end{pgfpicture}}}



% Now let's create commands for the title etc., that we can call later

% Title
\newcommand{\MyTitle}{%
\hspace*{60mm}\vspace{-25mm}
\centering \inserttitle}

% Subtitle
\newcommand{\MySubTitle}{%
\hspace*{60mm}\vspace{-25mm}
\centering \footnotesize \textit{\insertsubtitle}}

% Author
\newcommand{\MyAuthor}{
\hspace*{60mm}\vspace{-25mm}
\centering \insertauthor}

% Institute
\newcommand{\MyInstitute}{
\hspace*{60mm}\vspace{-25mm}
\centering \footnotesize \textit{\insertinstitute}}

% Date
\newcommand{\MyDate}{
\hspace*{60mm}\vspace{-25mm}
\centering \insertdate}



% We declare the image that will be used as the logo
\pgfdeclareimage[width = 0.20\paperwidth]{big}{moo.png}



% This is quite a complicated command. We basically create a "beamercolorbox" for each field and invoke the commands that we had created earlier.
\setbeamertemplate{title page}{\TitleLine
\hspace*{11mm}\vspace*{-60mm}
\begin{beamercolorbox}[wd=0.5\paperwidth,ht=0.13\paperwidth]{Title}
\pgfuseimage{big}
\end{beamercolorbox}
%
\begin{beamercolorbox}[wd=\paperwidth,ht=0.06\paperwidth]{Title}
\usebeamerfont{Title}%
\MyTitle
\end{beamercolorbox}
%
\begin{beamercolorbox}[wd=\paperwidth,ht=0.03\paperwidth]{Title}
\usebeamerfont{Title}%
\MySubTitle
\end{beamercolorbox}
%
\begin{beamercolorbox}[wd=\paperwidth,ht=0.06\paperwidth]{Title}
\usebeamerfont{Title}%
\MyAuthor
\end{beamercolorbox}
%
\begin{beamercolorbox}[wd=\paperwidth,ht=0.03\paperwidth]{Title}
\usebeamerfont{head/foot}%
\MyInstitute
\end{beamercolorbox}
%
\begin{beamercolorbox}[wd=\paperwidth,ht=0.07\paperwidth]{Title}
\usebeamerfont{Title}%
\MyDate
\end{beamercolorbox}}


\mode<all>

Well, this pretty much completes the creation of Moo. I will upload the .sty file, the TeX file and a sample PDF for direct use and reference. 

Also, in case you are trying to create a more complicated theme you should definitely check out some of the resources mentioned below.
  1. Most important - The Beamer user guide
  2. A great repository of themes here
  3. A great introduction to Beamer here
  4. Some information that helped
  5. Another custom theme here
  6. A very good Beamer example
Hope this information helps.

The uploaded files:

Style file

TeX File

PDF output

Saturday, 5 November 2011

Unit root versus breaking trend: Perron's criticism

I came across an ingenious simulation by Perron during my time-series lecture which I thought was worth sharing. The idea is to put your model to a further test of a breaking trend before accepting the null of a unit root. Let me try to illustrate this in simple language.

A non-stationary time series is one whose mean changes with time. In other words, if you randomly pick bunches of consecutive values from the middle of the series, you end up with a different mean for each bunch. In short, there is a trend in the data which needs to be removed to make the series stationary before we proceed with our analysis (it is far easier to work with stationary time series). In dealing with a non-stationary time series, one has to be careful about the kind of non-stationarity that the variable exhibits. The two corrections for non-stationarity involve fitting:

(1) Trend stationary (TS) models, which are suitable for series that have a deterministic trend and fluctuations about that trend. This can be fit by a simple z_t = a + b·t + e_t, where e_t ~ ARMA(p, q).

(2) Difference stationary (DS) models, which are suitable for series having a stochastic trend. DS models are appropriate when there is a unit root in the AR polynomial. A unit root in the AR polynomial means that the trend part of the series cannot be represented by a simple linear trend in time (a + b·t); the correct representation is (1 - B)z_t = a + e_t, where e_t is i.i.d. 
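To see the distinction concretely, here is a quick sketch of one series of each type, built from the same innovations: the TS series keeps returning to its trend line, while the DS series (a random walk with drift) wanders away from it.

set.seed(1)
e <- rnorm(200) # common innovations
ts_series <- 0.5 + 0.1*(1:200) + e # TS: deterministic trend + stationary noise
ds_series <- cumsum(0.1 + e) # DS: (1 - B)z_t = 0.1 + e_t, a random walk with drift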

The asymptotic properties of the estimates, forecasts and forecast errors differ substantially between the TS and DS models. (For those interested in the algebra behind this, the lecture notes of Dr. Krishnan are here.) It is therefore important to be sure that a series belongs to the appropriate class before fitting a TS or DS model. This is why the clash between the two schools of thought has bred enormous literature and discussion on the methodology for checking for unit roots. One could argue endlessly about these discussions, but I want to illustrate the genius of Perron, who criticized the idea of fitting a DS model to a series that could have a structural break. He said that you ought to take the structural break into account before you check for unit roots; if you don't, you might end up accepting the null of a unit root even when the true data generating process (DGP) is a trend stationary one. He illustrated this using a simple, but very elegant, simulation exercise. Madhav and I replicated this exercise in R, fine-tuning the code provided by Utkarsh.

The steps involved are as follows:
(1)   Simulate 1000 series with the DGP:
z_t = u_1 + (u_2 - u_1)·DU_t + b·t + e_t
where e_t are i.i.d. innovations, DU_t = 0 for t <= T_b and 1 afterwards, and t = 1, 2, 3, ..., 100. For simplicity I have assumed b = 1 and u_1 = 0.
(2)   Assume that there is a crash at time T_b = 50, at which the entire series comes down by the amount u_2.

## Simulating a trend stationary model with a crash in the intercept ##
t <- c(1:100) # specifying the time
dummy <- as.numeric(ifelse(t <= 50, 0, 1)) # specifying the dummy for trend break at T = 50

z <- ts(t - 15*dummy + rnorm(100, mean = 0, sd = 3))# This is the trend stationary model with break in trend
x <- ts(t - 15*dummy) # This is just the trend line that we see in "red" in the plot below

plot(z, main = "Simulated series with break at T = 50")
lines(x, col = "red") ## Plotting a sample of the model that we have simulated


(3)   For each of these simulated series, compute the autoregressive coefficient "rho" in the regression:
z_t = u + b·t + rho·z_{t-1} + e_t
(4)   Plot the cumulative distribution function (c.d.f.) of "rho" for different values of u_2 (the crash).

## Now we simulate the series above 1000 times and check each sample for a unit root ##

n <- 1000 # number of simulated series

# For simplicity we define a function that generates "rho" for one simulated series.
# "crash" refers to the value of u2 in equation (1).

sim <- function(crash)
{
  d <- ts(t - crash*dummy + rnorm(100, mean = 0, sd = 3)) # simulate the series
  trend <- lm(d ~ t) # remove the deterministic trend from the simulated series
  res <- ar(ts(trend$residuals), order.max = 1, aic = FALSE) # fit an AR(1) model to the detrended residuals
  if(length(res$ar) < 1) 0 else res$ar # return the AR coefficient of the fitted AR(1) model
}

## Generate "rho's" for different magnitude of crash by
simply using the sim() function defined above

temp1 <- replicate(n, sim(10))
temp2 <- replicate(n, sim(15))
temp3 <- replicate(n, sim(20))
temp4 <- replicate(n, sim(35))

## Sort the values of "rho", we do this to plot the CDF
as we will see shortly

temp1.1 <- sort(temp1)
temp2.1 <- sort(temp2)
temp3.1 <- sort(temp3)
temp4.1 <- sort(temp4)
y <- seq(from = 0, to = 1, length.out = n) # The y-axis of the CDF: the cumulative probabilities

## Plotting the CDFs of "rho" for the different crash magnitudes in one plot.

plot(c(min(temp1.1), max(temp4.1)), c(0, 1), type = 'n', xlab = "Rho", ylab = "Probability", main = "CDF of 'rho' for different magnitudes of crash")
lines(temp1.1, y, type = 'l', col = 'red')
lines(temp2.1, y, type = 'l', col = 'green')
lines(temp3.1, y, type = 'l', col = 'blue')
lines(temp4.1, y, type = 'l', col = 'black')
b <- c("10 unit crash", "15 unit crash", "20 unit crash", "35 unit crash")

legend("topleft", b, cex = 0.5, col = c("red", "green", "blue", "black"), lwd = 2, bty = "n")







An interesting observation that we make (or rather Perron made) is that the c.d.f. of the autoregressive coefficient "rho" shifts towards unity as the magnitude of the crash increases. This means that as the magnitude of the crash increases, so does the possibility of accepting the (false) null of a unit root. I say the false null because I know the true DGP is trend stationary.

This idea of Perron's was criticised on the ground that he specified the break point (T_b) exogenously, that is, from outside the DGP. Frankly speaking, I do not understand why this was taken as a criticism. I think fixing the break point exogenously was a good way of fixing it with economic intuition, rather than making it a purely statistical exercise. Some researchers (I don't understand why) termed this (simulation) illustration a "data mining" exercise, and improved on it by selecting the break point (T_b) endogenously (as done by Zivot and Andrews, mentioned in the lecture notes).

I would hate to impose my opinion here, but I feel this was a very elegant and logical way of driving home the point that the null of a unit root should be accepted for your sample if and only if your model stands a test of extreme rigour, and that this rigour can be imposed exogenously, with economic intuition, too.

P.S. Perron did a similar simulation for a breaking trend model, i.e., where the slope of the model has a structural break. The code would be quite similar to that given above; in fact, it would be good practice to carry out a similar simulation for a breaking trend. In case you do try but face any issues, please feel free to post/email your queries.

Criticism and discussions welcome.