Monday 26 December 2011

Modelling returns using PCA : Evidence from Indian equity market

For my finance term paper, I tried to identify macroeconomic variables that explain the returns on equities. The topic has been debated at length and has given rise to two competing theories of asset pricing, viz. CAPM (capital asset pricing model, or single-factor model) and APT (arbitrage pricing theory, or multi-factor model). Here is a brief discussion on the two in my previous post. In this post I would like to discuss my approach to answering this question in the context of the Indian stock market.

Methodology:
  • Companies that have been actively traded on the NSE for the past 10 years (218 companies) were selected, and their daily stock returns for these 10 years were taken from PROWESS.
  • Using PCA, the first 10 components were extracted from the returns data of the 218 companies. More on PCA in my previous post, here
  • These components were then regressed, one at a time, on NIFTY returns (first regression).
  • Then these components were regressed on NIFTY returns, MIBOR rate changes, and INR/USD exchange rate changes (second regression).
  • The explanatory power of the 2 regressions was compared using an F-statistic (refer to pg. 10 in the paper attached at the end of the post).

Findings and R codes:
We start by running PCA on the daily returns of the 218 companies, then run the 2 regressions, and finally compare them using an F-statistic. The F-stat tells us whether including the macroeconomic variables (viz. MIBOR, INR/USD) in our equation offers any additional explanation.
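As a rough illustration of these steps, here is a minimal sketch on simulated data. It is not the code used for the paper: the series `market`, `d_rate` and `d_fx` are made-up stand-ins for NIFTY returns, MIBOR changes and INR/USD changes, the panel is 30 hypothetical firms rather than 218, and scaling the returns before PCA is my assumption. The nested-model F-test is done with `anova()` on the two `lm` fits.

```r
set.seed(1)
n_days  <- 1000
n_firms <- 30   # hypothetical; the paper uses 218 companies

# Simulated stand-ins for the explanatory series (assumptions, not real data)
market <- rnorm(n_days, sd = 0.010)   # plays the role of NIFTY returns
d_rate <- rnorm(n_days, sd = 0.005)   # plays the role of MIBOR changes
d_fx   <- rnorm(n_days, sd = 0.005)   # plays the role of INR/USD changes

# Simulated firm returns: market exposure + small FX exposure + noise
returns <- outer(market, runif(n_firms, 0.5, 1.5)) +
  outer(d_fx, runif(n_firms, -0.3, 0.3)) +
  matrix(rnorm(n_days * n_firms, sd = 0.02), n_days, n_firms)

# Step 1: PCA on the returns panel, keep the first 10 component scores
pca <- prcomp(returns, center = TRUE, scale. = TRUE)
pcs <- pca$x[, 1:10]

# Steps 2-4: for one component, fit both regressions and compare them
compare_models <- function(pc) {
  restricted   <- lm(pc ~ market)                  # first regression
  unrestricted <- lm(pc ~ market + d_rate + d_fx)  # second regression
  ftest <- anova(restricted, unrestricted)         # F-test for nested models
  c(F           = ftest$F[2],
    p_value     = ftest$`Pr(>F)`[2],
    adj_r2_gain = summary(unrestricted)$adj.r.squared -
                  summary(restricted)$adj.r.squared)
}

# One row per principal component
results <- t(apply(pcs, 2, compare_models))
print(round(results, 4))
```

Each row of `results` corresponds to one component, so counting the rows with `p_value` below 0.05 gives the analogue of the "k out of 10 regressions" figure discussed in the findings.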



The results that I obtained pose an interesting observation. The F-stat is significant at 5% for 7 out of the 10 regressions (one regression per component), meaning that in those 7 cases adding the macroeconomic variables produces a statistically significant increase in the explanatory power of the model. Therefore, on statistical grounds I can argue that a multi-factor model (APT) is preferable to a single-factor model (CAPM) for modelling stock returns in the Indian equity market. This assertion, if it holds true, can have far-reaching implications for the pricing of Indian securities. Let me explain why. The principal components (the dependent variables in the model) are essentially the common factors across all the companies' stock returns with the idiosyncratic effects discounted, so any variable that explains this common component would be a source of systematic risk (think why!). Now we can relate it to the debate between the CAPM and APT guys. If the CAPM guys were correct, I would obtain no additional explanation in my model after adding the macroeconomic variables, i.e. their assertion that the market risk (market beta) captures the entire systematic risk would hold true.

The results, however, suggest that in 7 out of 10 regressions the macroeconomic variables offer statistically significant additional explanation. Well, so can we reject outright the applicability of the (much prevalent) CAPM in the case of Indian equities? Or is there something amiss? If I look closely at the absolute increase in explanatory power, comparing the adjusted R-squared values before and after the addition of the macroeconomic variables, the increase in all the cases is < 1% (refer to pg. 11 in the paper at the end of this post). Therefore, although we obtain statistical significance after the addition of the variables, the economic significance (intuition) is called into question. Is it worthwhile to complicate our model with additional macroeconomic variables, when we can simply use the market return as a reasonable proxy for all the variables? And all this just to prove a point that we have macro-variables that can provide 0.5% additional explanation in our model? This takes us back to the eternal debate of statistical vs economic significance: which is more important? Is the above result robust enough (on economic intuition) to question the much used, simple and powerful CAPM? Is there a threshold of statistical significance that would ensure economic significance? These are some questions that still linger in my mind.
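The gap between a significant F-stat and a tiny gain in adjusted R-squared is easy to reproduce on simulated data. The sketch below is only an illustration under made-up assumptions (5000 observations, a hypothetical `macro` variable with a true coefficient of 0.08); it is not based on the paper's data.

```r
set.seed(7)
n <- 5000   # hypothetical sample size

market <- rnorm(n)   # stand-in for the market factor
macro  <- rnorm(n)   # stand-in for a macroeconomic variable

# The macro variable truly matters, but only a little
y <- market + 0.08 * macro + rnorm(n)

single <- lm(y ~ market)           # single-factor model
multi  <- lm(y ~ market + macro)   # multi-factor model

# F-test for the added variable, and the gain in adjusted R-squared
f_p_value   <- anova(single, multi)$`Pr(>F)`[2]
adj_r2_gain <- summary(multi)$adj.r.squared - summary(single)$adj.r.squared

cat("p-value of F-test:      ", format.pval(f_p_value), "\n")
cat("gain in adj. R-squared: ", round(adj_r2_gain, 4), "\n")
```

With a sample this large, even a weak effect is detected by the F-test, while the extra share of variance explained stays well under 1%. That is the same pattern as in the regressions above.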

If we view the above result with this caveat of economic significance, then there is reason to believe that a single-factor model would be a preferable way to model stock returns. There is, however, evidence in the literature to suggest that a multi-factor model (APT) is a superior way of modelling returns, but the identification of these "multi factors" remains a contentious issue among researchers. In some desperate attempts to refute CAPM, researchers extracted principal components from a number of macroeconomic variables. This resulted in factors that had no economic intuition at all, which were then used as independent variables in explaining the returns. The APT (arbitrage pricing theory) is a 'theory', whereas CAPM is a 'model' that approximates reality. So even if in reality there are multiple factors that give rise to the return signals as we see them, the identification of these factors is not a trivial exercise, as we have seen above. Statistically we managed to overturn the CAPM in the context of Indian equity markets, but in terms of economic intuition the results do not seem that promising. Therefore, the above exercise tells us exactly why people still stick to the evergreen CAPM as an asset pricing model.

In case you wish to replicate the exercise, the data can be obtained from here: Returns_CNX_500, Nifty_returns, MIBOR, Exchange_rates.

Here is the full text of my paper. Feedback is welcome.

Thursday 8 December 2011

Movement around the mean "Stationary" OR "Unit root"


The idea of modelling the time series of US GNP, and other macroeconomic variables, as a trend stationary (TS) process was brought into question by Nelson and Plosser in their groundbreaking research paper in 1982. Their paper marked a paradigm shift in the way time-series econometrics was done after the 1980s. The profound idea that prompted them to look for an alternative to the prevalent TS process was that the GNP series does not have any tendency to return to a time trend following a shock. This means that following a shock (for example, a technological innovation), the series keeps moving away from the time trend rather than returning to it. If the series keeps moving away from the time trend, its movements will not be captured by a trend-stationary model.

This marked a radical change, which extended the idea of stationarity to include another class of processes, difference stationary (DS) processes. More on this in my previous post. But as a student of basic time series, I did not find the phenomenon of non-stationarity very easy to digest. If a series fluctuates around a mean, is it necessarily stationary? The answer happens to be No (now that I have completed the course I can proudly and confidently answer that question). According to the definition of stationarity, a series can be stationary only if every group of consecutive data points in the series has the same mean (the variance and autocovariances must be stable over time as well). Sounds confusing? Let me illustrate this using 2 Indian macro series and some R codes: the daily 3-month MIBOR rates and the daily INR/USD exchange rates for the past 10 years.

###############################
# Access the relevant files ###
###############################
library(tseries)  # provides adf.test(), used for the unit root tests further below
mibor <- read.csv("MIBOR.csv", na.strings="#N/A")
exchange <- read.csv("Exchange_rates.csv", na.strings="#N/A")
nifty <- read.csv("Nifty_returns.csv")

#################################
## Dealing with missing values ##
#################################

## Dealing with blanks in the MIBOR rates ##

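# Fill the blanks by linear interpolation over the dates
mibor[, 2] <- approx(as.Date(mibor$Dates, '%d-%b-%y'), mibor[ ,2], as.Date(mibor$Dates, '%d-%b-%y'))$y
# Calculating the %age change; diff() is vectorised, so no loop is needed.
# The leading NA keeps the column the same length as the data frame.
mibor$Change1 <- c(NA, diff(mibor$MIBOR) / mibor$MIBOR[-length(mibor$MIBOR)])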

## Dealing with blanks in the exchange rates ##

# Fill the blanks by linear interpolation over the dates
exchange[, 2] <- approx(as.Date(exchange$Year,'%d-%b-%y'), exchange[ ,2], as.Date(exchange$Year, '%d-%b-%y'))$y
# Calculating the %age change (vectorised; leading NA for the first day)
exchange$Change <- c(NA, diff(exchange$Exchange.rates) / exchange$Exchange.rates[-length(exchange$Exchange.rates)])

## Plotting the variables ##

png("indep_var_ns.png", width = 480, height = 480)
par(mfrow = c(2, 1))
plot(as.Date(mibor$Dates,'%d-%b-%y'), mibor$MIBOR, xlab= "Date", 
     ylab= "3-month MIBOR rates (%age)", type='l', col='red', 
     main="3-month MIBOR rates")
abline(h = 0, lty = 8, col = "gray")
plot(as.Date(exchange$Year, '%d-%b-%y'), exchange$Exchange.rates, xlab= "Date", 
     ylab= "INR/USD Exchange rates", type='l', col='red', 
     main="INR/USD Exchange rate")
abline(h = 0, lty = 8, col = "gray")
dev.off()

Eyeballing the above plots, one can see that the series do not have any trend in them, i.e. the series are moving more or less about a mean. But if we look at MIBOR, for example, the mean of the series in the period 2000-02 is different from the mean in 2003-04. This is the catch here, which I think is easily overlooked. A unit root would also cause long forays away from the mean, so to test for non-stationarity we shall check whether the above series have a unit root in the auto-regressive (AR) polynomial using the ADF test. Since we can see that the mean is changing substantially over the time horizon, we would expect there to be a unit root in the series. Let us see what the results have to show.

> adf.test(exchange$Exchange.rates)
Dickey-Fuller = -1.9266, Lag order = 13, p-value = 0.6094 ## Cannot reject the null of non-stationarity
alternative hypothesis: stationary
> adf.test(mibor$MIBOR)
Dickey-Fuller = -2.1925, Lag order = 13, p-value = 0.4968 ## Cannot reject the null of non-stationarity
alternative hypothesis: stationary
> adf.test(nifty$S...P.Cnx.Nifty)
Dickey-Fuller = -11.8633, Lag order = 13, p-value = 0.00 ## Can reject the null of non-stationarity
alternative hypothesis: stationary

So we see that the null of a unit root cannot be rejected for MIBOR and INR/USD, but it is rejected for NIFTY returns. It is rejected for NIFTY because the fluctuations around the mean are of a very high frequency, so even if we took 2 different time periods the statistical difference between their means would be negligible. Thus the NIFTY returns give us a stationary series. The MIBOR and INR/USD series are also made stationary by differencing (here, taking the percentage change of each series). The stationary plots look like:

## Plot for the %age changes of the variables:
png("indep_var.png", width = 480, height = 480)
par(mfrow = c(3, 1))
plot(as.Date(mibor$Dates,'%d-%b-%y'), mibor$Change1, xlab= "Date",
     ylab= "Change in 3-month MIBOR(%age)", type='l', col='royalblue',
     main="%age change in MIBOR rates")
abline(h = 0, lty = 8, col = "gray")
plot(as.Date(nifty$Date,'%d-%b-%y'), nifty$S...P.Cnx.Nifty, xlab= "Date",
     ylab= "NIFTY returns(%age)", type='l', col='royalblue',
     main="NIFTY returns")
abline(h = 0, lty = 8, col = "gray")
plot(as.Date(exchange$Year, '%d-%b-%y'), exchange$Change, xlab= "Date",
     ylab= "INR/USD Exchange rates change(%age)", type='l', col='royalblue',
     main="INR/USD Exchange rate changes(%age)")
abline(h = 0, lty = 8, col = "gray")
dev.off()

So there are 2 takeaways from the exercise above: (1) a series fluctuating about a mean need not be stationary (empirically shown); (2) 3-month MIBOR and INR/USD exhibit unit roots in the given (10-year daily) sample for India. The first point might be a trivial statement for advanced econometricians, but for novices and amateurs I think this would serve as a good basic exercise.
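The first point can also be checked on simulated data, without any of the files above. The following is a minimal sketch in base R under made-up assumptions (2000 observations, windows of 200 points, and a helper `window_means()` that I introduce only for illustration). It compares how much the sub-period means wander for white noise, for a random walk (a unit root process), and for the differenced random walk.

```r
set.seed(42)
n <- 2000

white_noise <- rnorm(n)           # stationary by construction
random_walk <- cumsum(rnorm(n))   # unit root: y_t = y_{t-1} + e_t

# Mean of each consecutive, non-overlapping window of `width` points
window_means <- function(x, width = 200) {
  groups <- ceiling(seq_along(x) / width)
  tapply(x, groups, mean)
}

# How much do the sub-period means differ from one another?
spread_wn   <- sd(window_means(white_noise))
spread_rw   <- sd(window_means(random_walk))
spread_diff <- sd(window_means(diff(random_walk)))   # after first differencing

cat("white noise:        ", round(spread_wn, 3), "\n")
cat("random walk:        ", round(spread_rw, 3), "\n")
cat("differenced r. walk:", round(spread_diff, 3), "\n")
```

The sub-period means of the random walk drift far apart, while those of the white noise and of the differenced random walk stay close together. This is the same pattern as MIBOR and INR/USD in levels versus their changes.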

In case you wish to replicate the exercise, data can be obtained from here: MIBOR, INR/USD, NIFTY.