We think therefore we R: Predictability of stock returns : Using runs.test()

Friday, 21 October 2011

Predictability of stock returns : Using runs.test()

Financial markets are an interesting place: you find people taking positions (buying/selling) based on their expectations of what security prices will be, and they are rewarded or penalized according to the accuracy of those expectations. The beauty of financial markets is that they provide a platform for everyone to come in with their respective expectations and allow them to interact and exchange securities. I emphasize everyone because this everyone includes an auto-rickshaw driver, a clerk and also sophisticated econometricians and analysts. An obvious point then is that if your expectations are consistently correct, i.e. you can predict price movements before they happen on the exchange, you are a rich man. Assuming for all practical purposes that there is no oracle in our universe who can make these predictions with 100% accuracy, the job of prediction rests upon the econometrician/statistician. Let's see if they can do a good job too.

I took the stock returns data for INFOSYS (INFY on NSE) for the past year and tried to see if I could make this data confess its underlying linear/non-linear generating process. I started by employing a rather simple, straightforward and easy-to-interpret runs test. It is a non-parametric statistical test of the null hypothesis that the underlying series is independently and identically distributed. For those who are not too familiar with statistical parlance, non-parametric in simple terms means that we do not have to assume a particular distribution for the underlying data. There has been a huge surge in the applications of non-parametric statistics to explain various processes, because the biggest deterrent to conducting these kinds of tests, i.e. the computational cost, is no longer a problem in this generation of rapid computation. The idea of empirical analysis is to state a null hypothesis and then try your best to bring it down using empirical evidence (analogous to Karl Popper's idea of falsification of a theory: you hang on to a theory so long as it has not betrayed you yet).

## Doing runs test on INFY daily returns
> library(tseries) ## runs.test() is assumed here to be the one from the tseries package (its output format matches the one shown below)

> infy <- read.csv("01-10-2010-TO-01-10-2011INFYEQN.csv") ## Reading the stock price data

> infy_ret <- 100*diff(log(infy[,2])) ## The second column in the data has the stock prices, so the returns are [log(P_t) - log(P_{t-1})]*100.

> runs.test(factor(infy_ret > 0)) ## factor(infy_ret > 0) is a two-level categorical variable: TRUE if infy_ret > 0 and FALSE otherwise.

This tells me whether the signs of the returns are predictable. Say I represent a positive return by + and a negative return by -; then my series of returns would probably look like +, +, -, +, -, -, -, +, ... A run is a stretch of consecutive identical signs. What the test checks is whether the order of the + and - signs is random, i.e. whether the signs seen so far help me predict if the next day will be + or -.
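To make the idea of "runs" concrete, here is a minimal sketch of the calculation done by hand, assuming the standard Wald-Wolfowitz normal approximation for the number of runs. The helper name `runs_test_by_hand` is hypothetical (it is not part of any package), and I have not checked it line by line against the implementation behind runs.test(), so treat it as an illustration rather than a replacement.

```r
# Hand-rolled runs test on a two-valued sequence (illustrative sketch).
# `runs_test_by_hand` is a hypothetical helper, not a function from any package.
runs_test_by_hand <- function(signs) {
  signs <- as.logical(signs)
  n_pos <- sum(signs)     # number of TRUE ("+") observations
  n_neg <- sum(!signs)    # number of FALSE ("-") observations
  n <- n_pos + n_neg

  # A run is a maximal block of identical consecutive values;
  # rle() returns one entry per such block.
  runs <- length(rle(signs)$lengths)

  # Mean and variance of the number of runs when the ordering is random
  mu <- 2 * n_pos * n_neg / n + 1
  sigma2 <- 2 * n_pos * n_neg * (2 * n_pos * n_neg - n) / (n^2 * (n - 1))

  z <- (runs - mu) / sqrt(sigma2)   # standardized number of runs
  p_value <- 2 * pnorm(-abs(z))     # two-sided p-value
  list(runs = runs, expected = mu, z = z, p.value = p_value)
}

# Toy series from the text: +, +, -, +, -, -, -, +
toy <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE)
res <- runs_test_by_hand(toy)
print(res)
```

The toy series has 5 runs (++, -, +, ---, +), which is exactly the number expected under randomness for four + and four - signs, so z is 0. With only eight observations the normal approximation is crude; the point is just to show what is being counted.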

Output:
Runs Test
data: factor(infy_ret > 0)

Standard Normal = 0.1308, p-value = 0.8959  ## High p-value means you cannot trash your null hypothesis.

For those not familiar with statistics, the p-value is often loosely described as the probability of rejecting a null hypothesis when it is actually true. Strictly speaking, it is the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one actually observed. (Be very careful with the interpretation of the p-value; people often misunderstand it, and I have fallen prey to this many times myself.) With a p-value as high as 0.8959, a test statistic like ours is entirely unremarkable under the null, so you cannot reject your null hypothesis; you just don't have enough evidence against it. Failing to reject does not prove the null, but the signs of the returns are consistent with prices following a random walk (you can understand this in the literal English language sense, but the definition is not so trivial in time series parlance).
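As a small check on that reading of the p-value, the sketch below recomputes it from the reported test statistic and then approximates the same quantity by simulation. It assumes the statistic is standard normal under the null, as the "Standard Normal" label in the output suggests; the variable names and the seed are my own choices for illustration.

```r
# Reported test statistic from the runs test output above
z_obs <- 0.1308

# Two-sided p-value: probability, under the null, of a statistic
# at least as far from zero as the one observed
p_from_z <- 2 * pnorm(-abs(z_obs))
print(round(p_from_z, 4))

# The same probability approximated by simulation:
# draw many statistics under the null and count how often
# they are at least as extreme as z_obs
set.seed(1)
z_null <- rnorm(100000)
p_sim <- mean(abs(z_null) >= abs(z_obs))
print(p_sim)
```

Both numbers should land close to the 0.8959 in the output: roughly nine out of ten random sign sequences would give a statistic at least this far from zero, which is why there is nothing here to hold against the null.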

P.S In case you want to replicate this exercise the data can be obtained from here.

6 comments:

Anonymous, 22 October 2011 at 00:20
Interesting post, but:
The p value is not "the probability of [rejecting] a null hypothesis when it is actually true"

...It is the probability of obtaining the data assuming the null hypothesis is true.
musically_ut, 22 October 2011 at 18:49
Dear Anonymous,

I think that the statement in the post meant

you cannot reject your null hypothesis under such a high probability of obtaining the test statistic as extreme as for this data

This was paraphrased as committing an error [by assuming that the test-statistic is unlikely].

Also, saying that p-value is the probability of obtaining the data assuming the Null hypothesis is true is not completely correct. You probably meant obtaining the test statistic at least as extreme as for this data instead of just the data.

~
ut
Shreyes, 23 October 2011 at 21:27
Thanks for helping me out here Utkarsh. :-)

Dear Anonymous,

As I had mentioned above the interpretation of p-value still remains an elusive proposition even for the statisticians.

But I agree with Utkarsh, I believe what you were trying to say was "obtaining the test statistic as extreme as for the given data" and not the data.
tony, 6 July 2012 at 03:32
Sir, I have a problem to discuss. I am a finance student doing research, but I don't know the econometric concepts. I am checking the randomness of a data series using the runs test and the ADF test, but I don't know how to interpret them. Please help me out: I am posting the results of the tests. What do the test results state? And why do we take the first difference in these tests?

Runs test with first difference

Runs test (first difference)

Number of runs (R) in the variable 'Close' = 64
Under the null hypothesis of independence, R follows N(67.1818, 5.51174)
z-score = -0.577281, with two-tailed p-value 0.56375

Runs test assuming positive and negative are equiprobable (without difference)

Runs test (level)

Number of runs (R) in the variable 'Close' = 1
Under the null hypothesis of independence and equal probability of positive
and negative values, R follows N(73, 5.97913)
z-score = -12.0419, with two-tailed p-value 2.14009e-033

ADF without taking difference
Augmented Dickey-Fuller tests, order 1, for Close
sample size 142
unit-root null hypothesis: a = 1

test with constant
model: (1 - L)y = b0 + (a-1)*y(-1) + ... + e
1st-order autocorrelation coeff. for e: 0.007
estimated value of (a - 1): -0.0149455
test statistic: tau_c(1) = -1.16104
asymptotic p-value 0.6934

with constant and trend
model: (1 - L)y = b0 + b1*t + (a-1)*y(-1) + ... + e
1st-order autocorrelation coeff. for e: 0.007
estimated value of (a - 1): -0.0431525
test statistic: tau_ct(1) = -1.79382
asymptotic p-value 0.7081

ADF with first difference

Augmented Dickey-Fuller tests, order 1, for d_Close
sample size 141
unit-root null hypothesis: a = 1

test with constant
model: (1 - L)y = b0 + (a-1)*y(-1) + ... + e
1st-order autocorrelation coeff. for e: 0.004
estimated value of (a - 1): -0.968427
test statistic: tau_c(1) = -8.49099
asymptotic p-value 1.727e-014

with constant and trend
model: (1 - L)y = b0 + b1*t + (a-1)*y(-1) + ... + e
1st-order autocorrelation coeff. for e: 0.004
estimated value of (a - 1): -0.969023
test statistic: tau_ct(1) = -8.46566
asymptotic p-value 5.077e-014
expecting your quick reply
NSE BSE Tips, 4 July 2014 at 18:28
The post has made me realize how complicated it is to predict stock market movements.