Wednesday, 21 September 2011

Simple time series plot using R : Part 1


As a task for my Financial eco assignment, I had to plot a simple time series of the overnight MIBOR (Mumbai Interbank Offer Rate) for the past year. The job could easily have been done in MS Excel, but I chose to plot it in R instead, and the quality of the graph, in both sharpness and neatness, was far better than what I could have obtained with MS Excel. All this at the cost of just 3 lines of code:

# The overnight MIBOR rates were stored in a file named "Call_Rates_2011.csv". This is just a normal Excel file saved in CSV (comma-separated values) format, which R can read.
# R conceptualizes the data much as Excel does: as a simple analogy, you can assume that after the read.csv() call the variable "a" holds the entire Excel spreadsheet.
# Make sure that the working directory is the one that contains the file "Call_Rates_2011.csv".

a <- read.csv("Call_Rates_2011.csv")

# The 2 column headers in my CSV file were "date" and "mibor", so the code below plots "date" on the x-axis and "mibor" on the y-axis. as.Date() tells R that the column "date" contains dates in the format day, abbreviated month name, two-digit year ('%d-%b-%y'), for example "21-Sep-11".
# a$(column header) is the standard way of referring to a column in the "spreadsheet" contained in "a"
# xlab : x-axis label
# ylab : y-axis label
# type : line ('l')
# col : color of the line

plot(as.Date(a$date,'%d-%b-%y'), a$mibor, xlab= "Months", ylab= "MIBOR overnight rates(percentage)", type='l', col='red')

# This is to get the titles in place
# main : main title
# col.main : color of the main title
# font.main : font face of the main title (4 = bold italic)

title(main="Overnight MIBOR rates for last one year", col.main="black", font.main=4)

The resulting plot looks like this:



In case you can't make out the difference in the quality of the plot, just drop your comments and email address and I will mail you the PDF and the JPG image of the plot. You can pull/stretch it to see that the pixels don't get distorted, and it looks way neater when you put it on a slide in a presentation.
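
For readers who want to produce such files themselves, here is a minimal sketch of writing the plot to a PDF and an image file directly from R. The data are synthetic stand-ins (not the actual MIBOR series), the file names are assumptions, and png() is used in place of a JPG device purely for illustration.

```r
# Synthetic stand-in for the MIBOR data (values are made up for illustration)
set.seed(1)
dates <- seq(as.Date("2010-09-21"), by = "day", length.out = 365)
rates <- 6 + cumsum(rnorm(365, sd = 0.05))

# Wrap the plotting calls so the same figure can be sent to several devices
draw_plot <- function() {
  plot(dates, rates, xlab = "Months",
       ylab = "MIBOR overnight rates (percentage)", type = "l", col = "red")
  title(main = "Overnight MIBOR rates for last one year",
        col.main = "black", font.main = 4)
}

# Hypothetical output locations (a temporary directory is used here)
pdf_file <- file.path(tempdir(), "mibor_plot.pdf")
png_file <- file.path(tempdir(), "mibor_plot.png")

# Vector output: stays sharp when stretched
pdf(pdf_file, width = 8, height = 5)
draw_plot()
dev.off()

# Raster output: fixed pixel grid, so choose a generous size and resolution
png(png_file, width = 1600, height = 1000, res = 200)
draw_plot()
dev.off()
```

The PDF is a vector format, which is why it can be stretched on a slide without visible pixels; a raster image only looks clean up to the size it was rendered at.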

17 comments:

  1. I don't know why I have not found this blogspot earlier.

    Very detailed and clear explanations.

    Way to go, Shreyes.

    By the way, are you on twitter, facebook, google+?

  2. Shreyes, I tried the exact same R code you provided in this tutorial, and the console displayed the following without showing any graph:

    Error in plot.window(...) : need finite 'xlim' values
    In addition: Warning messages:
    1: In min(x) : no non-missing arguments to min; returning Inf
    2: In max(x) : no non-missing arguments to max; returning -Inf

    P.S: I used the MIBOR .csv file you provided in one of your other tutorials.

    Thank you.

  3. GUEYE, if you used the MIBOR.csv file that I provided in this post: http://programming-r-pro-bro.blogspot.in/search/label/Principal%20component%20analysis then you would get this error, because that file contains "#NA" entries. In the same post I used the approx() function to replace these "#NA" entries with linearly interpolated values.

    When R has no usable (non-missing) values from which to compute the axis range, min() and max() return Inf and -Inf, which is what the warnings report. Try doing this before you go ahead with the plot:

    a[, 2] <- approx(as.Date(a$date, '%d-%b-%y'), a[ ,2], as.Date(a$date, '%d-%b-%y'))$y

    It should work now. Let me know if you still face any problem.

    ~
    Shreyes

    Replies
    1. Hi Shreyes, I got the same problem with my data. I tried your fix, but it throws the error: need at least two non-NA values to interpolate. Any idea?

    2. What is a[,2], and what is "$y" in the above example? Can/should the '%d-%b-%y' string be changed to the "%m/%d/%Y %H:%M" string if I'm using the as.POSIXlt example below?
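
A small sketch may help with both questions above. It assumes a file whose missing rates are written as "#NA"; the sample values are invented for illustration and are not the real MIBOR data. a[, 2] is simply the second column of the data frame (the same as a$mibor here), and approx() returns a list with components x and y, so $y picks out the interpolated values. The format string only has to describe how the dates are written in your own file, so a date-time column would need a matching date-time format and as.POSIXct() instead of as.Date().

```r
# Hypothetical file contents: two rates are recorded as "#NA"
csv_text <- "date,mibor
01-Sep-11,7.90
02-Sep-11,#NA
05-Sep-11,8.10
06-Sep-11,#NA
07-Sep-11,8.30"

# na.strings tells read.csv to treat "#NA" as a missing value,
# so the mibor column is read as numeric rather than as text
a <- read.csv(text = csv_text, na.strings = "#NA", stringsAsFactors = FALSE)

# %b matches abbreviated month names in the current locale (English assumed)
d <- as.numeric(as.Date(a$date, format = "%d-%b-%y"))

# approx() drops the incomplete pairs, interpolates linearly between the
# remaining points, and evaluates the result at every date in xout
a[, 2] <- approx(x = d, y = a[, 2], xout = d)$y
print(a)
```

The "need at least two non-NA values to interpolate" error typically means that fewer than two complete (date, value) pairs were left, for example because the format string did not match the dates and as.Date() returned NA for every row, or because the value column was read as text. Checking summary(as.Date(a$date, format = '%d-%b-%y')) and str(a) before calling approx() is a quick way to see which of the two it is.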

  4. Hi Shreyas,
    I am still beginning with R.
    I have a CSV file like this
    time,data
    01/29/2013 19:26:04,110.087103
    01/29/2013 19:28:04,56.978100
    01/29/2013 19:30:04,91.755860
    01/29/2013 19:32:04,66.255792
    01/29/2013 19:34:04,86.740205
    01/29/2013 19:36:04,99.137451
    01/29/2013 19:38:04,68.836168
    01/29/2013 19:40:04,106.748553
    01/29/2013 19:42:04,39.968326
    01/29/2013 19:44:04,61.309700

    How do I plot a graph using this data?

    Also, can R take huge CSV files? I have a CSV file which has 10000 lines.

    Thanks

    A

    Replies
    1. Try this:

      x$new_date <- as.POSIXlt(x$Date, format= "%m/%d/%Y %H:%M")

      Since your date is in date-time format, you need to tell R that first. See if you are able to plot the values after this (using x$new_date instead of x$Date, where x$Date stands for your date-time column).
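
As a fuller sketch of the same idea, the lines below read the first few rows from the question and plot them. The format string here also includes seconds (%S), the time zone is an assumption, and as.POSIXct() is used because it stores cleanly in a data frame column.

```r
# First rows of the data from the question, inlined for a self-contained example
csv_text <- "time,data
01/29/2013 19:26:04,110.087103
01/29/2013 19:28:04,56.978100
01/29/2013 19:30:04,91.755860
01/29/2013 19:32:04,66.255792"

x <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Parse month/day/year hour:minute:second; tz = "UTC" is an assumption
x$new_date <- as.POSIXct(x$time, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Plot the values against the parsed date-times
plot(x$new_date, x$data, type = "l", col = "red",
     xlab = "Time", ylab = "data")
```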

  5. Thanks Shreyes, I will try this.

    Also, can R take large sets of CSV data?

    I am a SAN engineer by profession and the performance data we get is huge.

    I am trying to make the graph generation automated, as I have already arranged the data as CSV using Ruby programming language

    Regards,
    A

    Replies
    1. R does have the shortcoming that it keeps its objects in physical memory (RAM). Having said that, I don't think 10,000 rows of data should pose any problem. However, as the size of the data increases, say beyond 40-50 MB, it might start acting funny, depending on your machine.

      Revolution R seems to have a package, RevoScaleR, that breaks the data into chunks and is efficient when it comes to dealing with large datasets, but that is as far as my knowledge of Rev R goes. I am still trying to get familiar with Rev R and am not a pro at it.

      Hope this helps.

    2. Hi Shreyes,

      Thanks. I tried the script. Though the graph got created, it is not very clear.

      The X axis wasn't in the proper format,

      and the lines didn't come out properly. Still, it is good learning and I want to build on this.

      Thanks a lot again, your input was really helpful.

      best Regards,

      Athreya

  6. Hi Shreyes,

    I got the graph; however, the X axis is all messed up. I think all the labels are overlapping.

    d <- read.table("tedorig.txt", sep=",", header=TRUE)
    png(file="latency.png",width=1000,height=350,res=72)
    d$new_date <- as.POSIXlt(d$date, format= "%m/%d/%Y %H:%M")
    plot(d$new_date,d$iops,xaxt='n',xlab= "Dates",ylab="iops", type="l", col='red')
    axis.POSIXct(1,at=d$new_date,labels=format(d$date, format= "%m/%d/%Y %H:%M"),las=2)
    q()

    This is what the file looks like:

    01/31/2013 22:26:05,36.953642
    01/31/2013 22:28:17,82.334787
    01/31/2013 22:30:28,89.057602
    01/31/2013 22:32:38,105.279861
    01/31/2013 22:34:38,69.626364
    01/31/2013 22:36:57,68.110564
    01/31/2013 22:39:24,122.304370
    01/31/2013 22:41:32,67.490331
    01/31/2013 22:43:56,107.949942
    01/31/2013 22:46:26,93.248857
    01/31/2013 22:48:32,70.976643
    01/31/2013 22:52:42,142.202259
    01/31/2013 22:54:56,84.722920
    01/31/2013 22:57:10,41.355000

    regards,
    A
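
One possible way to tidy up the axis in the script above is to place only a handful of evenly spaced ticks instead of one label per observation, and to let axis.POSIXct() format them. The sketch below assumes the file has a header line "date,iops" (the header is not shown in the comment) and uses a few of the rows above; the output path is hypothetical. It also closes the device with dev.off() so the PNG is written out.

```r
# A few rows from the comment; the header line is an assumption
txt <- "date,iops
01/31/2013 22:26:05,36.953642
01/31/2013 22:28:17,82.334787
01/31/2013 22:30:28,89.057602
01/31/2013 22:32:38,105.279861
01/31/2013 22:34:38,69.626364
01/31/2013 22:36:57,68.110564
01/31/2013 22:39:24,122.304370"

d <- read.csv(text = txt, stringsAsFactors = FALSE)
d$new_date <- as.POSIXct(d$date, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

out_file <- file.path(tempdir(), "latency.png")   # hypothetical output path
png(out_file, width = 1000, height = 350, res = 72)
par(mar = c(6, 4, 1, 1))   # extra bottom margin for the rotated labels

# Draw the series without the default x-axis
plot(d$new_date, d$iops, xaxt = "n", xlab = "", ylab = "iops",
     type = "l", col = "red")

# Six evenly spaced ticks across the time range, labelled as hour:minute
ticks <- seq(min(d$new_date), max(d$new_date), length.out = 6)
axis.POSIXct(1, at = ticks, format = "%H:%M", las = 2)

dev.off()   # close the device so the file is actually written
```

The number of ticks and the label format are arbitrary choices here; for data spanning several days a format such as "%m/%d %H:%M" may read better.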

  7. How do you show the date labels instead of the month labels?
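
One way this might be done is to suppress the default x-axis and draw it with axis.Date(), choosing the tick positions and label format yourself. The data below are synthetic and the weekly spacing is just an example.

```r
# Synthetic daily series for illustration
set.seed(1)
dates  <- seq(as.Date("2011-01-01"), by = "day", length.out = 60)
values <- cumsum(rnorm(60))

# xaxt = "n" suppresses the default (month) labels
plot(dates, values, type = "l", col = "red", xaxt = "n",
     xlab = "Date", ylab = "Value")

# One tick per week, labelled as day and abbreviated month
ticks <- seq(min(dates), max(dates), by = "week")
axis.Date(1, at = ticks, format = "%d-%b")
```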

  8. Hi Shreyes,
    This was really very helpful. It worked for me following your instructions above. I have a question:

    I have groups of records in my CSV file. I want to draw multiple graphs in the same window, with a different color for each group. For example, I have N1 records of group A at different time points and N2 records of group B at different time points. I want to have two graphs in the same window, in red and green. Is that possible? Is it possible to give a "group by" option in the example below?

    plot(as.Date(Lines$TIMEPOINT), Lines$RESULT_VALUE, xlab= "Time In Week.", ylab= "Result Value AT Specific Week.", type='l', col='red')

    I saw there are different kinds of graphs, but this method was very simple. Thanks in advance for your help.

    Regards,
    Piyush
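
Base plot() has no "group by" argument as such, but a common pattern is to set up an empty plot covering all the data and then add one line per group with lines(). The sketch below assumes a grouping column named GROUP, which is a hypothetical name, and uses made-up values.

```r
# Made-up records; GROUP is a hypothetical column name
Lines <- data.frame(
  GROUP        = rep(c("A", "B"), times = c(4, 3)),
  TIMEPOINT    = as.Date(c("2015-01-05", "2015-01-12", "2015-01-19", "2015-01-26",
                           "2015-01-05", "2015-01-19", "2015-02-02")),
  RESULT_VALUE = c(1.0, 1.4, 1.2, 1.8, 2.0, 1.7, 2.3),
  stringsAsFactors = FALSE
)

groups  <- split(Lines, Lines$GROUP)     # one data frame per group
colours <- c(A = "red", B = "green")     # one colour per group

# type = "n" sets up axes that span all groups without drawing anything
plot(Lines$TIMEPOINT, Lines$RESULT_VALUE, type = "n",
     xlab = "Time In Week.", ylab = "Result Value AT Specific Week.")

# Add one line per group
for (g in names(groups)) {
  lines(groups[[g]]$TIMEPOINT, groups[[g]]$RESULT_VALUE, col = colours[[g]])
}

legend("topleft", legend = names(groups),
       col = unname(colours[names(groups)]), lty = 1)
```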

  9. Hi, Could you please help plotting data like below ?

    00-00-0C-12-43-02 2/21/2015 0:00 GOOD
    00-00-0C-12-43-02 2/21/2015 1:00 GOOD
    00-00-0C-12-43-02 2/21/2015 2:00 CRASH
    00-00-0C-12-43-02 2/21/2015 3:00 CRASH
    00-00-0C-12-43-02 2/21/2015 4:00 ERROR
    00-00-0C-12-43-02 2/21/2015 5:00 WARN
    00-00-0C-12-43-02 2/21/2015 6:00 GOOD
    00-00-0C-12-43-02 2/21/2015 7:00 CRASH
    00-00-0C-12-43-02 2/21/2015 8:00 GOOD
    00-00-0C-12-43-02 2/21/2015 9:00 GOOD
    00-00-0C-12-43-02 2/21/2015 10:00 GOOD
    00-00-0C-12-43-02 2/21/2015 11:00 GOOD
    00-00-0C-12-43-02 2/21/2015 12:00 GOOD
    00-00-0C-12-43-02 2/21/2015 13:00 ERROR
    00-00-0C-12-43-02 2/21/2015 14:00 ERROR
    00-00-0C-12-43-02 2/21/2015 15:00 GOOD
    00-00-0C-12-43-02 2/21/2015 16:00 GOOD
    00-03-E0-43-11-19 2/21/2015 0:00 GOOD
    00-03-E0-43-11-19 2/21/2015 1:00 GOOD
    00-03-E0-43-11-19 2/21/2015 2:00 CRASH
    00-03-E0-43-11-19 2/21/2015 3:00 ERROR
    00-03-E0-43-11-19 2/21/2015 4:00 GOOD
    00-03-E0-43-11-19 2/21/2015 5:00 ERROR
    00-03-E0-43-11-19 2/21/2015 6:00 ERROR
    00-03-E0-43-11-19 2/21/2015 7:00 GOOD
    00-03-E0-43-11-19 2/21/2015 8:00 GOOD
    00-03-E0-43-11-19 2/21/2015 9:00 GOOD
    00-03-E0-43-11-19 2/21/2015 10:00 GOOD
    00-03-E0-43-11-19 2/21/2015 11:00 GOOD
    00-03-E0-43-11-19 2/21/2015 12:00 GOOD
    00-03-E0-43-11-19 2/21/2015 13:00 ERROR
    00-03-E0-43-11-19 2/21/2015 14:00 GOOD
    00-03-E0-43-11-19 2/21/2015 15:00 GOOD
    00-03-E0-43-11-19 2/21/2015 16:00 ERROR

  10. I want to count how many times the Nifty crossed the 8200 level. I have the CSV file with me, but I don't know how to write the formula for that in R. Can anybody please help me?
    Thanks in advance.
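
A rough sketch of one approach: mark each observation as above or below the level, then count the places where that flag changes. The closing values below are invented, and "crossing" is taken to mean moving from at or below 8200 to above it (or the reverse) between consecutive rows; with a real file, the vector would come from the relevant column of read.csv().

```r
# Invented closing values for illustration
close <- c(8150, 8190, 8210, 8230, 8195, 8180, 8205, 8199, 8300)
level <- 8200

# 1 where the value is above the level, 0 otherwise
above <- as.integer(close > level)

# diff() is +1 at an upward crossing and -1 at a downward crossing
change <- diff(above)
up_crossings   <- sum(change == 1)
down_crossings <- sum(change == -1)
total_crossings <- up_crossings + down_crossings

print(c(up = up_crossings, down = down_crossings, total = total_crossings))
```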

  11. I want to plot a continuous time series for data in this format:
    Jan Feb...........Dec
    1990
    1991
    1992
    .
    .
    .
    .
    2010

    I want to have a script that will plot one year followed by the next, i.e. starting with Jan-Dec of the first year and continuing with the following year.
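
A minimal sketch of one way to do this, assuming the table has one row per year and one column per month: transpose it so that the values unroll month by month, then wrap them in a monthly ts object. The numbers are random placeholders and only three years are used to keep the example short.

```r
# Placeholder table: rows are years, columns are Jan..Dec
set.seed(42)
years <- 1990:1992
wide <- matrix(round(runif(length(years) * 12, 0, 100), 1),
               nrow = length(years), byrow = TRUE,
               dimnames = list(as.character(years), month.abb))

# t() puts months in rows, so as.vector() reads Jan..Dec of the first year,
# then Jan..Dec of the next year, and so on
series <- ts(as.vector(t(wide)), start = c(years[1], 1), frequency = 12)

plot(series, xlab = "Year", ylab = "Value", col = "red")
```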

  12. Hey,
    How do I plot the log of one column versus another?
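
A short sketch, using a made-up data frame with hypothetical columns x and y: either transform the column with log() before plotting, or keep the raw values and ask plot() for a logarithmic axis.

```r
# Made-up data; x and y are hypothetical column names
df <- data.frame(x = 1:10, y = 2^(1:10))

# Option 1: plot the natural log of y against x
plot(df$x, log(df$y), type = "l", col = "red",
     xlab = "x", ylab = "log(y)")

# Option 2: plot the raw values on a logarithmic y-axis
plot(df$x, df$y, log = "y", type = "l", col = "red",
     xlab = "x", ylab = "y (log scale)")
```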
