Revision as of 07:33, 23 April 2012 by Rhea (Talk | contribs)


Getting Data Through CRSP

Retrieved from the course forum for ECE695: "Financial Engineering", Spring 2012


There are two ways to access the CRSP database through WRDS: the web interface and the UNIX interface. This page shows how to use the UNIX interface to gather information about the securities of interest.

To log in, type the following at the terminal:

    ssh username@wrds.wharton.upenn.edu

Once logged in, we can use a number of utility functions to get the job done relatively easily. A good place to look for the relevant commands is the following document: http://www.crsp.com/documentation/product/stkind/software/software_guide.pdf

Say you want to generate the list of all securities that were part of the S&P 500 from 2003/01/01 to 2009/12/31. This is easily accomplished with the dstkprint utility:

    wrds(~)% dstkprint /gp.16 /fs /sq /of dsp500.list /dt 20030101-20091231

In the above command, /gp.16 denotes the group number for the S&P 500. If I am not wrong, as of today CRSP supports only the S&P 500 group. /of specifies the output file name, while /dt specifies the range of dates of interest. The output (dsp500.list) of the above command will look something like this:

10078|16|19920820|20100128| 1| 0
10102|16|19570301|19750205| 1| 0
10104|16|19890803|20101231| 1| 0
10107|16|19940607|20101231| 1| 0
10108|16|20020722|20050811| 1| 0
10137|16|19251231|19760630| 1| 0

The important columns in the above file are 1, 3 and 4. Column 1 is the permno (the unique identifier CRSP uses for each security), column 3 is the date on which the security entered the S&P 500, and column 4 is the date on which it ceased to be a member of the S&P 500. For example, the security with permno 10078 entered the S&P 500 on August 20, 1992 and stayed until January 28, 2010.
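A minimal Python sketch of reading such a file, assuming the six pipe-delimited fields shown above, dates in YYYYMMDD form, and an inclusive end date. The names Membership, parse_membership and members_on are illustrative only and are not part of the CRSP tools.

```python
from datetime import date, datetime
from typing import NamedTuple


class Membership(NamedTuple):
    permno: int
    start: date  # column 3: date the security entered the index
    end: date    # column 4: date the security left the index


def parse_membership(lines):
    """Parse pipe-delimited dstkprint output lines into Membership records."""
    records = []
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # skip blank or malformed lines
        start = datetime.strptime(fields[2], "%Y%m%d").date()
        end = datetime.strptime(fields[3], "%Y%m%d").date()
        records.append(Membership(int(fields[0]), start, end))
    return records


def members_on(records, day):
    """Return the sorted permnos whose membership interval contains `day`.

    Treating the end date as inclusive is an assumption.
    """
    return sorted({r.permno for r in records if r.start <= day <= r.end})


# Three of the sample rows shown above.
sample = [
    "10078|16|19920820|20100128| 1| 0",
    "10102|16|19570301|19750205| 1| 0",
    "10108|16|20020722|20050811| 1| 0",
]
records = parse_membership(sample)
print(members_on(records, date(2004, 6, 30)))  # prints [10078, 10108]
```

To read the real file, pass an open file handle instead of the sample list, e.g. parse_membership(open("dsp500.list")). Calling members_on for each trading day gives a day-by-day constituent list.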

With the permnos of all the constituents of interest in hand, it is just as easy to acquire the time series (ts) for each security, using the ts_print utility. For example, to acquire the time series for IBM (permno 12490), simply type:

    wrds(~)% ts_print 12490.txt

where the file 12490.txt looks something like this:

#Sample request file for price, total return, Ticker name
ENTITY
LIST|PERMNO 12490
END
ITEM
ITEMID prc
ITEMID ret
ITEMID shr
ITEMID ticker
END
DATE
CALNAME daily|RANGE 20080101-20101231
END
OPTIONS
X ITEM, YES| Y DATE, YES|Z ENTITY, YES, 1|OUTNAME 12490.dstk|NOFILL
END

After executing the above command, a file 12490.dstk will be generated containing the daily time series of price, return, shares outstanding and ticker for IBM from 2008/01/01 to 2010/12/31. More details about the various identifiers that can be used are in the .pdf file referenced above. Automating the process for all the securities of interest is straightforward with a short C or Python program.

So, on this page we looked at acquiring the permnos of the constituents of the S&P 500, and then at generating a separate file for each security containing the time series of interest. As long as we have the permno, acquiring data about a particular security is simple.
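One way such automation might look in Python is sketched below. It writes one request file per permno using the same items and options as the IBM example. The line breaks in the template (one keyword or item per line) are an assumption about the request file layout, and write_request_files is a hypothetical helper name. The sketch only prints the ts_print commands and does not run them.

```python
import tempfile
from pathlib import Path

# Request template mirroring the IBM example; the one-keyword-per-line
# layout is an assumption about how ts_print expects the file.
REQUEST_TEMPLATE = """\
#Sample request file for price, total return, Ticker name
ENTITY
LIST|PERMNO {permno}
END
ITEM
ITEMID prc
ITEMID ret
ITEMID shr
ITEMID ticker
END
DATE
CALNAME daily|RANGE {start}-{end}
END
OPTIONS
X ITEM, YES| Y DATE, YES|Z ENTITY, YES, 1|OUTNAME {permno}.dstk|NOFILL
END
"""


def write_request_files(permnos, start, end, out_dir="."):
    """Write <permno>.txt for each permno and return the file paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for permno in permnos:
        path = out / f"{permno}.txt"
        path.write_text(
            REQUEST_TEMPLATE.format(permno=permno, start=start, end=end)
        )
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Demo in a throwaway directory; on WRDS you would use a real one.
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_request_files([12490, 10078], "20080101", "20101231", tmp):
            print("ts_print", path.name)
```

Each printed line is a command to run at the WRDS prompt from the directory that holds the request files. The permno list would normally come from column 1 of dsp500.list.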


Thank you Mayur! I have two questions about S&P 500 in CRSP.

  1. Can we get a list of the current constituents of the S&P 500? I could then combine it with dsp500.list to produce the constituent information for every day.
  2. Which command can we use to download data from the Wharton server to a local machine?

Bingyue,

To bring a file over, type the following at a Unix/Linux prompt:

    scp username@wrds.wharton.upenn.edu:filename .

If the file resides in a subdirectory, then "filename" must be the complete path to the file from your home directory on WRDS. If you want to use a different file name in your local directory, replace the dot with that file name, e.g.,

    scp username@wrds.wharton.upenn.edu:filename filename1

By the way, to enable the use of X windows when you SSH to WRDS, log in remotely as follows:

    ssh -X -l username wrds.wharton.upenn.edu
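The same copy can be scripted. Below is a minimal Python sketch, assuming the scp program (the command named later in this thread) is installed on the local machine. scp_command and fetch are hypothetical helper names, and "username" is a placeholder.

```python
import subprocess

WRDS_HOST = "wrds.wharton.upenn.edu"


def scp_command(username, remote_path, local_path="."):
    """Build the scp argument list for copying one file from WRDS."""
    return ["scp", f"{username}@{WRDS_HOST}:{remote_path}", local_path]


def fetch(username, remote_path, local_path="."):
    """Run scp; raises CalledProcessError if the copy fails."""
    subprocess.run(scp_command(username, remote_path, local_path), check=True)


# Show the command without running it.
print(" ".join(scp_command("username", "12490.dstk")))
```

Calling fetch in a loop over the generated .dstk files would bring them all over, at the cost of one authentication per file unless SSH keys are set up.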

CRSP updates their daily data once a year. Right now, only data until the end of 2010 is available. My guess is that they probably do not have S&P 500 compositions for 2011.


Getting List of constituents of an Index

In the article on 'Getting Data through CRSP', we concentrated on just obtaining the constituents of the S&P 500. One way to obtain the constituents of other indices is as follows:

  1. Log on to the WRDS web interface.
  2. Select the Compustat database from the left menu.
  3. Click on North America, followed by Index Constituents.
  4. Now fill in the dates and other attributes of interest to generate the file containing the details about all the constituents. Some of the tickers of popular indices are given below: (Index, TIC): (S&P 500, i0003), (S&P 100, i0014), (S&P 1500, i0020), (DJIA 30, i0005)
  5. The final .csv file generated should be very similar to the dsp500.list described in the previous posting.
  6. One big difference is that this new file will have GVKEY as the unique identifying element for a security. This is because we used the Compustat database instead of the CRSP database.

However, this should not be a problem. It can be addressed in two ways:

  1. Obtain the corresponding PERMNO for the GVKEY using the various utility functions available for the CRSP database.
  2. Directly use ts_print with GVKEY as the unique identifier. In this case, the sample request file (12490.txt in the previous post) will look like:

#Sample request file for price, total return, Ticker name
ENTITY
LIST| GVKEY 006066
END
ITEM
ITEMID prc
ITEMID ret
ITEMID shr
ITEMID ticker
END
DATE
CALNAME daily|RANGE 20080101-20101231
END
OPTIONS
X ITEM, YES| Y DATE, YES|Z ENTITY, YES, 1|OUTNAME 006066.dstk|NOFILL
END

Note that the only change is that the entity is now defined by GVKEY instead of PERMNO. It is a good idea, however, to use the same unique identifier consistently throughout your code, so decide early whether you will go with GVKEY or PERMNO.
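If you go with GVKEY, the .csv file from the web interface has to be reduced to a list of GVKEYs and membership dates. The Python sketch below makes several assumptions that should be checked against your actual download: that the columns are named gvkey, from and thru, that dates are in YYYYMMDD form, and that an empty thru value means the security is still a member. read_gvkeys is a hypothetical helper name, and the sample rows are made up for illustration.

```python
import csv
import io


def read_gvkeys(csv_file, gvkey_col="gvkey", from_col="from", thru_col="thru"):
    """Return (gvkey, from, thru) string tuples from a constituents CSV.

    GVKEYs are zero-padded to six digits so they match the form used in
    the ts_print request file (e.g. 006066). Column names are assumptions
    and can be overridden through the keyword arguments.
    """
    rows = []
    for row in csv.DictReader(csv_file):
        gvkey = row[gvkey_col].strip().zfill(6)
        rows.append((gvkey, row[from_col].strip(), row[thru_col].strip()))
    return rows


# Made-up sample rows, for illustration only (not real membership dates).
sample = io.StringIO("gvkey,from,thru\n6066,19570301,\n1234,20030101,20051231\n")
for gvkey, start, end in read_gvkeys(sample):
    print(gvkey, start, end or "(still a member)")
```

Keeping GVKEYs as zero-padded strings rather than integers avoids losing the leading zeros that appear in the request file.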


Professor, I have downloaded the data from the server to my local machine with the scp command.

Thanks for your help! Mayur


Bingyue---according to Mayur's latest post, you can actually access historical compositions for S&P 1500 through Compustat. (Thanks Mayur!)

