Since you are all hoping to get interviews and to get and keep jobs, try to treat these little problems like actual reports that you'd present to a supervisor. Think about your reader: your reports will be presented to intelligent people who may not be familiar with statistical lingo. You'd not want to answer question 2 with "yes, 0.173, no", because the reader would then have to figure out which question you were answering and what the interpretation and practical impact, if any, of that answer is. Your supervisor is also busy and resents wordy, rambling descriptions, so too much description and too little are both mistakes.

  1. I have a SAS program that reads in data on Amazon.com stock prices near the time of Amazon's initial public offering. Run that program and take a look at the graphs. The variable DATE is a SAS date variable that you might want to work with a little, and SPREAD is log(high) - log(low), where high and low are the high and low prices for the day. Explain to your readers how SPREAD relates to the ratio high/low, why it can never be a negative number, and how it relates to the maximum percentage increase that a day trader could have obtained had she executed a perfectly timed buy and sell strategy (ignoring any fees etc. - just thinking simply). This would mean buying at exactly the low point and selling at exactly the high point, and it assumes that this is possible (i.e. that the low occurs before the high), which would not necessarily happen. Along with your explanation, include the graph of SPREAD and the most recent 10 DATE and SPREAD values.
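To check the arithmetic behind SPREAD before you write it up, here is a minimal sketch in Python rather than SAS. It assumes that log means the natural log, and the high and low prices are made-up numbers, not values from the Amazon data set. The function names are hypothetical.

```python
import math

def spread(high, low):
    """log(high) - log(low), assuming natural logs."""
    return math.log(high) - math.log(low)

def max_pct_gain(spread_value):
    """Percentage gain from buying at the low and selling at the high."""
    return 100.0 * (math.exp(spread_value) - 1.0)

# Hypothetical prices, NOT taken from the Amazon data set.
high, low = 21.0, 20.0
s = spread(high, low)
ratio = math.exp(s)      # recovers high / low
gain = max_pct_gain(s)   # 100 * (high / low - 1)
print(f"SPREAD = {s:.4f}, high/low = {ratio:.4f}, max gain = {gain:.2f}%")
```

Because the high can never be below the low, the ratio is at least 1 and its log is at least 0.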

  2. Do you see any upward or downward trend in the SPREAD variable? What is the estimated daily increase or decrease in SPREAD indicated by a simple least squares regression of SPREAD on DATE? Is that (statistically) significantly different from 0?
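If you want to see what the regression output is computing, the following is a rough sketch of simple least squares in plain Python. The function name and the five data points are hypothetical and only illustrate the formulas. The standard error shown is the usual one, which is only trustworthy when the regression assumptions hold.

```python
import math

def simple_ols(x, y):
    """Return (slope, intercept, se_slope) for a least squares fit of y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # Residual sum of squares, then the usual standard error of the slope.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope, intercept, se_slope

# Hypothetical day index and spread values, NOT the Amazon data.
days = [0, 1, 2, 3, 4]
spreads = [0.10, 0.08, 0.09, 0.06, 0.07]
slope, intercept, se_slope = simple_ols(days, spreads)
t_stat = slope / se_slope  # compare with a t distribution on n - 2 df
print(f"slope = {slope:.4f} per day, se = {se_slope:.4f}, t = {t_stat:.2f}")
```

Since a SAS date value counts days, the slope from regressing SPREAD on DATE is a change per day.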

  3. Are there any assumptions in simple linear regression that might be violated by this data? If so, list them. Note (I am telling you) that certain violations of assumptions will render the least squares standard errors invalid, thus also invalidating the calculated t tests and their p-values.

  4. Use, as we did in class, PROC ARIMA and the IDENTIFY statement to print out the ACF, IACF, and PACF. Suppose I told you that I thought that SPREAD was just a nonzero mean plus white noise errors. Comment on how you would respond.
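As a reminder of what the ACF measures, here is a small sketch in plain Python of the sample autocorrelations and the rough plus or minus 2/sqrt(n) band that white noise would mostly stay inside. The function name and the series are hypothetical. The series is deliberately not white noise, so you can see what a violation looks like.

```python
import math

def sample_acf(y, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of the series y."""
    n = len(y)
    ybar = sum(y) / n
    dev = [v - ybar for v in y]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]

# Hypothetical alternating series, NOT the Amazon data.
y = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
r = sample_acf(y, 3)
bound = 2.0 / math.sqrt(len(y))  # approximate white noise band
for k, rk in enumerate(r, start=1):
    where = "outside" if abs(rk) > bound else "inside"
    print(f"lag {k}: r = {rk:.3f} ({where} the band of +/-{bound:.3f})")
```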

  5. Again using PROC ARIMA, and the p= and q= options for ARMA(p,q) models as we did in class, fit an AR, an MA, or an ARMA model (no trend), whichever you think is best. This is real data, so I do not know what the "right" answer is; in fact, there is no right answer (though there would clearly be some unreasonable answers). Then
    1. Write it out for me in the form (Y_t - ____) = ____(Y_{t-1} - ____) + ... + ____(Y_{t-p} - ____) + e_t - ____e_{t-1} - ... - ____e_{t-q}
    2. Give me some evidence that the errors from your model are white noise (i.e. you've squeezed all available information for forecasting from the data).
    3. Give me some evidence that you haven't gone too far in the sense of including terms you don't really need in your model.
    4. Estimate the average percentage gain for the imagined perfectly timed trade and give its standard error.
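For the white noise evidence on the residuals, one common summary is a Ljung-Box type chi-square statistic. Below is a minimal sketch of it in plain Python. I am assuming this is the kind of statistic behind the residual check that PROC ARIMA prints, so confirm that against the SAS documentation. The function name and the residual series are hypothetical.

```python
def ljung_box_q(resid, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum over k of r_k^2 / (n - k)."""
    n = len(resid)
    mean = sum(resid) / n
    dev = [e - mean for e in resid]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        rk = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += rk * rk / (n - k)
    return n * (n + 2) * total

# Hypothetical residuals, NOT from any fitted model of the Amazon data.
resid = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
q = ljung_box_q(resid, 1)
print(f"Q = {q:.2f}")
```

For residuals from an ARMA(p,q) fit, Q is usually compared with a chi-square distribution on max_lag - p - q degrees of freedom, and a large Q suggests that the residuals are not white noise.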

  6. Fit the same model in the time series forecasting system. Once you've done this, just tell me here that you did so and I will trust that you did. Under develop models you'll want to select fit and ARIMA model and give it your p and q.