As you know, energy is currently a big economic issue in the US and elsewhere.
Natural gas is one form of energy that is tracked by the US Department of Energy
and available from their web site. Natural gas is stored underground and the amount
stored is recorded weekly.
I have compiled this SAS program to read in the data and plot it.
- Run the program.
- By experimentation or otherwise, explain why I used the overlay
option and the w=3 option, that is, what do these do?
- Also look in the log. Record the
interesting information given there for the i=rl interpolation method.
- Optionally, investigate
some other smoothing numbers.
- Export the graph and include it in your report.
- Compute a variable that has the week of the year (1 through 52 or 53*) for each year.
List the first 80 dates, gas storage amounts, and week numbers. This is the second
part of your report. Start numbering the weeks on the first observation
from each year.
*In this way, there will be 2 years that have 53 weeks. Which years are those?
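One possible way to build the week counter is sketched below. It is only a sketch: the data set name GAS and the output names GAS_WEEKS, YR and PREV_YR are assumptions, as is the premise that the data are sorted by DATE with one observation per week.

```sas
/* Sketch only: GAS (data set), DATE and GAS (variables) are assumed names. */
data gas_weeks;
   set gas;
   yr = year(date);
   prev_yr = lag(yr);               /* LAG runs on every row, so its queue stays aligned */
   if prev_yr ne yr then week = 0;  /* first observation of the data or of a new year    */
   week + 1;                        /* sum statement: WEEK is retained across rows       */
   drop prev_yr;
run;

/* First 80 dates, storage amounts, and week numbers */
proc print data=gas_weeks(obs=80);
   var date gas week;
   format date date9.;
run;
```

Because the counter restarts on the first observation of each calendar year, a year gets a week 53 only when it contains 53 weekly observations.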
- Add an extra 52 dates to the data set with missing values for Gas. Include
the week variable here as well. Now run a regression with a linear trend and seasonal dummy variables for weeks as we did in the demo program Air_code.sas.
- What are the intercept and slope for your linear trend model?
- Is the trend significant?
- How much would you shift that linear trend if you were in the 3rd week of a year?
- List the first 10 forecasts into the future (past the end of the given data) with upper and lower prediction intervals.
- Does the trend plus dummy variable model above have the same shift for week 3 of every year? In light of the graphs you've seen, do you think that is reasonable? Explain.
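The trend plus weekly dummy regression with 52 appended future weeks could be set up roughly as follows. This is a hedged sketch, not the demo program's code: GAS_WEEKS is a hypothetical data set assumed to already hold DATE, GAS and WEEK; the trend index T and the use of PROC GLM are also assumptions, and Air_code.sas may do it differently.

```sas
/* Sketch only: GAS_WEEKS, FUTURE, GAS_ALL, FCST and T are hypothetical names. */
data future;
   set gas_weeks end=last;
   if last;                          /* start from the final observed week        */
   gas = .;                          /* future storage amounts are unknown        */
   do i = 1 to 52;
      date = date + 7;               /* assumes exactly weekly spacing            */
      if year(date) ne year(date - 7) then week = 1;
      else week = week + 1;
      output;
   end;
   keep date gas week;
run;

data gas_all;
   set gas_weeks(keep=date gas week) future;
   t = _n_;                          /* linear trend index 1, 2, 3, ...           */
run;

proc glm data=gas_all;
   class week;                       /* seasonal dummy variables                  */
   model gas = t week / solution;    /* SOLUTION prints the parameter estimates   */
   output out=fcst predicted=pred lcl=lower ucl=upper;  /* individual prediction limits */
run;
quit;
```

Rows with a missing GAS are not used in the fit but should still receive predicted values and limits in FCST, so the forecasts are the rows past the end of the observed data. Note that the intercept and slope depend on how the trend variable is defined (here T counts observations rather than days).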
- We have also talked about a first and seasonal span difference. Suppose you think that Y(t) - Y(t-1) - (Y(t-52) - Y(t-53)) = e(t), that is, the first and span 52 difference
combination gives you a white noise sequence. In that model our best prediction of Y(t) would be Y(t-1) + Y(t-52) - Y(t-53), that is, last week's number plus
the corresponding week-to-week change we saw last year.
- Compute this prediction for 07 Feb 1997 and compare it to what we actually got that week.
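The prediction Y(t-1) + Y(t-52) - Y(t-53) can be computed with lag functions, for example as in the sketch below. The names GAS (data set and variable), DATE, PRED and DEV are assumptions, and the code presumes one observation per week with no gaps, so that 52 rows back really is the same week last year.

```sas
/* Sketch only: data set GAS with variables DATE and GAS is assumed. */
data gas_pred;
   set gas;
   pred = lag1(gas) + lag52(gas) - lag53(gas);  /* Y(t-1) + Y(t-52) - Y(t-53)          */
   dev  = gas - pred;                           /* Y(t) - Y(t-1) - (Y(t-52) - Y(t-53)) */
run;

proc print data=gas_pred;
   where date = '07FEB1997'd;
   var date gas pred dev;
   format date date9.;
run;
```

PRED and DEV are missing for the first 53 observations, since the lags are not yet available there.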
- Go into the TSFS time series forecasting system. As you know, the data are now in your work directory. Get the graph for the sequence of Y(t) - Y(t-1) - (Y(t-52) - Y(t-53))
and export it into your report.
- Using your mouse, check the value for 07 Feb 1997 and compare against the deviation from prediction that you got above.
- Saving the graph is not trivial, so here is a clue: saving the graph
- We'll now try trigonometric functions which are among the few well known functions that are periodic.
- A useful alternative to dummy variables, especially with a long period like 52, is a set of trigonometric functions. Angles are usually measured in radians (one radian is the angle at the center of a circle subtended by an arc whose length equals the circle's radius). A period 52 cycle has "frequency" (2 pi/52) so, as t goes from 1 to 52, the angle (2 pi t/52) goes through 2 pi radians, the number of radians in a full circle. Because
A sin(wt + d) = B1 cos(wt) + B2 sin(wt)
the inclusion of both a sine and a cosine at each frequency w of interest allows the amplitude A and "phase shift" d to adjust to the data. Both should always be included. For those who are more mathematical, you might recall that B1 = A sin(d) and B2 = A cos(d), and the formula is the one for the sine of the sum of two angles.
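For reference, the identity can be written out step by step using the angle-sum formula for the sine. The last line, which recovers A and d from the fitted coefficients, is an added consequence and assumes B2 is not zero:

```latex
\begin{aligned}
A\sin(wt + d) &= A\sin(wt)\cos(d) + A\cos(wt)\sin(d) \\
              &= \underbrace{A\sin(d)}_{B_1}\,\cos(wt) + \underbrace{A\cos(d)}_{B_2}\,\sin(wt), \\
A &= \sqrt{B_1^2 + B_2^2}, \qquad \tan(d) = B_1 / B_2 .
\end{aligned}
```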
- After glancing through the background information above, add variables S1, C1, S2, and C2 to your data as follows:
PI=CONSTANT('PI');
S1=SIN(2*PI*DATE/365.25); C1=COS(2*PI*DATE/365.25);
S2=SIN(4*PI*DATE/365.25); C2=COS(4*PI*DATE/365.25);
where the second trigonometric pair goes through 2 cycles per year, affects the shape of the yearly cycle, and is called a "harmonic" of the first "fundamental frequency". All of these variables have the same value at the same time of year each year. With enough of these you can model any periodic function.
- Now replace the class variable WEEK by this set of 4 regressors, thus reducing 51 degrees of freedom to 4. Plot the residuals from this model and compare the error sum of squares and error mean square of this model to those of the model with the dummy variables. You’ll see the errors do not look independent over time (why?) and we’ll be able to do better with tools we have yet to discuss.
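A minimal sketch of the trigonometric version of the regression and its residual plot follows. The data set name GAS, the trend index T, the output names GAS_TRIG, TRIG_OUT and RESID, and the choice of PROC GLM and PROC GPLOT are all assumptions made for illustration.

```sas
/* Sketch only: data set GAS with variables DATE and GAS is assumed. */
data gas_trig;
   set gas;
   t  = _n_;                                   /* linear trend index        */
   pi = constant('PI');
   s1 = sin(2*pi*date/365.25);  c1 = cos(2*pi*date/365.25);  /* fundamental */
   s2 = sin(4*pi*date/365.25);  c2 = cos(4*pi*date/365.25);  /* harmonic    */
run;

proc glm data=gas_trig;
   model gas = t s1 c1 s2 c2 / solution;       /* 4 seasonal regressors instead of 51 dummies */
   output out=trig_out residual=resid;
run;
quit;

/* Residuals over time, with a reference line at zero */
symbol1 i=join v=none;
proc gplot data=trig_out;
   plot resid*date / vref=0;
   format date date9.;
run;
quit;
```

The error sum of squares and error mean square appear in the analysis of variance table that PROC GLM prints, which can be set beside the same table from the dummy variable model.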