Google data


Google has a site where you can find out how many hits, week by week, any search string you specify receives. I searched for "loans". The data are normalized to lie between 0 and 100 and are relative to that week's total hits, so the magnitudes themselves have no meaning. The SAS program below inputs the data and reshapes them into standard time series format. If you were working for a bank, being able to assess what time of year people were particularly interested in loans would help you with planning and possibly publicity.
  1. Plot the data, noting any unusual features in your report.
  2. Extend the data out another 52 weeks, setting Y to missing and incrementing the dates appropriately. Use this data from now on. Create some sine and cosine variables. For example:
    theta = 2*3.14159/365.25;
    S1=sin(theta*date); C1=cos(theta*date); 
    S2=sin(2*theta*date); C2=cos(2*theta*date); 
    
    will create a sine-cosine pair that goes through one cycle per year and another that goes through 2 cycles per year. Plot the S1 and C1 variables by date, overlaid on the same plot.
  3. Fit a model with a trend and as many of these sine-cosine pairs as you think necessary to get a reasonable fit. Use your judgement. Graph the data and fitted model on the same plot and make any comments you think are appropriate. For the future forecasts, include 95% upper and lower confidence limits.
  4. By now you should have noticed a few very unusual points. One way to deal with these is to create a dummy variable for each of them. For example, X1=(date="01aug2004"d); will be 1 on that date and 0 everywhere else. Putting this into your model effectively dedicates a parameter to that point, completely removing its influence from the rest of the model. Add these to your model and redo (3) with the dummy variables included. Use your judgement on how many of these to include.
  5. Another way to handle this would be to set those suspicious values to missing and use PROC EXPAND to replace those missing values. Try this approach and report your findings.
  6. Take any of the models you like and output the residuals. Now go to the FETS forecasting system (the GUI we are studying) and compute the autocorrelations of the residuals. Report the autocorrelations at lags 1, 2, and 3.
    
    /*--------------------------------------\
    |  Data from Google Insights. Y=number  |
    | of Google searches involving the word |
    | "loan". Counts are normalized to a    |
    \ 0 to 100 scale.                      */
    Data A; 
    infile cards expandtabs; /* data lines below are tab-separated */
    Input Y1 Y2 Y3 Y4; 
    cards; 
    62	58	59	57
    63	59	61	59
    61	57	55	61
    63	59	55	54
    63	57	58	54
    59	53	54	53
    56	50	54	54
    60	53	54	54
    60	55	57	55
    61	55	55	56
    61	55	54	58
    64	58	57	55
    66	59	57	58
    65	58	57	61
    64	60	60	55
    62	60	59	59
    63	58	57	58
    65	59	57	59
    64	60	60	58
    66	64	63	59
    66	60	63	59
    71	62	68	63
    72	65	72	63
    73	71	75	65
    73	75	82	67
    72	96	100	68
    81	68	68	68
    79	72	71	71
    82	71	73	71
    80	73	72	68
    82	73	71	72
    82	73	70	68
    76	70	70	71
    68	68	66	68
    61	63	60	63
    63	56	56	58
    59	57	56	58
    58	54	54	57
    58	55	53	56
    55	56	52	57
    56	53	54	52
    55	56	50	54
    56	53	51	51
    54	54	51	52
    55	53	51	50
    48	56	51	54
    58	47	43	47
    55	55	49	54
    55	53	46	49
    54	54	46	50
    53	52	45	44
    63	55	52	55
     .	64	 .	 .
     ;
    proc print; 
    Data B1; set A; Y=Y4; 
    Data B2; set A; Y=Y3; 
    Data B3; set A; Y=Y2; 
    Data B4; set A; Y=Y1;
    data Loans; set B1 B2 B3 B4;  
    Label Y="Hits (loan)"; if Y>.;
    Data Loans; set Loans; 
    date = intnx('week','07JAN2004'd,_n_-1); 
    format date date7.;  keep Y date; 
    proc print data=Loans; run; 
    proc gplot; plot Y*date; 
    symbol1 v=none i=join; 
    run;
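
The steps above could be started along the following lines. This is only a sketch, assuming the Loans data set created by the program above. The names Extended, Fit, Clean, Filled, t, Pred, L95, U95, Resid and Yfilled are illustrative choices, not part of the assignment. The dummy date is just the example date from step 4, not a claim about which points are unusual, and the choice of two sine-cosine pairs is arbitrary.

```sas
/* Step 2 (sketch): append 52 future weeks with Y set to missing. */
data Extended;
   set Loans end=last;
   output;
   if last then do i = 1 to 52;
      date = intnx('week', date, 1);   /* move to the next weekly date */
      Y = .;
      output;
   end;
   drop i;
run;

/* Add a linear trend, two sine-cosine pairs, and one example dummy. */
data Extended;
   set Extended;
   t = _n_;                            /* week counter used as the trend */
   theta = 2*constant('pi')/365.25;    /* radians per day */
   S1 = sin(theta*date);   C1 = cos(theta*date);
   S2 = sin(2*theta*date); C2 = cos(2*theta*date);
   X1 = (date = '01aug2004'd);         /* placeholder date from step 4 */
run;

/* Steps 3 and 4 (sketch): rows with missing Y are not used in the fit */
/* but still receive predictions and 95% prediction limits.            */
proc reg data=Extended;
   model Y = t S1 C1 S2 C2 X1;
   output out=Fit p=Pred lcl=L95 ucl=U95 r=Resid;
run; quit;

/* Overlay the data, the fitted values, and the limits. */
proc gplot data=Fit;
   plot (Y Pred L95 U95)*date / overlay;
   symbol1 v=dot  i=none;
   symbol2 v=none i=join;
   symbol3 v=none i=join l=2;
   symbol4 v=none i=join l=2;
run; quit;

/* Step 5 (sketch): blank out a suspicious value and let PROC EXPAND */
/* interpolate it. With no TO= option the frequency is unchanged and */
/* missing values inside the series are filled in.                   */
data Clean;
   set Loans;
   if date = '01aug2004'd then Y = .;  /* placeholder date again */
run;

proc expand data=Clean out=Filled from=week;
   id date;
   convert Y = Yfilled;
run;
```

The Fit data set also carries the residuals (Resid) needed for step 6. The limits are computed for every week, so if only the forecast limits should appear on the plot, L95 and U95 can be set to missing wherever Y is nonmissing before plotting.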