New Residual Plots and Data Sets (Click on a plot to access plot files)

 University of Michigan Statistics Google New York Michigan State University Statistics Kansas State University Statistics Helen Barton Lecture Series ISS 2012 ST. JOHN'S University of Washington JSM 2010 Talk Challenge Data: n=7257, p=5003 UNC Rutgers R Yale Handsome Dan Virginia Commonwealth U Number 1 Texas A&M Logo. Iowa State Cyclone. Adapted from an image on Iowa State Athletic Site. Mizzou Tiger. Adapted from an image on Mizzou Athletic Site. Bob Dylan. Adapted from the rendering by FiSe and Max.

Click on the image for data sets and a SAS program.

Under Development (but still useful)

You've found your way to the web page advertised in The American Statistician article, "Residual (Sur)Realism" by Leonard A. Stefanski (2007, TAS, Vol. 61, pp. 163-177). Here you'll find data sets with hidden images and messages embedded in them that are revealed only when the data are subjected to an appropriate statistical analysis. If you haven't read the paper, then you might want to peruse the online version of the article. But be warned: the paper contains a lot of images and takes a while to load.

You can get a good idea of what's in the paper by looking at my presentation slides from the Southern Regional Council on Statistics 2007 Summer Research Conference, June 4-6, 2007. In addition, the slides describe a different algorithm for generating image-embedded data sets that is simpler than the original algorithm in the TAS paper.

Chances are that you're interested in either downloading data sets or downloading computer code for generating data sets. I now have a reasonably user-friendly data set construction program that allows you to embed your own images and messages in data sets, and to control several characteristics of the constructed data sets. Some contributed R code is also available.

From the interesting links department ...

What do regression, variable selection, Homer Simpson, and Stone Cold Steve Austin have in common? Well, see for yourself (scroll down a little more than halfway). Here's another link that I found amusing (scroll down to the bottom).

Embedded-Image Data Sets

Data Sets with Images in Residual Plots

Data Sets with Images from the TAS Article

When you click on an image (residual plot) you will be taken to a directory that contains several data sets all of which have the same residual plot image. The data sets differ in the number of predictor variables, the model coefficient of variation, whether the model contains quadratic terms, and the "shape" of the regression coefficients.
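To see why many different data sets can share one residual plot, here is a minimal Python sketch. It is not the algorithm from the TAS paper or the slides; it is just one simple construction under stated assumptions. The function name `embed_image`, the random predictors, and the range of the coefficients are all hypothetical choices made for illustration. The idea is that least-squares residuals are orthogonal to the intercept and to every predictor, so any target residual vector with that property can be paired with any number of predictors whose span contains the target fitted values.

```python
import numpy as np

def embed_image(fitted, resid, p, rng):
    """Return (y, X) whose OLS residual-vs-fitted plot shows the target points.

    fitted, resid: target plot coordinates (length n); p: number of predictors.
    Illustrative construction only, not the author's algorithm.
    """
    f = np.asarray(fitted, dtype=float)
    r = np.asarray(resid, dtype=float)
    n = f.size
    # Residuals must be orthogonal to the intercept and to the fitted values,
    # so remove those components from the target (this may shear the image).
    A = np.column_stack([np.ones(n), f])
    r = r - A @ np.linalg.lstsq(A, r, rcond=None)[0]
    # Start from random predictors and make each column orthogonal to r.
    X = rng.standard_normal((n, p))
    X -= np.outer(r, r @ X) / (r @ r)
    # Overwrite the first column so that f = mean(f) + X @ beta exactly.
    beta = rng.uniform(0.5, 1.5, size=p)
    X[:, 0] = (f - f.mean() - X[:, 1:] @ beta[1:]) / beta[0]
    return f + r, X

# Target "image": a circle traced in the residual-vs-fitted plane.
t = np.linspace(0.0, 2.0 * np.pi, 60, endpoint=False)
rng = np.random.default_rng(0)
y, X = embed_image(np.cos(t), np.sin(t), p=4, rng=rng)
```

Calling `embed_image` again with a different `p` or a different seed gives a different data set with the same residual plot, which is the kind of variation described above.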

In addition to the data sets, each directory contains two SAS programs. One program will read in all versions of the data sets, fit the correct regression model for each particular data set, and then construct the residual plot. The second SAS program does the same and also constructs all of the plots of Y vs Xj and Xi vs Xj. Because this site is still under construction, it may be that not all SAS programs or data sets are available for all images. If you find something that isn't available or working as advertised, please drop me a note at stefansk@stat.ncsu.edu. Also, if you have an interesting image or message that you'd like to see embedded in a data set, let me know. I can't promise that I'll create the data sets, but if it has general interest I'll try to accommodate the request.
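If you don't use SAS, the analysis step is easy to reproduce elsewhere. The following Python sketch is not a translation of the SAS programs on this site; it only shows the general recipe of fitting the model by least squares and computing the points of the residual plot. The function name `residual_plot_points` is hypothetical, and the `quadratic` flag assumes that "quadratic terms" means pure squares of each predictor, which may not match every data set here.

```python
import numpy as np

def residual_plot_points(y, X, quadratic=False):
    """Fit y on an intercept and X by least squares; return (fitted, residuals).

    quadratic=True also includes the square of each predictor
    (an assumption about what the quadratic model contains).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a single predictor as an n-by-1 matrix
    cols = [np.ones(len(y)), X]
    if quadratic:
        cols.append(X ** 2)
    D = np.column_stack(cols)  # design matrix
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    fitted = D @ coef
    return fitted, y - fitted

# Toy data with an exact quadratic trend (made up for this example).
x = np.arange(6.0)
y = 1.0 + 2.0 * x + x ** 2
fit_lin, res_lin = residual_plot_points(y, x)
fit_quad, res_quad = residual_plot_points(y, x, quadratic=True)
```

Plotting the residuals against the fitted values (with any plotting tool) gives the residual plot. In the toy example the linear fit leaves curvature in the residuals while the quadratic fit leaves essentially none, which is why fitting the correct model matters for revealing an embedded image.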

The data sets stored on this web site do not contain any so-called phony or uninformative variables. If you want to append uninformative (phony) variables to the set of informative predictor variables (e.g., for a variable selection exercise), then take a look at this SAS program in a text viewer, or open the SAS program in SAS. The program will read in an embedded-image data set (Y,X1,...,Xp) and then add a user-specified number of phony predictors to the data set, creating an output data set (Y,W1,...,Wp*) with the real predictor Xj appearing in a user-specified column Wj*.
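The same idea can be sketched in a few lines of Python. This is not the SAS program itself; the function name `add_phony_predictors`, the 0-based column positions, and the choice of standard normal noise for the phony variables are all assumptions made for illustration.

```python
import numpy as np

def add_phony_predictors(X, n_phony, real_cols, rng):
    """Return W with p + n_phony columns: X[:, j] is placed in column
    real_cols[j] (0-based) and every other column is random noise."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    total = p + n_phony
    if len(real_cols) != p or len(set(real_cols)) != p:
        raise ValueError("real_cols needs one distinct position per real predictor")
    if min(real_cols) < 0 or max(real_cols) >= total:
        raise ValueError("positions must lie between 0 and p + n_phony - 1")
    W = rng.standard_normal((n, total))  # phony predictors (assumed N(0, 1))
    W[:, list(real_cols)] = X            # drop the real predictors into place
    return W

# Three real predictors hidden among two phony ones (made-up numbers).
X = np.arange(12.0).reshape(4, 3)
rng = np.random.default_rng(1)
W = add_phony_predictors(X, n_phony=2, real_cols=[4, 0, 2], rng=rng)
```

Here X1 ends up in the fifth column of W, X2 in the first, and X3 in the third. The response Y is left unchanged, so fitting Y on only the real columns of W still reproduces the original residual plot.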

Correct Model. Is there ever a truly correct model in real statistical applications? If I came across this residual plot, I'd become a believer.

X Marks the Spot. Whatever you're looking for in life, it's here, despite what Indiana Jones says.

Go Wolfpack.

Go Duke.

Homer Simpson. Mathematics is no stranger to the Simpsons (see simpsonsmath.com), but this may be the first time that Homer has dabbled in statistics. It appears that he has embedded himself into an infinite do loop ... The image above is a spoof of a scene from a Simpsons episode in which Homer tackles a difficult mathematical derivation.

Buffalo. If you've ever seen one up close, then you know that bison are truly majestic animals. Yellowstone bison are subject to many unnatural threats, including ranching interests outside the park and vehicular traffic in the park. If you're interested you can help save the Yellowstone bison.

Bullseye.

NSF Acknowledgement. No less surprising than these residual plots is the fact that the National Science Foundation contributed to the funding of this research!

Ronald Aylmer Fisher (1890-1962). This is an embedded-image version of a well-known image of Fisher operating a calculating machine. You can find the original at several places on the web; see here, for example.

Normal.

Carl Friedrich Gauss.

NCSU VIGRE.

Gertrude Cox.

Bayes' Formula.

Thomas Bayes.

Three Trout.

Lone Trout.

Lies, Damn Lies and Statistics. Who really first uttered these words? No one knows, but you can read speculations about the quote's origins at Wikipedia.

Bikinis. I first read this quotation back in 1979 in a motorcycle magazine. I cut it out and kept it in my desk for years, and then lost it during an office move. I was pleasantly surprised when I came across it again on Quote Garden.

George Box Quote. What is a real doozy?

ASA Logo.

Data Sets with Images in Added-Variable Plots

Data Sets with Images from the TAS Article

Forthcoming (maybe) ...

Data Sets with Other Embedded Images

Forthcoming (maybe) ...

Computer Code

Currently I can offer a few options for creating your data sets.

For all ...

For R users ...

Some folks have contributed R code that can be used to construct data sets. I haven't used these programs so I don't know their full capabilities. However, the authors have given their consent to making the code available on this web page. So you're free to download the code and use or modify as you see fit.

R code contributed by John Staudenmayer.

R code contributed by Peter Wolf.

R code for reading wrapped data files contributed by Ulrike Gromping.

For GAUSS programmers ...

If you are familiar with GAUSS and want to run or modify the source code for the program described in the previous paragraph, you can download the GAUSS code here. If you make any substantial improvements, let me know and I'll post them on this web page if you'd like. It's possible (in fact likely) that I've overlooked a few of the user-defined GAUSS procedures called by the program. So if you try running it and get an error message about GAUSS not finding this or that file, let me know and I'll upload the missing file within a day or two.

 The material on this web page is based upon work supported by the National Science Foundation under Grant No. 0504283. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.

 Legal Notice: The Simpsons, TM and copyright, Fox and its related companies. All rights reserved. Any reproduction, duplication, or distribution in any form is expressly prohibited. Disclaimer: This web site, its operators, and any content contained on this site relating to The Simpsons are not authorized by Fox.