| University of Washington | | |
| JSM 2010 Talk Challenge Data: data set contains n=7257 observations (y, x1,...,x5003) | UNC | Rutgers R |
| Yale Handsome Dan | Virginia Commonwealth U Number 1 | Texas A&M Logo. |
| Iowa State Cyclone. Adapted from an image on Iowa State Athletic Site. | Mizzou Tiger. Adapted from an image on Mizzou Athletic Site. | Bob Dylan. Adapted from the rendering by FiSe and Max. |
Click on the image for data sets and a SAS program.
You've found your way to the web page advertised in The American Statistician article, "Residual (Sur)Realism" by Leonard A. Stefanski (2007, TAS, Vol. 61, pp. 163-177). Here you'll find data sets with hidden images and messages embedded in them that are revealed only when the data are subjected to an appropriate statistical analysis. If you haven't read the paper, then you might want to peruse the online version of the article. But be warned: the paper contains a lot of images and takes a while to load.
You can get a good idea of what's in the paper by looking at my presentation slides from the Southern Regional Council on Statistics 2007 Summer Research Conference, June 4-6, 2007. In addition, the slides describe a different algorithm for generating image-embedded data sets that is simpler than the original algorithm in the TAS paper.
Chances are that you're interested in either downloading data sets or downloading computer code for generating data sets. I now have a reasonably user-friendly data set construction program that allows you to embed your own images and messages in data sets and to control several characteristics of the constructed data sets. Some contributed R code is also available.
From the interesting links department ...
What do regression, variable selection, Homer Simpson, and Stone Cold Steve Austin have in common? Well, see for yourself (scroll down a little more than halfway).
Here's another link that I found amusing (scroll down to the bottom).
When you click on an image (residual plot) you will be taken to a directory that contains several data sets all of which have the same residual plot image. The data sets differ in the number of predictor variables, the model coefficient of variation, whether the model contains quadratic terms, and the "shape" of the regression coefficients.
You should learn more about the format of the data sets before looking into the directories.
In addition to the data sets, each directory contains two SAS programs. Both programs read in all versions of the data sets and fit the correct regression model for each particular data set. The first program then constructs the residual plot; the second constructs the residual plot and also all of the plots of Y vs Xj and Xi vs Xj.
Because this site is still under construction, it may be that not all SAS programs or data sets are available for all images. If you find something that isn't available or working as advertised, please drop me a note at stefansk@stat.ncsu.edu. Also, if you have an interesting image or message that you'd like to see embedded in a data set, let me know. I can't promise that I'll create the data sets, but if it has general interest I'll try to accommodate the request.
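For readers without SAS, the fit-then-plot step described above can be sketched in Python. This is only an illustration, not the SAS programs from the directories: it assumes the data are already in memory as a numeric array with Y in the first column followed by X1,...,Xp, it fits a plain linear model with an intercept (no quadratic terms), and it uses synthetic stand-in data rather than one of the embedded-image data sets. The function name `residuals_and_fitted` is made up for this sketch.

```python
import numpy as np

def residuals_and_fitted(data):
    """Fit Y on X1..Xp by ordinary least squares with an intercept.

    Assumes column 0 holds Y and the remaining columns hold the predictors.
    Returns (residuals, fitted values).
    """
    data = np.asarray(data, dtype=float)
    y = data[:, 0]
    # Design matrix: a column of ones for the intercept, then the predictors.
    X = np.column_stack([np.ones(len(y)), data[:, 1:]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return y - fitted, fitted

# Synthetic stand-in data (NOT one of the data sets from this site).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = 1.0 + X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)
demo = np.column_stack([y_demo, X_demo])

resid, fitted = residuals_and_fitted(demo)
```

With one of the embedded-image data sets loaded in place of `demo`, and the correct model for that data set fitted, a scatter plot of `resid` against `fitted` (for example with a plotting library such as matplotlib) is where the hidden image would be expected to show up.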
The data sets stored on this web site do not contain any so-called phony (uninformative) variables. If you want to append phony variables to the set of informative predictor variables (e.g., for a variable selection exercise), then take a look at this SAS program in a text viewer, or open it in SAS. The program will read in an embedded-image data set (Y,X1,...,Xp) and then add a user-specified number of phony predictors to the data set, creating an output data set (Y,W1,...,Wp*) with the real predictor Xj appearing in a user-specified column Wj*.
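The column bookkeeping described above can be sketched in a few lines of Python. This is a hedged illustration rather than a port of the SAS program: the function name `add_phony_predictors`, its arguments, and the choice of independent standard normal draws for the phony variables are all assumptions made for this example.

```python
import random

def add_phony_predictors(rows, positions, n_total, seed=0):
    """Embed real predictors among phony ones.

    rows      : list of [y, x1, ..., xp]
    positions : 1-based column in (W1..Wp*) where each real Xj should land
    n_total   : p*, the total number of predictor columns in the output
    The phony columns are N(0, 1) draws (an assumption of this sketch).
    """
    p = len(rows[0]) - 1
    if len(positions) != p or len(set(positions)) != p:
        raise ValueError("need one distinct position per real predictor")
    if not all(1 <= k <= n_total for k in positions):
        raise ValueError("positions must lie between 1 and n_total")
    rng = random.Random(seed)
    out = []
    for row in rows:
        # Start with all-phony predictors, then overwrite the chosen columns.
        w = [rng.gauss(0.0, 1.0) for _ in range(n_total)]
        for j, k in enumerate(positions):
            w[k - 1] = row[j + 1]
        out.append([row[0]] + w)
    return out

# Tiny example: two real predictors placed in columns W4 and W2 of five.
rows = [[1.0, 10.0, 20.0], [2.0, 11.0, 21.0]]
augmented = add_phony_predictors(rows, positions=[4, 2], n_total=5)
```

Each output row is (Y, W1, ..., W5), with X1 in W4 and X2 in W2; the remaining three columns carry no information about Y, which is what makes the result usable for a variable selection exercise.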
| Correct Model. Is there ever a truly correct model in real statistical applications? If I came across this residual plot, I'd become a believer. | X Marks the Spot. Whatever you're looking for in life it's here, despite what Indiana Jones says. | Go Wolfpack. |
| Go Duke. | Homer Simpson. Mathematics is no stranger to the Simpsons, see simpsonsmath.com, but this may be the first time that Homer has dabbled in statistics. It appears that he has embedded himself into an infinite do loop ... The image above is a spoof of a scene from a Simpsons episode in which Homer tackles a difficult mathematical derivation. | Buffalo. If you've ever seen one up close, then you know that bison are truly majestic animals. Yellowstone bison are subject to many unnatural threats, including ranching interests outside the park and vehicular traffic in the park. If you're interested you can help save the Yellowstone bison. |
| Bullseye. | NSF Acknowledgement. No less surprising than these residual plots is the fact that the National Science Foundation contributed to the funding of this research! | Ronald Aylmer Fisher (1890-1962). This is an embedded-image version of a well-known image of Fisher operating a calculating machine. You can find the original at several places on the web, see here for example. |
| Normal. | Carl Friedrich Gauss. | NCSU VIGRE. |
| Gertrude Cox. | Bayes' Formula. | Thomas Bayes. |
| Three Trout. | Lone Trout. | Lies, Damn Lies and Statistics. Who really first uttered these words? No one knows, but you can read speculations about the quote's origins at Wikipedia. |
| Bikinis. I first read this quotation back in 1979 in a motorcycle magazine. I cut it out and kept it in my desk for years, and then lost it during an office move. I was pleasantly surprised when I came across it again on Quote Garden. | George Box Quote. What is a real doozy? | ASA Logo. |