University of Michigan Statistics | ||
Google New York | Michigan State University Statistics | Kansas State University Statistics |
Helen Barton Lecture Series | ISS 2012 ST. JOHN'S | University of Washington |
JSM 2010 Talk Challenge Data: n=7257, p=5003 | UNC | Rutgers R |
Yale Handsome Dan | Virginia Commonwealth U Number 1 | Texas A&M Logo. |
Iowa State Cyclone. Adapted from an image on Iowa State Athletic Site. | Mizzou Tiger. Adapted from an image on Mizzou Athletic Site. | Bob Dylan. Adapted from the rendering by FiSe and Max. |
Click on the image for data sets and a SAS program.
You've found your way to the web page advertised in The American Statistician article "Residual (Sur)Realism" by Leonard A. Stefanski (2007, TAS, Vol. 61, pp. 163-177). Here you'll find data sets with hidden images and messages embedded in them that are revealed only when the data are subjected to an appropriate statistical analysis. If you haven't read the paper, you might want to peruse the online version of the article. Be warned, though: the paper contains a lot of images and takes a while to load.
You can get a good idea of what's in the paper by looking at my presentation slides from the Southern Regional Council on Statistics 2007 Summer Research Conference, June 4-6, 2007. In addition, the slides describe a different algorithm for generating image-embedded data sets that is simpler than the original algorithm in the TAS paper.
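To get a feel for how a picture can hide in a regression data set, here is a toy Python sketch. It is not the algorithm from the TAS paper, nor the simpler one in the slides; it is a plain projection-based construction chosen only for illustration. The function name `embed_points`, the number of predictors `p`, and the randomly drawn coefficients are all assumptions. The idea: make the target residuals orthogonal to the intercept and to every predictor, and make the target fitted values an exact linear combination of the predictors.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def embed_points(points, p=3, seed=0):
    """Toy construction (an assumption, not the published algorithm).

    points: list of (fitted, residual) coordinates tracing the picture.
    Returns (y, X, beta, r): response, n-by-p predictor rows, the
    coefficients used, and the residual vector actually embedded.
    """
    rng = random.Random(seed)
    n = len(points)
    f = center([pt[0] for pt in points])   # target fitted values, centered
    r = center([pt[1] for pt in points])   # target residuals, centered
    # OLS residuals are orthogonal to fitted values, so remove the f-component.
    # This shears the picture slightly if it was not orthogonal already.
    k = dot(r, f) / dot(f, f)
    r = [ri - k * fi for ri, fi in zip(r, f)]

    # Random predictor columns, made orthogonal to the constant and to r.
    cols = []
    for _ in range(p):
        z = center([rng.gauss(0.0, 1.0) for _ in range(n)])
        k = dot(z, r) / dot(r, r)
        cols.append([zi - k * ri for zi, ri in zip(z, r)])

    # Rank-one correction so that sum_j beta[j] * cols[j] equals f exactly.
    beta = [rng.uniform(0.5, 2.0) for _ in range(p)]
    bb = dot(beta, beta)
    gap = [f[i] - sum(beta[j] * cols[j][i] for j in range(p)) for i in range(n)]
    for j in range(p):
        cols[j] = [c + g * beta[j] / bb for c, g in zip(cols[j], gap)]

    y = [fi + ri for fi, ri in zip(f, r)]
    X = [[cols[j][i] for j in range(p)] for i in range(n)]
    return y, X, beta, r
```

Regressing `y` on the columns of `X` with an intercept should then give back `r` as the residuals, so a plot of residuals against fitted values reproduces the input points (up to centering and the slight shear noted in the comments), while plots of `y` against any single predictor look like noise.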
Chances are that you're interested in either downloading data sets or downloading computer code for generating data sets. I now have a reasonably user-friendly data set construction program that lets you embed your own images and messages in data sets and control several characteristics of the constructed data sets. Some contributed R code is also available.
From the interesting links department ...
What do regression, variable selection, Homer Simpson, and Stone Cold Steve Austin have in common? Well, see for yourself (scroll down a little more than halfway).
Here's another link that I found amusing (scroll down to the bottom).
When you click on an image (residual plot) you will be taken to a directory that contains several data sets all of which have the same residual plot image. The data sets differ in the number of predictor variables, the model coefficient of variation, whether the model contains quadratic terms, and the "shape" of the regression coefficients.
You should learn more about the format of the data sets before looking into the directories.
In addition to the data sets, each directory contains two SAS programs. The first reads in all versions of the data sets, fits the correct regression model for each one, and constructs the residual plot. The second does the same and also constructs all of the plots of Y vs Xj and Xi vs Xj. Because this site is still under construction, not all SAS programs or data sets may be available for all images. If you find something that isn't available or working as advertised, please drop me a note at stefansk@stat.ncsu.edu. Also, if you have an interesting image or message that you'd like to see embedded in a data set, let me know. I can't promise that I'll create the data sets, but if it has general interest I'll try to accommodate the request.
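For readers without SAS, the basic analysis (fit the regression, then look at the residuals) can be sketched in Python. This is an illustrative sketch, not a translation of the SAS programs. The helper names `solve` and `ols_residuals` are hypothetical, the residual plot is assumed to be the conventional residuals-against-fitted-values plot, and only a linear model with an intercept is handled; for data sets whose model contains quadratic terms, those terms would have to be added as extra columns first.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]    # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[piv] = M[piv], M[c]
        for i in range(c + 1, n):
            m = M[i][c] / M[c][c]
            for k in range(c, n + 1):
                M[i][k] -= m * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):                        # back substitution
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ols_residuals(y, X):
    """Regress y on the columns of X plus an intercept; return (fitted, residuals)."""
    D = [[1.0] + list(row) for row in X]                # design matrix with intercept
    q = len(D[0])
    XtX = [[sum(d[a] * d[b] for d in D) for b in range(q)] for a in range(q)]
    Xty = [sum(d[a] * yi for d, yi in zip(D, y)) for a in range(q)]
    coef = solve(XtX, Xty)
    fitted = [sum(c * v for c, v in zip(coef, d)) for d in D]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, resid
```

Scatter-plotting `resid` against `fitted` with any plotting tool should reveal the image. Solving the normal equations directly keeps the sketch dependency-free, but it is numerically fragile and slow when the number of predictors is large; a QR-based least-squares routine from a numerical library would be the safer choice there.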
The data sets stored on this web site do not contain any so-called phony (uninformative) variables. If you want to append phony variables to the set of informative predictor variables (e.g., for a variable selection exercise), take a look at this SAS program in a text viewer, or open it in SAS. The program reads in an embedded-image data set (Y,X1,...,Xp) and adds a user-specified number of phony predictors, creating an output data set (Y,W1,...,Wp*) in which each real predictor Xj appears in a user-specified column Wj*.
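The same idea can be sketched in Python. This is not a port of the SAS program: the function name `add_phony`, the 0-based `positions` argument, and the choice of standard normal noise for the phony columns are all assumptions made for illustration.

```python
import random


def add_phony(X, n_phony, positions, seed=0):
    """Interleave real predictors with phony (pure noise) columns.

    X:         list of rows, each holding the p real predictors X1..Xp.
    n_phony:   number of phony columns to add.
    positions: positions[j] is the 0-based output column for real predictor j.
    Phony columns are drawn from N(0, 1), which is an assumption.
    """
    p = len(X[0])
    total = p + n_phony
    if len(positions) != p or len(set(positions)) != p:
        raise ValueError("need one distinct target column per real predictor")
    if any(not 0 <= c < total for c in positions):
        raise ValueError("target column out of range")
    rng = random.Random(seed)
    where = {c: j for j, c in enumerate(positions)}   # output column -> real index
    W = []
    for row in X:
        W.append([row[where[c]] if c in where else rng.gauss(0.0, 1.0)
                  for c in range(total)])
    return W
```

For example, `add_phony(X, 3, [4, 0])` applied to a two-predictor data set returns five columns, with X1 in the last column and X2 in the first. A variable selection method run on the result should ideally recover exactly the columns listed in `positions`.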
Correct Model. Is there ever a truly correct model in real statistical applications? If I came across this residual plot, I'd become a believer. | X Marks the Spot. Whatever you're looking for in life it's here, despite what Indiana Jones says. | Go Wolfpack. |
Go Duke. |
Homer Simpson. Mathematics is no stranger to the Simpsons (see simpsonsmath.com), but this may be the first time that Homer has dabbled in statistics. It appears that he has embedded himself into an infinite do loop ...
The image above is a spoof of a scene from a Simpsons episode in which Homer tackles a difficult mathematical derivation. |
Buffalo. If you've ever seen one up close, then you know that bison are truly majestic animals. Yellowstone bison are subject to many unnatural threats, including ranching interests outside the park and vehicular traffic inside it. If you're interested, you can help save the Yellowstone bison. |
Bullseye. | NSF Acknowledgement. No less surprising than these residual plots is the fact that the National Science Foundation contributed to the funding of this research! | Ronald Aylmer Fisher (1890-1962). This is an embedded-image version of a well-known image of Fisher operating a calculating machine. You can find the original at several places on the web, see here for example. |
Normal. | Carl Friedrich Gauss. | NCSU VIGRE. |
Gertrude Cox. | Bayes' Formula. | Thomas Bayes. |
Three Trout. | Lone Trout. | Lies, Damn Lies and Statistics. Who really first uttered these words? No one knows, but you can read speculations about the quote's origins at Wikipedia. |
Bikinis. I first read this quotation back in 1979 in a motorcycle magazine. I cut it out and kept it in my desk for years, and then lost it during an office move. I was pleasantly surprised when I came across it again on Quote Garden. | George Box Quote. What is a real doozy? | ASA Logo. |
Currently I can offer a few options for creating your data sets.
For all ...
The easiest and most complete package is a compiled version of a GAUSS program that I wrote (easy, that is, if you are comfortable editing .txt files and running Windows batch files by clicking on them). You don't need to know anything about GAUSS to use the compiled program. However, you do have to download the file All_Files_ro_May262007.zip and unzip its contents to a common directory. Then follow the directions in the READ_ME file. This program allows you to embed your own messages and images (several image file types are supported) in data sets having certain controllable characteristics. All of the inputs to the program are set by editing two .txt files; you then point and click on the appropriate batch file. The data sets are written as .txt files to a program-created subdirectory. The program also writes to that subdirectory a SAS program that will read the data files, fit regression models, and display the residual plots. R users can read the data directly from the .txt files. However, the .txt data files created by my program are "wrapped" if the number of independent variables is large, and I've been told that reading wrapped files in R is problematic. Don't despair: an R user has contributed code (see the R code section below for a description) that you can download and use to read the wrapped text files.
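If you are reading the files outside of SAS or R, a wrapped file can usually be handled by ignoring line breaks altogether. The Python sketch below assumes (and this is only an assumption about the file layout, so check it against the READ_ME) that a file is a header-free stream of whitespace-delimited numbers in which each observation contributes exactly p+1 values (Y, X1, ..., Xp), possibly spread over several physical lines. The function name `read_wrapped` is hypothetical.

```python
def read_wrapped(text, n_cols):
    """Split a stream of whitespace-delimited numbers into rows of n_cols values.

    Line breaks are treated like any other whitespace, so an observation
    that wraps across several physical lines is reassembled into one row.
    """
    values = [float(tok) for tok in text.split()]
    if len(values) % n_cols != 0:
        raise ValueError("number of values is not a multiple of n_cols")
    return [values[i:i + n_cols] for i in range(0, len(values), n_cols)]
```

Typical use would be `rows = read_wrapped(open(path).read(), p + 1)`, where `path` and `p` are whatever applies to your data set; the first entry of each row is then Y and the rest are the predictors.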
For R users ...
Some folks have contributed R code that can be used to construct data sets. I haven't used these programs so I don't know their full capabilities. However, the authors have given their consent to making the code available on this web page. So you're free to download the code and use or modify as you see fit.
R code contributed by John Staudenmayer.
R code contributed by Peter Wolf.
R code for reading wrapped data files contributed by Ulrike Gromping.
For GAUSS programmers ...
If you are familiar with GAUSS and want to run or modify the source code for the compiled program described above, you can download the GAUSS code here. If you make any substantial improvements, let me know and I'll post them on this web page if you'd like. It's possible (in fact likely) that I've overlooked a few of the user-defined GAUSS procedures called by the program. So if you try running it and get an error message about GAUSS not finding this or that file, let me know and I'll upload the missing file within a day or two.
The material on this web page is based upon work supported by the National Science Foundation under Grant No. 0504283. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation. |
Legal Notice: The Simpsons, TM and copyright, Fox and its related companies. All rights reserved. Any reproduction, duplication, or distribution in any form is expressly prohibited. | ||
Disclaimer: This web site, its operators, and any content contained on this site relating to The Simpsons are not authorized by Fox. |