Two important hints as you get started:
At the UNIX prompt (we will assume that it is just a %) type in
% S
It takes a few seconds for the S program to be loaded. Once it is set up, S will print something like:
S-PLUS : Copyright (c) 1988, 1999 MathSoft, Inc.
S : Copyright Lucent Technologies, Inc.
Version 5.1 Release 1 for Sun SPARC, SunOS 5.5 : 1999
Working data will be in .Data
>
To quit S, type q().
Once S is started, it will give you a prompt that is a greater than sign (>). You will need to set up the S session so that the special class data sets and functions are available. This involves typing two commands. (At this point, don't worry about what these mean.)
attach.slab()
setup.slab()
After the second command, S will try to open up a plot window, give some tips about using S, and check how much file space you have used up. Once everything is done, you will get the class prompt Slab > to remind you that you have access to the S-Lab functions and data sets. However, we will only use > as the prompt indicator in these notes (this is standard in most S documentation).
Throughout this guide it will be assumed that you know to press the return (or enter) key after you have typed in the full command. If you are waiting for S to calculate or to graph something, it is often a good idea to hit the return key just to make sure it is not waiting for this return keystroke. Usually typing an extra return will not cause any problems, and it is a good way to be sure S is still listening to you. If you hit a return by mistake or try to type beyond the end of a line, S usually knows that you are not finished. It will ask for more by giving you a plus sign (+) as the prompt.
If at any stage you want S to ignore what you have typed, hold down the control key and press the C key (i.e., type ``control C''). This will halt anything unpleasant that might be happening and bring you back to the normal Slab prompt.
S commands are always followed by open and closed parentheses:
command(thing1, thing2, ...)
Sometimes the command needs some numbers or the name of a data set from you, and that information is given inside the parentheses. For example,
sqrt(2)            We typed this and hit return.
[1] 1.414214       The computer returns the result.
Note that in the above example, we did not save the result of taking the square root, and so it was just printed to the screen. To save the result of a command into a data set, use the operator -> (a minus sign followed by a greater than sign with no spaces in between). So if you are issuing commands and saving the results to data sets, the general format looks like:
command(thing1, thing2, ...) -> dataset
If you don't mind thinking backwards, you can type the output data set first, the operator <- second, and then the S command. S aficionados tend to do it this way. For example:
sqrt(x) -> data4      Puts the square root of x in data4.
data4 <- sqrt(x)      Exactly the same as the previous command.
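As a small illustration of the two assignment directions, here is a sketch that uses only standard S functions (it should also run unchanged in R). The data values and the names root.a and root.b are made up for this example.

```r
c(4, 9, 16) -> x        # made-up data for illustration
sqrt(x) -> root.a       # the result flows to the right, into root.a
root.b <- sqrt(x)       # the same assignment written "backwards"
root.a                  # typing a name lists the data set
root.b
```

Both root.a and root.b end up holding the same three numbers, so the choice between -> and <- is purely a matter of taste.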
The S language is object-oriented. This means that the results of applying a command to a data set may depend on the type of data. You should keep this in mind if you ever get strange results from using a familiar command. This is especially true of the plot command as it will make different types of graphs depending on what you give it.
Everything that is not a command is a data set. A data set can have any type of name, but be sure that it does not start with a numeral or contain an underscore ( _ ). For example, 3dat and dat_3 are not allowed, but dat.3, dat.four.3, and dat.3.four are fine.
If you ever want to list a data set on the screen just type its name. For example, suppose that the numbers 10,2,3,5,8 have been put in a data set called test1. To list test1 just type it:
test1
[1] 10 2 3 5 8
It is helpful to think of test1 as a column vector with five rows (but to save space S prints it in a row). To refer to the fourth member of this vector we just use subscripts:
test1[4]
[1] 5
Suppose that we also have a second data set test2:
test2
[1] 9 4 6 7 12
Here is a way to combine test1 and test2 into a matrix:
cbind(test1,test2)->test3
test3
     test1 test2
[1,]    10     9
[2,]     2     4
[3,]     3     6
[4,]     5     7
[5,]     8    12
cbind stands for ``column bind'' and creates a matrix data set. You can refer to elements in test3 using two subscripts. For example, the number in the fourth row and second column (7) is test3[4,2]. You can refer to the fourth row by typing test3[4,].
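The subscripting rules above can be tried out with the following sketch, which rebuilds test1, test2, and test3 with the combine function c() and then pulls out pieces of the matrix. The column subscript test3[,2] and the dim function are not discussed in the text; they are standard S and are included here only for comparison.

```r
c(10, 2, 3, 5, 8) -> test1
c(9, 4, 6, 7, 12) -> test2
cbind(test1, test2) -> test3    # bind the two vectors together as columns
test3[4, 2]                     # fourth row, second column
test3[4, ]                      # the whole fourth row
test3[, 2]                      # the whole second column
dim(test3)                      # number of rows and number of columns
```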
Data sets that have several pieces are called lists, and many of the data sets that are used in the lab will be in this form. For example, climate is a class data set that has various climate information for the 50 largest US cities. The individual components in the list are specified by a dollar sign ($) followed by the name of the component. For example climate$rain refers to the 50 precipitation values for the cities. climate$elev refers to the 50 elevations for the cities. Note that the component climate$city contains the cities' names and thus is not a numerical data set. You can refer to the third city by typing
climate$city[3]
[1] "austin"
The climate data set is actually a special type of list called a data frame, which has characteristics of a list but is like a matrix as well. More will be said about data frames in a later section.
Suppose that for some sports team you have the heights and weights of the players in two data sets: height and weight. These two data sets may be just sitting in the class directory or you might have actually typed them in using the read.data() function. Let's actually do this.
read.data() -> height    We typed this and hit return.
1: 72                    The computer returned 1: and we typed 72 and return.
2: 69                    The computer returned 2: and we typed 69 and return.
3: 74                    The computer returned 3: and we typed 74 and return.
4: 66                    The computer returned 4: and we typed 66 and return.
5:                       The computer returned 5: and we hit return.
Similarly we can create weight:
read.data() -> weight
1: 185
2: 176
3: 192
4: 146
5:
Another way to create the same data set is with the combine function c():
c(185,176,192,146) -> weight
Data can also be read in from a file. The only difference is that the UNIX file name needs to be specified. For example, if the weights were in a file called w.data, then use read.data('w.data') -> weight. Note the use of single quote marks around the file name. If you are familiar with a text editor, reading from a file is often an easier way to create larger data sets.
The two data sets can be combined into a single data frame called team:
data.frame(weight,height) -> team
To view this new data set just type its name:
team
  weight height
1    185     72
2    176     69
3    192     74
4    146     66
Many of the data sets used in these labs will be data frames, and the first and second labs will give you some practice in using this format. Most of the time a data frame in S acts like a matrix. The main difference is that a data frame can have columns of character (text) information. For example, list out the data set climate$city, and you will see that city is a text variable.
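To see how a data frame behaves both like a list and like a matrix, here is a short sketch that builds team with c() instead of read.data() and then picks out pieces with both kinds of subscripts. It uses only standard S functions and the numbers from the example above.

```r
c(72, 69, 74, 66) -> height
c(185, 176, 192, 146) -> weight
data.frame(weight, height) -> team
team$height      # list style: one column, referred to by name
team[3, ]        # matrix style: the third row (third player)
team[3, 2]       # matrix style: third row, second column
```

The dollar sign form is usually the easier one to read, since it does not depend on remembering the order of the columns.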
One useful function is names. It will just tell you the names of the columns without listing out the whole data set. For example,
names(climate)
[1] "lat" "jan" "rain" "city" "jul" "elev" "lon"
You may also read in data from a file and create a data frame all in the same step. Suppose that in the directory from which you started S you have a file called p.data:
185 72 a
176 69 a
192 74 b
146 66 b
where ``a'' and ``b'' might refer to two different groups of people. Then
read.table("p.data",col.names=c("weight","height","group")) -> team2
will create a data frame with two columns of numbers and one column of characters:
team2
  weight height group
1    185     72     a
2    176     69     a
3    192     74     b
4    146     66     b
For the team data set a table format was appropriate because the rows of the table would also make sense. There are many examples of data sets where the information does not fit together nicely as a table. A list is a data set type that can handle arbitrary collections of data. The data set drill.bit.list is a simple example of the results for two independent experiments. Six drill bits of one brand were tested and eight of another.
drill.bit.list
$besley:
[1] 346 375 442 249 280 428

$cleveland:
[1]  63 124 262  92 192 122 134 128

Assume that the data values for the Besley and Cleveland bits were read in separately and stored in the data sets, say b.dat and c.dat. Here is how to create drill.bit.list:
list() -> drill.bit.list
b.dat -> drill.bit.list$besley
c.dat -> drill.bit.list$cleveland
You are not limited to a certain number of components for a list and can add more components as you work. Although lists are useful for holding data sets, they are also important for organizing the results of a statistical analysis or a complicated plot, and with a little practice they are very easy to work with. Lists are an excellent way to organize your work on a specific homework problem under a single name.
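The steps for building a list can be tried from scratch with the sketch below. It types in the drill bit values with c() and then adds a third component to show that a list can keep growing. The component name note and its text are hypothetical, made up for this illustration.

```r
c(346, 375, 442, 249, 280, 428) -> b.dat
c(63, 124, 262, 92, 192, 122, 134, 128) -> c.dat
list() -> drill.bit.list                    # start with an empty list
b.dat -> drill.bit.list$besley
c.dat -> drill.bit.list$cleveland
"two independent experiments" -> drill.bit.list$note   # hypothetical extra component
names(drill.bit.list)                       # the component names
length(drill.bit.list$cleveland)            # number of Cleveland values
```

Note that the components do not need to have the same length or even the same type, which is what separates a list from a data frame.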
Many commands in S do not need any information or data sets to work. For example attach.slab or q (quit) do not require an argument. Other commands such as sqrt would not make sense without specifying a data set or a number. Finally there are other commands that will take different amounts of information depending on what is needed. For example the plot function can take two data sets say x and y and produce a scatterplot of these values (plot(x,y)). Specifying only one dataset, plot(zork) will result in the data values being plotted against equally spaced x values. Clearly the plot function has been designed to make some default choices based on what it is given. One merit of S is that the default choices are usually ones a human might want. Also, it is easy to override the default choices when they are inappropriate. For example the plot function defaults to using the data set names for the X and Y axes labels. To change these one just needs to know the names of these two parameters (use the help or args commands) and then supply your choices. The name of the label used for the Y axis in the plot function is ylab so
plot(year,zork,ylab='Number of bats found in Cox hall')
will indicate a different label for the Y axis.
Another way of specifying additional arguments to a command is just by the order in which S is expecting them. In this style, omitted arguments are just skipped over using commas. See the remarks about the seq command in the following section on generating data for an example of this syntax. This second method based on order can save typing but is harder for beginners. However, there are some exceptions for common arguments. It is easy to remember that the plot function's first two arguments need to be the x and y data sets. They could be given out of order by referring to their names: plot(y=zork, x=time), although this seems silly compared to just plot(time, zork)!
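The difference between giving arguments by position and by name can be checked without a plot window. The sketch below uses seq, whose first three arguments are named from, to, and by; the data set names by.position and by.name are made up for this example.

```r
seq(0, 1, .25) -> by.position                # from, to, and spacing given in order
seq(to = 1, by = .25, from = 0) -> by.name   # same arguments by name, in a scrambled order
by.position
by.name
```

Both commands should list the same five equally spaced values, which shows that once arguments are named their order no longer matters.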
In the current version 5.1 of S the help command brings up a Netscape window that is very slow. At the moment we don't recommend its use. The best way to access it is to type help.start().
An alternative to help is the function ex (ex for example), which gives examples associated with a command rather than the full help output. Also, ex(plot) will print directly to the screen rather than in a help window.
Given below are some basic S commands, a brief explanation, and some examples of their use. As part of the labs, you will learn more about S, but these commands represent a core of what you will need to know. With these basic tools one can do an amazing amount of graphics and data analysis.
72 185 69 176 74 192 66 146
then zork is the vector
72,185,69,176,74,192,66,146
To read in data that are characters set the default option text to ``true.''
Subtract 4.2 from the numbers in test and over-write this data set with the new results:
test-4.2 -> test
Square the numbers in the data set test and save the results to the new data set test.squared (note the use of a period to make the name readable):
test**2 -> test.squared
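A short sketch of the two arithmetic commands above, assuming some made-up starting values for test. Arithmetic in S is applied to every member of a data set at once, so no looping is needed.

```r
c(5.2, 6.2, 9.2) -> test     # made-up values for illustration
test - 4.2 -> test           # subtract 4.2 from every member and over-write test
test**2 -> test.squared      # square every member and save under a new name
test
test.squared
```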
For example, get the mean and standard deviation of the dataset test:
stats(test)
For example, make a stem and leaf plot of the simulated data in random.numbers.
stem(random.numbers)
lplot(d1,d2,d3) plots d1 versus d2 using the values in d3 as labels
For example, plot the sine function at several points:
c(0,.1,.2,.3,.4,.5,.6) -> x
sin(x) -> y
plot(x,y)
Then add a cosine curve to the plot
cos(x) -> y2
lines(x,y2)
Another way to add the cosine is to combine the two steps
lines(x,cos(x))
like plot but adds points to the current plot
For example, generate a grid of equally spaced points in the range -1 to 2 with a spacing of .01:
seq(-1,2,.01) -> x.grid
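Or generate 100 equally spaced points over the same range by skipping the spacing argument and giving the number of points instead (note the double commas):
seq(-1,2,,100) -> x.grid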
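To see what the two forms of seq produce, the sketch below saves them under different names and checks their lengths. The name x.fine is made up here so that the second command does not over-write the first. With a spacing of .01 the range from -1 to 2 holds (2 - (-1))/.01 + 1 = 301 points, while asking for 100 points gives a spacing of 3/99.

```r
seq(-1, 2, .01) -> x.fine      # spacing given; the number of points follows from it
seq(-1, 2, , 100) -> x.grid    # number of points given; the spacing follows from it
length(x.fine)
length(x.grid)
x.grid[2] - x.grid[1]          # spacing of the 100 point grid
```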
For example, generate 100 random numbers between 0 and 1 and put them in random.numbers:
runif(100) -> random.numbers
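A quick way to check the simulated data is sketched below using the standard functions length, range, and mean. Because the numbers are random, the exact values will differ every time the commands are run, so no output is shown.

```r
runif(100) -> random.numbers
length(random.numbers)     # how many numbers were generated
range(random.numbers)      # smallest and largest value, both between 0 and 1
mean(random.numbers)       # should be somewhere near 0.5
```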