NCSU : ST512 : Spring 2019

ST512 — Statistics for Biological Sciences II  
Spring 2019  

Updated 05/05/2019

Instructor data

Name:   Ryan Martin
Office:   5238 SAS Hall
Phone:   919-515-1920
Email:   rgmarti3 AT ncsu.edu (best way to contact me)

Course data

Syllabus:   PDF file
Prerequisites:   ST511
Textbook:   Ott & Longnecker, An Introduction to Statistical Methods & Data Analysis, 7th Edition.  [publisher's website]
Download and install SAS and R software:
Virtual access to SAS software:
If you have a Mac machine or just don't want to download and install SAS, then you can access it "virtually" by following the instructions here to get to the stat virtual teaching lab. If you're off-campus, then you also need to set up a VPN; there are some instructions on the linked page above for setting this up.
Weekly meetings:
  • Lectures:   Tuesdays and Thursdays, 8:30–9:45am, in 301 Riddick Hall.
  • Lab sessions:
    • Tuesdays, 4:30–5:45pm, in 205 Winston Hall (001B)
    • Wednesdays, 8:30–9:45am, 312 Poe Hall (001C)
    • Wednesdays, 11:45am–1:00pm, 2106 SAS Hall (001D)
Course assistants:
  • Lab TAs:   Mr. Haoyu Wang (001B, hwang38@ncsu.edu) and Ms. Lu Lu (001CD, llu2@ncsu.edu)
  • Grader:   Mr. Yunshu Zhang (yzhan234@ncsu.edu)
Office hours:
  • with instructor, Tu 10–11am or by appointment.
  • with TAs (in SAS 1101), M 11am–noon, 4–5pm; Tu 3–4pm; W 10–11am; Th 10–11am; F 2–3pm.

Announcements, etc.

Please check this section from time to time for updates to the schedule or other information.

05/05: A summary of the Exam 3 scores is here.

05/05: My solutions to Exam 3 are here.

04/25: Just a reminder, our final exam is from 8:00–11:00am (in the usual room) on Tuesday, April 30th. It's not my intention to write an exam that will take 3 hours to finish, but I'll give you the entire 3-hour window so that you have time to read the questions carefully, etc. You may bring two pages (front and back, total of four sides) of handwritten notes for the exam.

04/23: Please ignore Problem 5.5 on last spring's exam. Counting the number of Latin square designs is more complicated than I realized at the time, so my solution is incorrect. It's not of any consequence to us here; I just asked that question for fun anyway.
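For anyone curious why counting Latin squares gets complicated, here is a minimal Python sketch that counts them by brute force for small orders. It makes no assumption about what Problem 5.5 actually asked; the function name `count_latin_squares` is just an illustration. The counts grow so quickly that this approach is only practical for very small squares.

```python
def count_latin_squares(n):
    """Count n x n Latin squares by filling cells row by row with backtracking."""
    rows = [set() for _ in range(n)]  # symbols already used in each row
    cols = [set() for _ in range(n)]  # symbols already used in each column

    def fill(cell):
        # All n*n cells filled without a clash: one complete Latin square.
        if cell == n * n:
            return 1
        r, c = divmod(cell, n)
        total = 0
        for s in range(n):
            # A symbol may appear only once in each row and each column.
            if s not in rows[r] and s not in cols[c]:
                rows[r].add(s)
                cols[c].add(s)
                total += fill(cell + 1)
                rows[r].remove(s)
                cols[c].remove(s)
        return total

    return fill(0)


counts = {n: count_latin_squares(n) for n in range(1, 5)}
print(counts)  # {1: 1, 2: 2, 3: 12, 4: 576}
```

There is no simple closed-form formula for these counts, which is why a quick "count the arrangements" argument tends to go wrong.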

04/22: As promised, the solutions to the review problems and old exam are here and here, respectively. You'll have an opportunity to ask questions about these (or whatever you want) during the lab session this week and during the in-class review session on Thursday.

04/18: Sorry for the delay, but the practice problems for Exam 3 and the exam I gave last spring are here and here, respectively. I'll have solutions for these posted by Monday next week; you'll have a review session with the lab TAs on Tuesday/Wednesday and we'll review in class on Thursday next week.

04/04: FYI, here are some details about the schedule for the rest of the semester. Today we will finish with all I planned to say about block designs. Next week (and maybe into Tuesday of the following week) we will discuss random- and mixed-effects models for designs, in Chapter 17 of Ott & Longnecker, where the levels of certain factors are chosen randomly rather than fixed in advance. That will leave us with about two lecture days, in which I plan to cover some material from Chapter 18 of Ott & Longnecker about split-plots, etc. Then we will have a final review session on our last day of class, Thursday, April 25th.

03/25: A summary of the Exam 2 scores is here. Scores will be posted on moodle soon and graded exam papers will be returned tomorrow.

03/25: My solutions to Exam 2 are here.

03/14: Happy PI day! Solutions to the review problems and the old exam are here and here, respectively. I've also posted solutions to the not-to-be-handed-in Homework 4 below.

03/08: A few of you mentioned to me after class on 03/06 that I screwed up the drawings intended to illustrate the relationships between the means when interaction is and is not present -- thanks for letting me know. So I have drawn a better/correct picture in the notes I uploaded just now. Sorry for the confusion.

03/06: As promised, review problems and the exam I gave last spring are here and here, respectively. I'll post solutions to these next week and you'll have a chance to ask me any question about these (or anything else you want to know) during the review session on Tuesday after spring break; you'll have opportunities to ask questions during the lab sessions that week too. Note, as I mentioned previously, last spring I was experimenting with true/false questions on the exams. That didn't work out very well, in my opinion, so I won't ask a bunch of true/false questions like you see on the old exam. But I could ask a couple or ask for some explanation of the concepts that are touched on in those questions, so you shouldn't ignore them entirely.

03/06: So that you're not bored during spring break, I have posted the 4th homework assignment below. However, as I mentioned in class yesterday, I will not collect this one; it's just to help you with your preparation for Exam 2 after spring break. I'll post solutions to these problems by the middle of next week so that you have time to practice on your own and then to check your answers against mine.

03/04: On Thursday, 02/28, we talked about contrasts in class, and I'll have a few more things to say about contrasts tomorrow. But this is a difficult concept and a complete understanding involves some non-trivial math, so I'm going to postpone my (summary) presentation of these details till after Exam 2. I did write an extensive piece in the running lecture notes (under the 02/28 entry) and I would encourage you to look that over.

02/21: Since I'll be away on Tuesday, Feb 26th, I have to cancel my regular office hours. But I expect to be back in the office on my regular schedule by Wednesday and, in case there's anything you need to see me about, please send me an email and we can try to find some other time during the week to meet.

02/21: Unfortunately, I have to adjust our exam schedule again: we'll have the second midterm after spring break instead of before. Specifically, I'm moving the second midterm exam from Thursday, March 7th to Thursday, March 21st. We'll do an in-class review on Tuesday, March 19th. I apologize for the inconvenience.

02/21: I'll be out of the country over the weekend and won't return till Tuesday afternoon. This means that I'll miss our class Tuesday morning, but I've arranged for a substitute to cover for me. So, we will have class and lab as usual next week. By the way, I will have very limited internet access while I'm away, so if you write me an email then it's unlikely I'll be able to respond before Wednesday next week. So if there's something urgent you need to discuss, please email me today.

02/21: Homework 3 is posted below.

02/19: Here is a summary of the Exam 1 scores.

02/15: Here are my solutions to the exam for you to take a look at. If you have questions, we can discuss on Tuesday next week. If all goes according to plan, the exams will be graded and ready to be returned in class on Tuesday.

02/13: Sorry for the delay, but the solutions to Homework 2 are now posted; see below.

02/13: There were a couple typos/omissions pointed out to me in the solutions I posted before to the review problems. In particular, I copied down a number from the SAS output wrong in the solution to Problem 10(f), I didn't multiply through by 37 in the numerator and denominator in the solutions to Problem 10(bc) which caused some confusion (but didn't affect the final answer), and I didn't give a solution to Problem 11 previously. These have all been updated.

02/11: As promised, solutions to the review problems are here and solutions to the old exam are here. You can ask me about these problems (or anything else) during the review session in class tomorrow. You will also have the opportunity to ask questions about these during the lab sessions this week.

02/07: I think I mentioned this on the first day of class, but now is a good time to remind you. To the midterm exam, you may bring a one-page (both sides, 8.5 x 11) sheet of handwritten notes. On this you can write whatever formulas, definitions, concepts, etc. that you like.

02/06: There was a typo in my html source code, so the link to the Spring 2018 midterm exam (see below) was broken. Now it's fixed, thanks to all of you who let me know about the mistake!

02/05: To help you prepare for the upcoming exam, there is a set of review/practice problems here and the exam I gave last spring is here. A few comments on these are in order. First, the review problems are not meant to be representative of problems you might see on the exam, just some exercises to remind you about stuff we covered or to get you thinking about stuff we didn't cover. The old exam is a better representation of what you'll see on the exam this time in terms of style and length; note that I might give some true/false questions this time, but not as many as I did last spring. I'll post solutions to these later (probably Monday morning) and you'll have opportunities to ask me and/or the TAs about these problems if you have questions.

01/29: I'm now thinking that it's better (for various reasons) to have our exam on a Thursday. So I'm moving the first midterm exam from Tuesday, Feb 12th to Thursday, Feb 14th.

01/13: The first homework assignment is posted, see below.

01/10: The office hour schedule has now been posted, here on the website and on the syllabus. This schedule will begin on Monday, Jan 14th.

01/08: There were some late staffing changes so we will not have lab during the first week. Labs will start according to the usual schedule on Tuesday, January 15th.

01/08: My expectation is that you will be able to pick up virtually all of the SAS that you need for the homework assignments, etc., from your experiences in the lab and from the examples that I go over in lecture. However, this probably will not be enough for you to really learn SAS. I personally don't know SAS all that well (I especially have trouble remembering the abbreviations for the various options) but, fortunately, there is a wealth of information on the web to help. As we proceed, I will post some stuff from the web that I think is helpful. For now, here is some general information that I've found that I think is pretty useful:

Aside from web resources, there are lots of (relatively inexpensive) books available as well. One that I used a long time back, The Little SAS Book, by Delwiche and Slaughter, is pretty good; not all the stuff in this book is relevant to ST512, but it is a good resource to have if learning SAS is important for your future goals.

01/08: The 7th edition of the textbook is on reserve at the library. So if you don't have your own copy of the book, you can check the book out for an hour or so to take pictures of the exercises, etc., assigned for homework.

01/08: I'm going to keep a "log" of the material covered each day, along with the relevant sections in Ott & Longnecker's book, in the "Course Outline" section below. This is partly to help me keep a record and partly to help in case students have to miss a lecture day.

01/08: Homework and exam scores for this course will be posted on moodle, which can be accessed from the WolfWare website here. Please check your grades occasionally to be sure that your scores have been recorded correctly. (Moodle inserts a column labeled "Course Total" which is based on some default weights, etc., which I don't use at all. At the end of the semester, I will download the homework and exam scores to a spreadsheet and do the grade calculations based on the formula in the syllabus.)

01/08: Here are a few important dates—I will remind you of these things in class when the time is near. First, Spring Break is March 11th–15th so there will be no class that week. Second, our midterm exams are tentatively scheduled for Tuesday, February 12th and Thursday, March 7th. Third, the final exam will be held on Tuesday, April 30th, 8–11am, according to the university's schedule.

01/08: Welcome to ST512!


Course outline, notes, supplements, etc.

My plan is to maintain a running, semi-detailed summary of what I talked about in each lecture. I don't want to have these notes prepared in advance because then it's like the lectures are scripted, which would be boring for all of us. So I'll do the process in reverse—write notes based on (only the good parts of) what I said in the class.

My running notes (Updated 04/24/2019)

  1. Introduction.

    • You should be familiar with, roughly, Chapters 1–8 in Ott & Longnecker.
    • On the first day (01/08), after going over the syllabus and some other intro stuff, including a bit of philosophy, I'll do a bit of review of what I expect you would have seen before in a previous course. This is far from a comprehensive review; the purpose is to set our baselines and terminology, and for me to demonstrate the philosophy in action. Example: twosample.R.
    • Some info about (my) philosophy of statistics/science:
      • Nice quote from physicist Richard Feynman:
        I'm talking about a specific, extra type of integrity that is beyond not lying, but bending over backwards to show how you're maybe wrong, that you ought to have when acting as a scientist.
      • You might be aware of some of the recent scrutiny that statistics has come under in light of the "replication crisis" and other things. In fact, even p-values, a core concept in statistics in general and ST512 in particular, have taken a beating. To me, this criticism has less to do with the statistical tools and methods themselves and more to do with the "claims must stand up to scrutiny" notion that I discussed in class. For example, if I'm a researcher with data that support a result that I could publish in Science or Nature, then I don't really have an incentive to carefully scrutinize that result. Replication studies that are starting to gain traction now are intended to create that incentive through a system of checks-and-balances.
      • To follow up on the previous point, I'm not suggesting that statisticians are free of responsibility, we've screwed up too. A friend of mine and I have written about this here, though there's certainly more that can be said. And the reason I talk about skin in the game, etc., in lecture is because I'm trying to do my part to make clear to scientists that statistics is only a tool to aid in making a strong case, that doing a statistical analysis doesn't in itself make a convincing argument.
      • As an aside, the paper linked above is available on a platform, called Researchers.One, that I co-founded. It's intended to be an alternative to the current peer review and publication model, with a number of different goals and features; more details are available here. If you're interested in or passionate about peer review, open-access publications, etc., then I'd be happy to discuss these things with you.

  2. Simple Linear Regression.

    • We will cover, roughly, Chapter 11 in Ott & Longnecker.
    • Beer & blood alcohol level data set used in class is here.
    • A nice and quick guide to PROC REG in SAS is here.
    • BUPA data, containing results from various blood tests related to liver disorders, used in class is here. More information about the data set is available from UCI here.
    • Some information about the Box–Cox power transformation can be found here. I mentioned this in class because it is the method that suggested a log transformation in the analysis of BUPA data was appropriate. You don't need to know this method for ST512; this is just in case you're interested.
    • Very rough log of material covered:
      • 01/10: Finish intro/review and start simple linear regression; Sections 11.1–11.2.   beer.R
      • 01/15: More simple linear regression; Sections 11.2 and 11.3.   example_11-4.sas   example_11-4.R
      • 01/17: More on simple linear regression; Sections 11.3 and 11.4.
      • 01/22: More simple linear regression, diagnostic plots; Sections 11.4 and 11.5.   example_11-10.sas   example_11-10.R
      • 01/24: More diagnostic plots and correlation; Sections 11.5 and 11.6.   bupa.R
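For those interested in the Box–Cox power transformation mentioned above, here is a minimal Python sketch of the transformation itself. It is not the code used for the BUPA analysis, and the input values are made up for illustration. It only shows the standard formula and why a power parameter near zero corresponds to a log transformation.

```python
import math


def box_cox(y, lam):
    """Box-Cox power transform of a positive value y with power parameter lam."""
    if y <= 0:
        raise ValueError("the Box-Cox transform requires y > 0")
    if lam == 0:
        # The limiting case of (y**lam - 1) / lam as lam -> 0 is log(y).
        return math.log(y)
    return (y ** lam - 1.0) / lam


# Made-up positive values, purely for illustration (not the BUPA data).
values = [0.5, 1.0, 2.0, 10.0]
for lam in (1.0, 0.5, 0.1, 0.001, 0.0):
    print(lam, [round(box_cox(y, lam), 4) for y in values])
```

As the power parameter shrinks toward zero, the transformed values approach the natural logs of the inputs. In practice the parameter is estimated from the data; that step is not shown here.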

  3. Multiple Linear Regression.

    • We will cover, roughly, Chapter 12 (and parts of Chapter 13) in Ott & Longnecker.
    • SAS carries out the search for decent models pretty easily; see the code below. As far as I know, R doesn't do this quite so easily. For some of my own work, I was using the external R package leaps and, in particular, the function regsubsets.
    • Model selection is somehow both necessary and controversial. Modeling is obviously important, but where it gets sticky is when you use the data to help choose the model to use. Picking a model that "fits well" makes the estimates biased in a certain sense, which might make the tests and confidence intervals invalid. Here is an example:
      Consider a multiple linear regression situation with 10 candidate predictor variables. Assume that none of these predictor variables are useful to explain variation in the response, i.e., that the null hypothesis for the overall F-test is TRUE. In this case, if we repeat the experiment, say, 250 times, using the same (irrelevant) predictor variables in each case, then we would expect the p-values of that overall F-test to be less than 0.05 in only 12 or 13 of the experiments (5% of 250 is 12.5). However, if we use the AIC criterion to first select a "good" model, and then perform the overall F-test, the p-values look very different from how we'd expect. A summary of what I'm talking about is here. According to the theory behind the ideas in ST512, the p-values of the overall F-test in this case should be such that the histogram bars are all about the same height, but they aren't. There are a lot more small p-values than we would expect based on the theory (about 135 compared to 13), which means that the use of data to select a "good" model messes things up enough that the theory is no longer true. Practically, using data to select the model introduces some bias that makes models look better than they actually are. It is in this sense that using data to help choose a model can invalidate the statistical inference.
      There is really nothing wrong with using the data to select a model, provided that one accounts for the selection effect. However, most people do not make any correction, they just report their selected model as if it were the only model they considered, and carry out the analysis as usual. But ignoring the selection effect messes up p-value interpretation and is one of the causes of the lack-of-reproducibility problems. Statisticians are now actively working on ways to account for the selection effect so that inferences will remain valid, but this is a very hard problem.
    • Rough log of material covered:
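To make the selection effect in the model-selection example above a bit more concrete, here is a minimal Python sketch. It is not the AIC-plus-overall-F-test simulation summarized in that example. It assumes a much simpler stand-in: each of the 10 candidates gets its own test, the 10 null p-values are independent and uniform on (0, 1), and "selection" means reporting only the smallest one. The names and the seed are hypothetical.

```python
import random

ALPHA = 0.05
N_EXPERIMENTS = 250
N_CANDIDATES = 10  # simplifying assumption: one independent null test per candidate

# With no selection, a null p-value falls below ALPHA with probability ALPHA.
expected_honest = N_EXPERIMENTS * ALPHA  # 12.5

# If we report only the smallest of the independent null p-values,
# P(min < ALPHA) = 1 - (1 - ALPHA) ** N_CANDIDATES.
p_select = 1 - (1 - ALPHA) ** N_CANDIDATES
expected_selected = N_EXPERIMENTS * p_select


def simulate(n_experiments, n_candidates, alpha, seed):
    """Count 'significant' results without and with selecting the best p-value."""
    rng = random.Random(seed)
    honest = selected = 0
    for _ in range(n_experiments):
        pvals = [rng.random() for _ in range(n_candidates)]
        honest += pvals[0] < alpha      # test chosen before seeing the data
        selected += min(pvals) < alpha  # test chosen because it looked best
    return honest, selected


honest_count, selected_count = simulate(N_EXPERIMENTS, N_CANDIDATES, ALPHA, seed=512)
print("expected without selection:", expected_honest)
print("expected with selection:", round(expected_selected, 1))
print("simulated counts:", honest_count, selected_count)
```

Even in this stripped-down setting, selecting first and testing second yields small p-values in roughly 40% of the experiments instead of 5%, the same kind of inflation as in the AIC example.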

  4. Analysis of Variance and Factorial Designs.

    • We will cover, roughly, Chapter 14 (and parts of Chapters 8 & 9) in Ott & Longnecker.
    • Rough log of material covered:
      • 02/19: intro to experimental design, completely randomized designs; Sections 2.5 and 14.2.   crd_balance.R
      • 02/21: details for one-way ANOVA, contrasts; Sections 8.3, 9.2, and 14.2.   example_14-2.sas
      • 02/26: follow-ups to overall F-test, contrasts; Section 9.2.   pea.sas
      • 02/28: more on contrasts; Section 9.2.   spacing.sas
      • 03/05: multiple comparisons, start factorial designs; Sections 9.3 and 14.3.   example_14-6.sas
      • 03/07: more factorial designs, interactions; Section 14.3.
      • 03/26: three-way layouts and unbalanced designs; Sections 14.3 and 14.4.   example_14-7.sas   example_14-8.sas

  5. Analysis of Variance for Blocked Designs.

    • We will cover parts of Chapters 15 and 16 in Ott & Longnecker.
    • The discussion of block designs focused on the case without replication, i.e., when each treatment is assigned to exactly one unit in the block; this is how the book presents it. But, for example, in a randomized complete block design, the treatment could be applied to more than one unit in a given block, though it is recommended that this be done in a balanced way across all the block/treatment combinations. Specifically, in the context of the pesticide problem in Example 15.1, the researcher could have split each plot into six regions and applied each pesticide to two of the regions instead of just one. Such a design is called a generalized RCBD, discussed here. Replication is more complicated for the Latin square designs, but see page 330 of Oehlert's book.
    • Rough log of material covered:
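As an illustration of the generalized RCBD described in this section, here is a minimal Python sketch of the randomization step. The pesticide labels, the number of blocks, and the seed are hypothetical. Only the "six regions, two per pesticide" layout comes from the discussion above, which implies three pesticides.

```python
import random


def generalized_rcbd(treatments, n_blocks, reps_per_block, seed=None):
    """Randomly assign each treatment to reps_per_block units within every block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        # Every block gets the same balanced set of treatment labels...
        units = [t for t in treatments for _ in range(reps_per_block)]
        # ...and the randomization is done separately within each block.
        rng.shuffle(units)
        layout.append(units)
    return layout


# Hypothetical setup: 3 pesticides, 4 plots (blocks), 6 regions per plot.
layout = generalized_rcbd(["P1", "P2", "P3"], n_blocks=4, reps_per_block=2, seed=512)
for b, units in enumerate(layout, start=1):
    print(f"block {b}: {units}")
```

Setting `reps_per_block=1` gives the usual RCBD without replication, which is the case the book presents.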

  6. Additional Topics.

    • We will cover parts of Chapters 17 and 18 in Ott & Longnecker, random- and mixed-effects models, split-plot designs, etc.
    • Rough log of material covered:
      • 04/09: One-way and two-way random effects models; Sections 17.2, 17.3.   example_17-1.sas   example_17-2.sas
      • 04/11: More random effects; mixed effects models. Sections 17.3–17.4.   example_17-3.sas  
      • 04/16: Mixed effect models, nested designs, start split-plots; Sections 17.4 and 17.6.   example_17-4.sas   example_17-11.sas
      • 04/18: Overview of stuff we don't cover in ST512: logistic regression and optimal design of experiments.
      • 04/23: Split-plot designs; Section 18.2.   example_18-1.sas


Labs

All students should be registered for one of the ST512 lab sessions. These almost-weekly meetings are designed to help students learn how to use SAS to implement the various statistical methods introduced in class. In each session, with the help of the lab assistant, students will complete a short assignment using SAS which is intended to be helpful for solving part of the homework due the following week. The lab exercises will be posted here (hopefully) well before the lab session, so you have a chance to look them over ahead of time; solutions will be posted here following the lab sessions.

January 15th and 16th.

Assignment:   512lab01.pdf
SAS file:   512lab01.sas
Data file:   grapes.xlsx
Solutions:   512lab01_soln.pdf

January 22nd and 23rd.

Assignment:   512lab02.pdf
SAS file:   512lab02.sas
Solutions:   512lab02_soln.pdf

January 29th and 30th.

Assignment:   512lab03.pdf
SAS file:   512lab03.sas
Solutions:   512lab03_soln.pdf

February 5th and 6th.

Assignment:   512lab04.pdf
SAS file:   512lab04.sas
Solutions:   512lab04_soln.pdf

February 12th and 13th. Review for Exam 1.

February 19th and 20th.

Assignment:   512lab05.pdf
SAS file:   512lab05.sas
Solutions:   512lab05_soln.pdf

February 26th and 27th.

Assignment:   512lab06.pdf
SAS file:   512lab06.sas
Solutions:   512lab06_soln.pdf

March 5th and 6th.

Assignment:   512lab07.pdf
SAS file:   512lab07.sas
Solutions:   512lab07_soln.pdf

March 12th and 13th. No lab—spring break.

March 19th and 20th. Review for Exam 2.

March 26th and 27th.

Assignment:   512lab08.pdf
SAS file:   512lab08.sas
Solutions:   512lab08_soln.pdf

April 2nd and 3rd.

Assignment:   512lab09.pdf
SAS file:   512lab09.sas
Solutions:   512lab09_soln.pdf

April 9th and 10th.

Assignment:   512lab10.pdf
SAS file:   512lab10.sas
Solutions:   512lab10_soln.pdf

April 16th and 17th. No lab.

April 23rd and 24th. Review for Exam 3.


Homework

Homework assignments will consist mainly of problems taken from the Ott & Longnecker text, but may also include some problems of my own. Note that we are using the 7th edition, and some of the problems and/or numbering are different compared to the 6th edition. These assignments will involve a combination of theoretical/conceptual and computational exercises. Some general comments:

Homework solutions will be posted here after the due dates.

Homework 01 — Due Thursday 01/24/2019.

Assignment:   512hw01.pdf
Data files:   ex11-22.xlsx   ex11-40.xlsx
Solutions:   512hw01_soln.pdf   512hw01_soln.sas

Homework 02 — Due Tuesday 02/12/2019.

Assignment:   512hw02.pdf
Data files:   ex12-11.xlsx   ex12-23.xlsx
SAS files:   512hw02_prob5.sas
Solutions:   512hw02_soln.pdf   512hw02_soln.sas

Homework 03 — Due Thursday 03/07/2019.

Assignment:   512hw03.pdf
Data files:   ex8-32.xlsx   ex8-39.xlsx
SAS files:   512hw03_prob3.sas   512hw03_prob4.sas
Solutions:   512hw03_soln.pdf   512hw03_soln.sas

Homework 04 — Not due!

Assignment:   512hw04.pdf
Data files:   ex14-8.xlsx
Solutions:   512hw04_soln.pdf   512hw04_soln.sas

Homework 05 — Due Thursday 04/11/2019.

Assignment:   512hw05.pdf
Data files:   ex16-5.xlsx
Solutions:   512hw05_soln.pdf   512hw05_soln.sas

Homework 06 — Due Thursday 04/25/2019.

Assignment:   512hw06.pdf
Data files:   ex17-3.xlsx   ex17-7.xlsx   ex17-30.xlsx
SAS files:   512hw06_prob4.sas
Solutions:   512hw06_soln.pdf   512hw06_soln.sas