ST512 — Statistics for Biological Sciences II

Updated 05/05/2019
 Prerequisites: ST511
 Textbook: Ott & Longnecker, An Introduction to Statistical Methods & Data Analysis, 7th Edition. [publisher's website]
 Lectures: Tuesdays and Thursdays, 8:30–9:45am
 Lab sessions:
 Tuesdays, 4:30–5:45pm, in 205 Winston Hall (001B)
 Wednesdays, 8:30–9:45am, 312 Poe Hall (001C)
 Wednesdays, 11:45am–1:00pm, 2106 SAS Hall (001D)
 Course assistants:
 Lab TAs: Mr. Haoyu Wang (001B, hwang38@ncsu.edu) and Ms. Lu Lu (001CD, llu2@ncsu.edu)
 Grader: Mr. Yunshu Zhang (yzhan234@ncsu.edu)
Please check this section from time to time for updates to the schedule or other information.
05/05: A summary of the Exam 3 scores is here.
05/05: My solutions to Exam 3 are here.
04/25: Just a reminder, our final exam is from 8:00–11:00am (in the usual room) on Tuesday, April 30th. It's not my intention to write an exam that will take 3 hours to finish, but I'll give you the entire 3-hour window so that you have time to read the questions carefully, etc. You may bring two pages (front and back, total of four sides) of handwritten notes for the exam.
04/23: Please ignore Problem 5.5 on last spring's exam. Counting the number of Latin square designs is more complicated than I realized at the time, so my solution is incorrect. It's not of any consequence to us here, I just asked that question for fun anyway.
04/22: As promised, the solutions to the review problems and old exam are here and here, respectively. You'll have the opportunity to ask questions about these (or whatever you want) during the lab session this week and during the in-class review session on Thursday.
04/18: Sorry for the delay, but the practice problems for Exam 3 and the exam I gave last spring are here and here, respectively. I'll have solutions for these posted by Monday next week; you'll have a review session with the lab TAs on Tuesday/Wednesday and we'll review in class on Thursday next week.
04/04: FYI, here are some details about the schedule for the rest of the semester. Today we will finish with all I planned to say about block designs. Next week (and maybe into Tuesday of the following week) we will discuss random and mixed-effects models for designs, in Chapter 17 of Ott & Longnecker, where the levels of certain factors are chosen randomly rather than fixed in advance. That will leave us with about two lecture days, in which I plan to cover some material from Chapter 18 of Ott & Longnecker about split-plots, etc. Then we will have a final review session on our last day of class, Thursday, April 25th.
03/25: A summary of the Exam 2 scores is here. Scores will be posted on moodle soon and graded exam papers will be returned tomorrow.
03/25: My solutions to Exam 2 are here.
03/14: Happy PI day! Solutions to the review problems and the old exam are here and here, respectively. I've also posted solutions to the not-to-be-handed-in Homework 4 below.
03/08: A few of you mentioned to me after class on 03/06 that I screwed up the drawings intended to illustrate the relationships between the means when interaction is and is not present. Thanks for letting me know. So I have drawn a better/correct picture in the notes I uploaded just now. Sorry for the confusion.
03/06: As promised, review problems and the exam I gave last spring are here and here, respectively. I'll post solutions to these next week and you'll have a chance to ask me any question about these (or anything else you want to know) during the review session on Tuesday after spring break; you'll have opportunities to ask questions during the lab sessions that week too. Note, as I mentioned previously, last spring I was experimenting with true/false questions on the exams. That didn't work out very well, in my opinion, so I won't ask a bunch of true/false questions like you see on the old exam. But I could ask a couple or ask for some explanation of the concepts that are touched on in those questions, so you shouldn't ignore them entirely.
03/06: So that you're not bored during spring break, I have posted the 4th homework assignment below. However, as I mentioned in class yesterday, I will not collect this one, it's just to help you with your preparation for Exam 2 after spring break. I'll post solutions to these problems by the middle of next week so that you have time to practice on your own and then to check your answers against mine.
03/04: On Thursday, 02/28, we talked about contrasts in class, and I'll have a few more things to say about contrasts tomorrow. But this is a difficult concept and a complete understanding involves some nontrivial math, so I'm going to postpone my (summary) presentation of these details till after Exam 2. I did write an extensive piece in the running lecture notes (under the 02/28 entry) and I would encourage you to look that over.
02/21: Since I'll be away on Tuesday, Feb 26th, I have to cancel my regular office hours. But I expect to be back in the office on my regular schedule by Wednesday and, in case there's anything you need to see me about, please send me an email and we can try to find some other time during the week to meet.
02/21: Unfortunately, I have to adjust our exam schedule again: we'll do it after spring break instead of before. Specifically, I'm moving the second midterm exam from Thursday, March 7th to Thursday, March 21st. We'll do an in-class review on Tuesday, March 19th. I apologize for the inconvenience.
02/21: I'll be out of the country over the weekend and won't return till Tuesday afternoon. This means that I'll miss our class Tuesday morning, but I've arranged for a substitute to cover for me. So, we will have class and lab as usual next week. By the way, I will have very limited internet access while I'm away, so if you write me an email then it's unlikely I'll be able to respond before Wednesday next week. So if there's something urgent you need to discuss, please email me today.
02/21: Homework 3 is posted below.
02/19: Here is a summary of the Exam 1 scores.
02/15: Here are my solutions to the exam for you to take a look at. If you have questions, we can discuss on Tuesday next week. If all goes according to plan, the exams will be graded and ready to be returned in class on Tuesday.
02/13: Sorry for the delay, but the solutions to Homework 2 are now posted; see below.
02/13: There were a couple typos/omissions pointed out to me in the solutions I posted before to the review problems. In particular, I copied down a number from the SAS output wrong in the solution to Problem 10(f), I didn't multiply through by 37 in the numerator and denominator in the solutions to Problem 10(bc) which caused some confusion (but didn't affect the final answer), and I didn't give a solution to Problem 11 previously. These have all been updated.
02/11: As promised, solutions to the review problems are here and solutions to the old exam are here. You can ask me about these problems (or anything else) during the review session in class tomorrow. You will also have the opportunity to ask questions about these during the lab sessions this week.
02/07: I think I mentioned this on the first day of class, but now is a good time to remind you. You may bring a one-page (both sides, 8.5 x 11) sheet of handwritten notes to the midterm exam. On this you can write whatever formulas, definitions, concepts, etc. you like.
02/06: There was a typo in my HTML source code, so the link to the Spring 2018 midterm exam (see below) was broken. Now it's fixed, thanks to all of you who let me know about the mistake!
02/05: To help you prepare for the upcoming exam, there is a set of review/practice problems here and the exam I gave last spring is here. A few comments on these are in order. First, the review problems are not meant to be representative of problems you might see on the exam, just some exercises to remind you about stuff we covered or to get you thinking about stuff we didn't cover. The old exam is a better representation of what you'll see on the exam this time in terms of style and length; note that I might give some true/false questions this time, but not as many as I did last spring. I'll post solutions to these later (probably Monday morning) and you'll have opportunities to ask me and/or the TAs about these problems if you have questions.
01/29: I'm now thinking that it's better (for various reasons) to have our exam on a Thursday. So I'm moving the first midterm exam from Tuesday, Feb 12th to Thursday, Feb 14th.
01/13: The first homework assignment is posted, see below.
01/10: The office hour schedule has now been posted, here on the website and on the syllabus. This schedule will begin on Monday, Jan 14th.
01/08: There were some late staffing changes so we will not have lab during the first week. Labs will start according to the usual schedule on Tuesday, January 15th.
01/08: My expectation is that you will be able to pick up virtually all of the SAS that you need for the homework assignments, etc, from your experiences in the lab and from the examples that I go over in lecture. However, this probably will not be enough for you to really learn SAS. I personally don't know SAS all that well (I especially have trouble remembering the abbreviations for the various options) but, fortunately, there is a wealth of information on the web to help. As we proceed, I will post some stuff from the web that I think is helpful. For now, here is some general information that I've found that I think is pretty useful:
 Introduction to SAS
 Go here and/or here for information about preparing SAS output for inclusion in a report.
Aside from web resources, there are lots of (relatively inexpensive) books available as well. One that I used a long time back, The Little SAS Book, by Delwiche and Slaughter, is pretty good; not all the stuff in this book is relevant to ST512, but it is a good resource to have if learning SAS is important for your future goals.
01/08: The 7th edition of the textbook is on reserve at the library. So if you don't have your own copy of the book, you can check the book out for an hour or so to take pictures of the exercises, etc., assigned for homework.
01/08: I'm going to keep a "log" of the material covered each day, along with the relevant sections in Ott & Longnecker's book, in the "Course Outline" section below. This is partly to help me keep a record and partly to help in case students have to miss a lecture day.
01/08: Homework and exam scores for this course will be posted on moodle, which can be accessed from the WolfWare website here. Please check your grades occasionally to be sure that your scores have been recorded correctly. (Moodle inserts a column labeled "Course Total" which is based on some default weights, etc., which I don't use at all. At the end of the semester, I will download the homework and exam scores to a spreadsheet and do the grade calculations based on the formula in the syllabus.)
01/08: Here are a few important dates—I will remind you of these things in class when the time is near. First, Spring Break is March 11th–15th so there will be no class that week. Second, our midterm exams are tentatively scheduled for Tuesday, February 12th and Thursday, March 7th. Third, the final exam will be held on Tuesday, April 30th, 8–11am, according to the university's schedule.
01/08: Welcome to ST512!
My plan is to maintain a running, semi-detailed summary of what I talked about in each lecture. I don't want to have these notes prepared in advance because then it's like the lectures are scripted, which would be boring for all of us. So I'll do the process in reverse: write notes based on (only the good parts of) what I said in the class.
My running notes (Updated 04/24/2019)
Introduction.
I'm talking about a specific, extra type of integrity that is beyond not lying, but bending over backwards to show how you're maybe wrong, that you ought to have when acting as a scientist.
Simple Linear Regression.
Multiple Linear Regression.
Consider a multiple linear regression situation with 10 candidate predictor variables. Assume that none of these predictor variables are useful to explain variation in the response, i.e., that the null hypothesis for the overall F-test is TRUE. In this case, if we repeat the experiment, say, 250 times, using the same (irrelevant) predictor variables in each case, then we would expect the p-values of that overall F-test to be less than 0.05 in only 12 or 13 of the experiments (5% of 250 is 12.5). However, if we use the AIC criterion to first select a "good" model, and then perform the overall F-test, the p-values look very different from how we'd expect. A summary of what I'm talking about is here. According to the theory behind the ideas in ST512, the p-values of the overall F-test in this case should be such that the histogram bars are all about the same height, but they aren't. There are a lot more small p-values than we would expect based on the theory (about 135 compared to 13), which means that the use of data to select a "good" model messes things up enough that the theory is no longer true. Practically, using data to select the model introduces some bias that makes models look better than they actually are. It is in this sense that using data to help choose a model can invalidate the statistical inference.
There is really nothing wrong with using the data to select a model, provided that one accounts for the selection effect. However, most people do not make any correction; they just report their selected model as if it were the only model they considered, and carry out the analysis as usual. But ignoring the selection effect messes up p-value interpretation and is one of the causes of the lack-of-reproducibility problems. Statisticians are now actively working on ways to account for the selection effect so that inferences will remain valid, but this is a very hard problem.
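The selection effect described above can be illustrated with a small simulation. What follows is only a minimal Python sketch, not the code behind the summary linked above. Several things are assumptions made here for illustration: the sample size (50), standard normal predictors and errors, forward selection as the way AIC picks a model, and the convention that a selected model with no predictors gets a p-value of 1. Python is used for convenience rather than SAS.

```python
import numpy as np
from scipy import stats


def fit_rss(X, y):
    """Residual sum of squares from least-squares regression of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def aic(rss, n, k):
    """Gaussian AIC up to an additive constant; k = number of regression coefficients."""
    return n * np.log(rss / n) + 2 * k


def overall_f_pvalue(X, y, cols):
    """p-value of the overall F-test for the model: intercept plus predictors in cols."""
    n, p = len(y), len(cols)
    if p == 0:
        return 1.0  # assumption: no predictors selected, so report no evidence
    ones = np.ones((n, 1))
    rss_model = fit_rss(np.hstack([ones, X[:, cols]]), y)
    rss_null = float(np.sum((y - y.mean()) ** 2))
    f_stat = ((rss_null - rss_model) / p) / (rss_model / (n - p - 1))
    return float(stats.f.sf(f_stat, p, n - p - 1))


def forward_aic(X, y):
    """Forward selection (an assumed procedure): add predictors while AIC decreases."""
    n, m = X.shape
    ones = np.ones((n, 1))
    selected = []
    best = aic(float(np.sum((y - y.mean()) ** 2)), n, 1)  # intercept-only model
    while len(selected) < m:
        candidates = [j for j in range(m) if j not in selected]
        scores = [
            aic(fit_rss(np.hstack([ones, X[:, selected + [j]]]), y), n, len(selected) + 2)
            for j in candidates
        ]
        i = int(np.argmin(scores))
        if scores[i] >= best:
            break  # no candidate improves AIC
        best = scores[i]
        selected.append(candidates[i])
    return selected


def simulate(n_reps=250, n_obs=50, n_pred=10, seed=512):
    """Repeat the null experiment; return p-values without and with AIC selection."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_obs, n_pred))  # same irrelevant predictors every time
    p_full, p_sel = [], []
    for _ in range(n_reps):
        y = rng.standard_normal(n_obs)  # response unrelated to X: the null is true
        p_full.append(overall_f_pvalue(X, y, list(range(n_pred))))
        p_sel.append(overall_f_pvalue(X, y, forward_aic(X, y)))
    return np.array(p_full), np.array(p_sel)


p_full, p_sel = simulate()
print("p < 0.05 without selection:  ", int(np.sum(p_full < 0.05)))
print("p < 0.05 after AIC selection:", int(np.sum(p_sel < 0.05)))
```

Without selection, the first count should land near the 12 or 13 predicted by the theory, while the second is typically much larger. The exact values depend on the seed and on the assumptions listed above, so they need not match the 135 reported in the linked summary.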
Analysis of Variance and Factorial Designs.
Analysis of Variance for Blocked Designs.
Additional Topics.
All students should be registered for one of the ST512 lab sessions. These almost-weekly meetings are designed to help students learn how to use SAS to implement the various statistical methods introduced in class. In each session, with the help of the lab assistant, students will complete a short assignment using SAS which is intended to be helpful for solving part of the homework due the following week. The lab exercises will be posted here (hopefully) well before the lab session, so you have a chance to look them over ahead of time; solutions will be posted here following the lab sessions.
January 15th and 16th.
Assignment: 512lab01.pdf
SAS file: 512lab01.sas
Data file: grapes.xlsx
Solutions: 512lab01_soln.pdf
January 22nd and 23rd.
Assignment: 512lab02.pdf
SAS file: 512lab02.sas
Solutions: 512lab02_soln.pdf
January 30th and 31st.
Assignment: 512lab03.pdf
SAS file: 512lab03.sas
Solutions: 512lab03_soln.pdf
February 5th and 6th.
Assignment: 512lab04.pdf
SAS file: 512lab04.sas
Solutions: 512lab04_soln.pdf
February 12th and 13th. Review for Exam 1.
February 19th and 20th.
Assignment: 512lab05.pdf
SAS file: 512lab05.sas
Solutions: 512lab05_soln.pdf
February 26th and 27th.
Assignment: 512lab06.pdf
SAS file: 512lab06.sas
Solutions: 512lab06_soln.pdf
March 5th and 6th.
Assignment: 512lab07.pdf
SAS file: 512lab07.sas
Solutions: 512lab07_soln.pdf
March 12th and 13th. No lab—spring break.
March 19th and 20th. Review for Exam 2.
March 26th and 27th.
Assignment: 512lab08.pdf
SAS file: 512lab08.sas
Solutions: 512lab08_soln.pdf
April 2nd and 3rd.
Assignment: 512lab09.pdf
SAS file: 512lab09.sas
Solutions: 512lab09_soln.pdf
April 9th and 10th.
Assignment: 512lab10.pdf
SAS file: 512lab10.sas
Solutions: 512lab10_soln.pdf
April 16th and 17th. No lab.
April 23rd and 24th. Review for Exam 3.
Homework assignments will consist mainly of problems taken from the Ott & Longnecker text, but may also include some problems of my own. Note that we are using the 7th edition, and some of the problems and/or numbering are different compared to the 6th edition. These assignments will involve a combination of theoretical/conceptual and computational exercises. Some general comments:
Homework solutions will be posted here after the due dates.
 Each assignment will be collected at the beginning of class on the day it's due, and solutions will be posted here shortly after the due date.
 You are welcome to discuss the homework with your classmates, but each student must submit their own independent writeup of the solutions. Copying the work of others (which includes your classmates, people who post materials on the web, etc) is not acceptable.
 For problems that involve work with SAS, please include (a hard copy of) your code and relevant output with your submission. SAS will return lots of output, some of which is not relevant to what we are discussing. You should only submit output (tables, graphs, etc.) that is used specifically to answer the questions being asked. As a rule of thumb, do not include any output that you don't refer to specifically in your writeup. Being prudent with the output you submit will not only save paper and the grader's time, but will also help you learn: in "real life" work or research, it is not enough to present an exhaustive set of statistical output and leave it up to your boss, client, or readers to sort out what it means.
Homework 01 — Due Thursday 01/24/2019.
Assignment: 512hw01.pdf
Data files: ex1122.xlsx ex1140.xlsx
Solutions: 512hw01_soln.pdf 512hw01_soln.sas
Homework 02 — Due Thursday 02/12/2019.
Assignment: 512hw02.pdf
Data files: ex1211.xlsx ex1223.xlsx
SAS files: 512hw02_prob5.sas
Solutions: 512hw02_soln.pdf 512hw02_soln.sas
Homework 03 — Due Thursday 03/07/2019.
Assignment: 512hw03.pdf
Data files: ex832.xlsx ex839.xlsx
SAS files: 512hw03_prob3.sas 512hw03_prob4.sas
Solutions: 512hw03_soln.pdf 512hw03_soln.sas
Homework 04 — Not due!
Assignment: 512hw04.pdf
Data files: ex148.xlsx
Solutions: 512hw04_soln.pdf 512hw04_soln.sas
Homework 05 — Due Thursday 04/11/2019.
Assignment: 512hw05.pdf
Data files: ex165.xlsx
Solutions: 512hw05_soln.pdf 512hw05_soln.sas
Homework 06 — Due Thursday 04/25/2019.
Assignment: 512hw06.pdf
Data files: ex173.xlsx ex177.xlsx ex1730.xlsx
SAS files: 512hw06_prob4.sas
Solutions: 512hw06_soln.pdf 512hw06_soln.sas