ST 790C-001 - Statistical Machine Learning and Data Mining



Instructor's Announcement        


Lectures: MW 10:15-11:30am, SAS 5270 | Syllabus
Office Hours: W 2-3pm (or by appointment)
Textbooks: The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Hastie, Tibshirani, and Friedman (2001). Here is the link to the book.
Reference Books:
  • Pattern Recognition and Neural Networks by B. Ripley (1996)
  • Learning with Kernels by Scholkopf and Smola (2000)
  • The Nature of Statistical Learning Theory by Vapnik (1998)

    Useful Links:
  • Kernel Machines | Tibshirani's Lasso Page | Hastie's Software and Data
  • Local Working Group

    Software:
  • R Manual | LIBSVM

    Course Activities
    Week 1 (Jan 11-15) Read Chapter 1: Introduction Lecture 1 Note (Jan 11)
    Supplementary Reading: Data mining and statistics: what is the connection? Friedman (1997)
    Week 2 (Jan 18-22) Read Section 2.4: Overview of Supervised Learning Lecture 2 Note (Jan 20)
    Supplementary Reading: An overview of statistical learning theory, Vapnik (1999)
    Week 3 (Jan 25-29) Read Section 2.3: Binary Classification (I) Lecture 3 Notes (Jan 25)
    Homework 1, Solution
    Week 4 (Feb 1-5) Read Chapter 4 (4.3, 4.4): Binary Classification (II): LDA and Logistic Regression Lecture 4 Notes (updated Feb 2)
    Week 5 (Feb 8-12) Read Chapter 4 (4.3, 4.4): Multiclass Classification (I) Lecture 5 Notes
    Homework 2, Solution
    Week 6 (Feb 15-19) Read Chapter 4 (4.4) Multiclass Classification (II) Lecture 6 Notes
    Read Chapter 4 (4.5): Separating Hyperplanes (I) Lecture 7 Notes
    Week 7 (Feb 22-26) Read Chapter 12 (12.1): Mathematical Optimization Lecture 8 Notes
    Read Chapter 12 (12.2, 12.3): Binary Support Vector Machines Lecture 9 Notes
    Supplementary Reading: The Entire Regularization Path for the Support Vector Machine, Hastie et al. (2004)
    Homework 3, Solution
    Week 8 (March 1-5) Read Chapter 12 (12.4): Loss View of SVM Lecture 10 Notes
    Week 9 (March 8-12) Read Chapter 12 (12.1): Multiclass Support Vector Machines Lecture 11 Notes
    Week 10 (March 15-19) Spring Break
    Week 11 (March 22-26) Read Chapter 9 (9.2): Tree-Based Methods Lecture 12 Notes
    Class Presentation: Credit Scoring. Presenter: Murilo
    Homework 4, Solution
    Read Chapter 9 (9.3): Bagging and Random Forest Lecture 13 Notes
    Week 12 (March 29-Apr 2) Read Chapter 10 : Boosting Lecture 14 Notes
    Week 13 (April 5-9) Read Chapter 3 (3.1-3.3): Linear Regression Models Lecture 15 Notes
    Homework 5, Solution
    Read Chapter 3 (3.4): Variable Selection Lecture 16 Notes
    Week 14 (Apr 12-16) Read Chapter 3 (3.5): PCA, PCR, and PLS Lecture 17 Notes
    Class Presentation: Robust Fisher LDA. Presenter: Geng Yuan (04/12)
    Class Presentation: SVM for cancer classification. Presenter: David Vock (04/14)
    Week 15 (April 19-23) Read Chapter 9 (9.1): GLM, Additive Models, and GAM Lecture 18 Notes
    Class Presentation: Shrinkage Variable Selection for SVM. Presenter: Chen-Yen Lin (04/19)
    Homework 6, Solution
    Class Presentation: Efficient Pairwise Classification. Presenter: Weining Shen (04/22)
    Week 16 (April 26-30) Read Chapter 14 : Unsupervised Learning and Cluster Analysis Lecture 19 Notes
    Class Presentation: Active Learning. Presenter: Yu-Cheng Ku (04/26) (due 05/04)
    Class Presentation: RKHS theory for SVM learning. Presenter: Dehan Kong (04/30)



    Paper List for Journal Club (2010):
  • Support vector machines and kernels for computational biology by Ben-Hur, A, Ong, C, Sonnenburg, S, Scholkopf, B, and Ratsch, G (2008), PLoS Computational Biology, 4.
  • A tutorial on nu-support vector machines by Chen, P, Lin, C, and Scholkopf, B.
  • Support vector machine classification and validation of cancer tissue samples using microarray expression data by Furey, Cristianini, Duffy, Bednarski, Schummer and Haussler (2000), Bioinformatics.
  • In defense of one-vs-all classification by Rifkin and Klautau (2004), Journal of Machine Learning Research, 5, 101-141.
  • Efficient pairwise classification by Park and Furnkranz (2007), Machine Learning: ECML 2007, 658-665.
  • Recent advances in speech recognition by Furui (1997), Pattern Recognition Letters.
  • Robustness of Fisher's linear discriminant function under two-component mixed normal models by Ashikaga and Chang, JASA, 76, 676-680.


    Final Project
  • Description
  • List of Papers (for literature review)
  • Due May 7, 2010, 5pm


    Course Policies and Related Information
  • Auditing: Auditors are expected to attend class regularly and submit homework on the same schedule as the other students. The final grade for auditors (AU or NR) will be based on their final homework average. A homework score of 75 or better is required for an AU.
  • Academic Integrity: The University policy on academic integrity is spelled out in Appendix L of the NCSU Code of Student Conduct. For a more thorough elaboration, see the NCSU Office of Student Conduct website. For this course, group work on homework is encouraged. However, copying someone else's work and calling it your own is plagiarism, so the work you turn in should be your own.
  • For students with disabilities: Reasonable accommodations will be made for students with verifiable disabilities. In order to take advantage of available accommodations, students must register with Disability Services for Students (DSS), 1900 Student Health Center, CB# 7509, 515-7653.
  • Online Class Evaluation: Online class evaluations will be available for students to complete during the last two weeks of class (November 26-December 9). All evaluations are confidential; instructors will never know how any one student responded to any question, and students will never know the ratings for any particular instructor. Click Online evaluation. More information at ClassEval.