Course Information

18-696GZ: Statistical Learning for Data Science (I)




Statistical learning refers to modeling and analyzing complex datasets using a variety of statistical tools. It is a relatively recent area of statistics that blends with parallel developments in computer science, especially machine learning. With the advent of ever-growing “Big Data” problems, professionals with statistical learning skills are in high demand. Due to the complexity and depth of this discipline, we expect to offer two consecutive courses (I and II) spanning two semesters. This course is designed as an introductory class in statistical learning; no background in the mathematical sciences is required. Topics covered in this course include, but are not limited to: basics of statistical learning, linear regression (simple linear regression, multiple linear regression), classification (logistic regression, linear discriminant analysis, quadratic discriminant analysis, k-nearest neighbors), resampling (cross-validation, permutation tests, bootstrapping), and linear model selection and regularization (subset selection, shrinkage methods, dimension reduction, high-dimensional statistics). Time permitting, we will cover additional topics. The programming language R will be introduced and used extensively throughout the class. Homework is assigned every week; some assignments ask students to implement statistical learning algorithms in R. Grading will be based on class participation, homework, a midterm exam, and a final exam.

Prerequisites: None

Last Modified: 2016-11-11 1:51PM

Semesters offered:

  • Fall 2016