Kuiper family skiing at Beech Mountain 2024

Patrick Kuiper MA477 Data Science Section A3 Page

Admin Links

Canvas

DAAW

Course Calendar

Student Memo

Bonus Point Tracker

Gmail Address

Documentation Presentation

Instructor Survey

ISLP Book

Lesson Links

Lesson Description Objectives Notes Readings Other Docs
Lesson 1: Course Introduction and Python Review
  • Understand the course outline and syllabus
  • Review base Python and key libraries
L1 Notes Geron 3-38, ISLP 1-13 Course Folder
Lesson 2: Introduction to Statistical Learning
  • Understand Y = f(X) + epsilon
  • Loss functions and accuracy
  • Bias-variance tradeoff
L2 Notes Boardsheet ISLP 1-39 Notebook Titanic Data
Lesson 3: End-to-End Machine Learning
  • ML workflow
  • Data pipelines
  • Cross-validation
L3 Notes Boardsheet Geron 39-74 Notebook Brookline Data
Lesson 4: Regression and Model Evaluation
  • Linear regression
  • Error metrics
  • Model validation
L4 Notes Boardsheet Geron 75-102 Notebook Brookline Data
Lesson 5: Resampling, CV, Bootstrapping
  • Resampling
  • LOOCV and k-fold
  • Bias-variance tradeoff
L5 Notes Boardsheet ISLP pp.201-215 (Sections 5.1-5.2) Notebook Notebook Exercise
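The k-fold and LOOCV objectives above can be illustrated with a minimal sketch in plain Python. The function names kfold_indices and kfold_splits are hypothetical helpers written for this example, not part of sklearn, and the folds are contiguous with no shuffling, which is a simplifying assumption. Setting k equal to the number of samples gives LOOCV.

```python
def kfold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    # The first (n_samples % k) folds receive one extra observation.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def kfold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) pairs, one per fold."""
    folds = kfold_indices(n_samples, k)
    for i, validation in enumerate(folds):
        # Training set: every index that is not in the held-out fold.
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


# 3-fold CV on 10 observations.
splits = list(kfold_splits(10, 3))

# LOOCV is the special case k == n_samples (here n = 4).
loocv = list(kfold_splits(4, 4))
```

In practice the course notebooks would more likely use sklearn's KFold; the sketch only shows how each observation is held out exactly once.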
Lesson 6: Linear Regression
  • Understand the linear regression model
  • Assess the linear regression model
L6 Notes Boardsheet ISLP pp.69-90 Notebook Exercise
Lesson 7: Qualitative Predictors
  • Use qualitative predictors in a linear model
  • Interpret interaction terms
  • Employ polynomial regression
L7 Notes Boardsheet ISLP pp.91-99 Lesson Notebook
Lesson 8: Model Selection
  • Understand best subset and stepwise model selection
  • Be familiar with commonly used error metrics
L8 Notes Boardsheet ISLP pp.229-240 Lesson Notebook
Lesson 9: Shrinkage Methods
  • Understand shrinkage methods
  • Employ shrinkage methods
L9 Notes Boardsheet Geron pp. 155-164, ISLP pp. 240-253 Lesson Notebook
Lesson 10: Classification I
  • Understand common performance measures used in classification
  • Use a receiver operating characteristic (ROC)
  • Understand the difference between multiclass and multioutput classification.
L10 Notes Boardsheet Geron pp. 103-130 Lesson Notebook
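As a rough illustration of the performance measures named above, the sketch below computes confusion-matrix counts, precision, and recall for a binary classifier in plain Python. The labels are made-up toy data, and confusion_counts and precision_recall are hypothetical helper names chosen for this example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred, positive=1):
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels (illustrative only, not course data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(y_true, y_pred)
```

Recall is the true positive rate plotted on the vertical axis of a ROC curve; sweeping the decision threshold and recomputing these counts traces out the curve.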
Lesson 11: Classification II
  • Understand common performance measures used in classification
  • Use a receiver operating characteristic (ROC)
  • Understand the difference between multiclass and multioutput classification.
L11 Notes Boardsheet Geron pp. 103-130 Lesson Notebook
Lesson 12: Logistic Regression
  • Understand why linear regression is not appropriate for a qualitative response
  • Interpret the coefficients of a multiple logistic regression model
  • Use functions in the sklearn module to assess logistic regression models
L12 Notes Boardsheet Geron 164-174, ISLP pp. 135-146 Lesson Notebook
Lesson 13: Linear Discriminant Analysis
  • Understand the difference between discriminative and generative models
  • Derive the discriminant function for LDA
  • Use sklearn modules to assess LDA classifiers
L13 Notes Boardsheet Geron p. 257 (small paragraph on LDA), ISLP pp. 146-156 Lesson Notebook Lesson 13 LDA Assignment
Lesson 14-15: Support Vector Machines
  • Understand separating hyperplanes and maximum margin classifier.
  • Understand the details of the support vector classifier
  • Use sklearn modules to assess support vector machine (SVM) classifiers
L14 Notes Boardsheet Geron pp. 175-184, ISLP pp. 367-385 Lesson Notebook
Lesson 16: K-Nearest Neighbors
  • Understand the strengths and limitations of k nearest neighbors (p.164 of ISLP).
  • Understand how the bias-variance tradeoff applies when selecting k.
  • Use modules in sklearn to assess k nearest neighbors classifiers.
L16 Notes Boardsheet ISLP: 36-40, 164 Lesson Notebook
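A minimal sketch of a k-nearest-neighbors classifier, assuming Euclidean distance and a simple majority vote, may help make the role of k concrete. The function knn_predict and the toy points are hypothetical and written only for this example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Sort training points by Euclidean distance to the query.
    by_distance = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Two well-separated toy clusters (illustrative data only).
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]

near_origin = knn_predict(train_X, train_y, (0.5, 0.5), k=3)
near_corner = knn_predict(train_X, train_y, (5, 5), k=1)
```

A small k follows the training data closely (low bias, high variance), while a large k averages over more neighbors (higher bias, lower variance).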
Lesson 17: Classification Comparison
  • Review all previous classification lessons since Lesson 9
L17 Notes Boardsheet Geron pp. 175-184, ISLP pp. 367-386 Lesson Notebook
Lesson 18: Regression Trees
  • Understand the basic algorithm for building a regression tree.
  • Use functions in Python's sklearn module to fit regression trees.
  • Know how a regression tree splits the decision space.
  • Understand the advantages and disadvantages of tree-based methods.
L18 Notes Boardsheet ISLP 331-337 Geron pp. 204-208 (Decision Tree regression section) Lesson Notebook
Lesson 19: Classification Trees
  • Understand how the algorithm for classification trees differs from regression trees.
  • Understand and know how to interpret the Gini index.
  • Understand and know how to interpret entropy.
  • Use functions in Python's sklearn module to fit classification trees.
L19 Notes Boardsheet Geron pp 195-203 (Ch 6 Decision Trees) ISLP: 337-34 Lesson Notebook
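The Gini index and entropy objectives above can be sketched for a single tree node as follows. This assumes the usual definitions G = 1 - sum(p_k^2) and D = -sum(p_k log p_k) over the class proportions p_k in the node; the base-2 logarithm is an assumption of this example (some texts use the natural log), and the function names are hypothetical.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Fraction of observations in the node belonging to each class."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini_index(labels):
    """Gini impurity: 0 for a pure node, larger for mixed nodes."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Entropy in bits (log base 2): 0 for a pure node."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))


pure_node = ["yes", "yes", "yes", "yes"]
mixed_node = ["yes", "yes", "no", "no"]

gini_pure, gini_mixed = gini_index(pure_node), gini_index(mixed_node)
entropy_pure, entropy_mixed = entropy(pure_node), entropy(mixed_node)
```

Both measures are zero for a pure node and reach their maximum for a two-class node at an even split, which is why a classification tree favors splits that drive them down.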
Lesson 20: Bagging
  • Understand how bagging improves basic decision trees by reducing variance.
  • Define and interpret the out-of-bag error.
  • Use functions in Python's sklearn module to implement bagging.
L20 Notes Boardsheet Geron pp 211-219 (Ensemble Learning and Random Forest section through Out-of-Bag Evaluation) ISLP: 343-346 Lesson Notebook
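To make the out-of-bag idea concrete, the sketch below draws one bootstrap sample and records which observations were never selected. The helper bootstrap_sample is a hypothetical name, and the sample size and seed are arbitrary choices for this example.

```python
import random


def bootstrap_sample(n, rng):
    """Draw n indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    # Observations never drawn form the out-of-bag set for this tree.
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag


rng = random.Random(0)  # fixed seed so the draw is repeatable
n = 1000
in_bag, out_of_bag = bootstrap_sample(n, rng)
oob_fraction = len(out_of_bag) / n
```

Each observation is left out of a given bootstrap sample with probability (1 - 1/n)^n, roughly 1/e or about 37% for large n, so oob_fraction should land near that value. Bagging uses these held-out observations to estimate test error without a separate validation set.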
Lesson 22: Random Forest
  • Understand how the random forest algorithm differs from bagging.
  • Use functions in Python’s sklearn to fit random forest models.
  • Understand how you can extract feature importance using Random Forests.
L22 Notes Boardsheet Geron pp 220-221 (Random Forest section through Feature Importance) ISLP: 346-347 Lesson Notebook
Lesson 23: Boosting Decision Trees
  • Understand the algorithm for boosting regression trees and how it differs from the basic decision tree and bagging.
  • Use functions in Python's sklearn module to fit boosted regression trees.
L23 Notes Boardsheet Geron pp 222-231 (Boosting section up to (not including) Stacking) ISLP: 347-350 Lesson Notebook
Lesson 24-25: Neural Networks I and II
  • Understand the structure and notation of a multilayer neural network.
  • Given inputs and the weights/biases of a multilayer neural network, calculate the predicted outputs for the network.
  • Understand how the cost function and gradient descent are used to update the weights and biases through back propagation.
  • Use functions in Python's keras module to fit multilayer neural networks.
L24-25 Notes Boardsheet Geron pp 299-329 (Intro to Neural Networks with Keras through 'Building a regression MLP using the sequential API') ISLP: 399-406 Lesson Notebook
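The second objective above, computing predictions from given inputs, weights, and biases, can be sketched by hand without keras. The network shape (2 inputs, 2 sigmoid hidden units, 1 linear output), all weight values, and the helper names are assumptions made up for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def identity(z):
    return z


def dense(inputs, weights, biases, activation):
    """One fully connected layer; weights[j] holds the weights feeding neuron j."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hypothetical parameters, chosen only for illustration.
x = [1.0, 2.0]
W1 = [[0.5, -0.5], [1.0, 1.0]]  # hidden layer: 2 neurons, 2 inputs each
b1 = [0.0, -3.0]
W2 = [[2.0, -2.0]]              # output layer: 1 neuron, 2 hidden inputs
b2 = [0.5]

hidden = dense(x, W1, b1, sigmoid)          # pre-activations are -0.5 and 0.0
output = dense(hidden, W2, b2, identity)    # linear output, as in regression
```

Training repeats this forward pass, compares output with the target through the cost function, and uses backpropagation to adjust W1, b1, W2, and b2 by gradient descent.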
Lesson 26-27: CNNs I and II
  • Understand the architecture of a typical convolutional neural network to include convolution and pooling layers.
  • Understand how data augmentation improves model performance in image classification.
  • Use functions in Python’s keras module to fit convolutional neural networks for image classification.
L26-27 Notes Boardsheet Geron pp 479-494 (Deep Computer Vision Using CNNs through 'Implementing Pooling Layers with Keras') ISLP: 406-413 Lesson Notebook
Lesson 28-30: RNNs I and II
  • Be able to explain the structure of recurrent neural networks, including how they process sequential data and the role of hidden states.
  • Be able to build and train RNN models using Python and deep learning frameworks
  • Be able to evaluate RNN performance using appropriate metrics and compare RNNs with alternative models
L28-30 Notes Boardsheet Geron pp 537-575 (Section on Processing Sequences Using RNNs and CNNs) Lesson Notebook
Lesson 31: Transfer Learning
  • Know how to leverage pretrained models from Keras for transfer learning
L31 Notes Skim Geron pp 495-516 (Architectures), Read Geron pp 516-521 Lesson Notebook
Lesson 32-33: Dimensionality Reduction I-II
  • Understand the curse of dimensionality
  • Know how to calculate principal components (score and loading vectors).
  • Understand PCA as an application of SVD and eigen decomposition of the covariance matrix.
  • Use functions in scikit-learn to perform principal components analysis for dimension reduction and identifying outliers.
  • Use PCA to reduce dimensionality and reconstruct images or data using principal components.
L32-33 Notes Geron pp. 237-258, ISLP 504-515 Lesson Notebook
Lesson 34: Clustering I
  • Understand the k-means and hierarchical clustering algorithms.
  • Use functions in sklearn to perform clustering, such as DBSCAN.
  • Understand practical issues in clustering.
L34 Notes Boardsheet Geron pp. 263-283, ISLP pp 520-535 Lesson Notebook