This class brings together machine learning theory, tools, and real-world datasets to teach students how to analyze massive data effectively and efficiently. It is designed to be self-contained and consists of three parts:
In the first part, we review the required mathematics. In the second part, we introduce fundamental machine learning concepts, models, and algorithms. In the third part, we discuss how large-scale machine learning differs from small-scale learning tasks. In particular, we focus on deep learning techniques (big models) and present some recent, exciting advances in the field.
This course is intended for senior undergraduate and junior graduate students who understand
2017/01/23 - 15. Semisupervised/Transfer Learning: slides announced
2017/01/23 - 14. Unsupervised Learning: slides announced
2017/01/03 - Final Competition on Image Captioning: notebook announced
2017/01/03 - 12. CNN: slides announced
2016/12/27 - 13. RNN: slides announced
2016/12/20 - 11. NN Opt & Reg: slides announced
2016/12/12 - 10. NN Design: slides announced
2016/11/22 - 09. Large-Scale ML: slides announced
2016/11/08 - 08. CV & Ensembling: slides announced
2016/11/02 - Midterm Competition on News Popularity: notebook announced
2016/11/01 - 07. KNN & SVM: slides announced
2016/10/25 - 06. Probabilistic Models: slides announced
2016/10/18 - 05. Learning Theory & Regularization: slides announced
2016/10/04 - 04. Numerical Optimization: slides announced
2016/09/19 - 01-03. Math Review: slides announced
The class was offered in Fall 2016 and has ended. However, we will continue updating the materials. If you have any feedback, feel free to contact: shwu [AT] cs.nthu.edu.tw
What's ML? | About This Course... | FAQ
This lab guides you through the setup of a scientific Python environment and provides useful references for self-study.
Span & Linear Dependence | Norms | Eigendecomposition | Singular Value Decomposition | Traces | Determinant
This lab guides you through the process of Exploratory Data Analysis (EDA) and discusses how you can use Principal Component Analysis (PCA) to visualize and understand high-dimensional data.
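As a rough illustration of how PCA reduces high-dimensional data to a few coordinates for plotting, here is a minimal NumPy sketch based on the singular value decomposition. The function name `pca_project` and the synthetic data are assumptions made for this example; they are not taken from the lab notebook.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components (via SVD)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance = S ** 2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    Z = X_centered @ Vt[:n_components].T
    return Z, ratio[:n_components]

rng = np.random.default_rng(0)
# Hypothetical data: 200 points in 5-D that vary mostly along 2 latent directions.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

Z, ratio = pca_project(X, n_components=2)
print(Z.shape)      # 2-D coordinates, ready for a scatter plot
print(ratio.sum())  # fraction of total variance kept by the 2 components
```

The two columns of `Z` can be passed directly to a scatter plot; the explained-variance ratio indicates how much of the original structure the 2-D view retains.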
Random Variables & Probability Distributions | Multivariate & Derived Random Variables | Bayes’ Rule & Statistics | Principal Components Analysis | Information Theory | Decision Trees & Random Forest
In this lab, we will apply the Decision Tree and Random Forest algorithms to the classification and dimension reduction problems using the Wine dataset.
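The sketch below shows one way such an exercise might look in Scikit-learn. It assumes the copy of the Wine dataset bundled with Scikit-learn (`load_wine`), which may differ from the file used in the lab, and it reads "dimension reduction" as feature selection by Random Forest importance; both are assumptions, as are the hyperparameter values.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0, stratify=wine.target
)

# Classification: a single shallow tree versus an ensemble of trees.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
tree_acc = tree.score(X_test, y_test)
forest_acc = forest.score(X_test, y_test)
print("tree accuracy:", tree_acc)
print("forest accuracy:", forest_acc)

# Dimension reduction (as feature selection): rank features by
# impurity-based importance and keep the five highest-ranked ones.
ranked = sorted(
    zip(wine.feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
top_features = [name for name, _ in ranked[:5]]
print(top_features)
```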
Numerical Computation | Optimization Problems | Unconstrained Optimization | Stochastic Gradient Descent | Perceptron | Adaline | Constrained Optimization | Linear & Polynomial Regression | Duality
In this lab, we will guide you through the implementation of Perceptron and Adaline, two of the first machine learning algorithms for the classification problem. We will also discuss how to train these models using the optimization techniques.
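For orientation, here is a minimal sketch of the Perceptron learning rule in NumPy (Adaline is not shown). The toy data, the learning rate `eta`, and the function names are assumptions chosen for this example rather than the lab's own implementation.

```python
import numpy as np

def train_perceptron(X, y, eta=0.1, n_epochs=20):
    """Rosenblatt's perceptron rule; labels in y must be -1 or +1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            y_hat = 1 if xi @ w + b >= 0 else -1
            if y_hat != yi:
                # Move the decision boundary toward the misclassified point.
                w += eta * yi * xi
                b += eta * yi
                errors += 1
        if errors == 0:  # every training point is classified correctly
            break
    return w, b

def predict(X, w, b):
    return np.where(X @ w + b >= 0, 1, -1)

# Hypothetical, linearly separable toy data.
X = np.array([[2.0, 1.0], [3.0, 2.0], [2.5, 3.0],
              [-1.0, -2.0], [-2.0, -1.0], [-3.0, -2.5]])
y = np.array([1, 1, 1, -1, -1, -1])

w, b = train_perceptron(X, y)
print(predict(X, w, b))
```

The update only fires on misclassified points, which is why the loop can stop early once an epoch passes without errors. On data that is not linearly separable the rule never settles, so the epoch cap is what ends training.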
This lab guides you through linear and polynomial regression using the Housing dataset. We will also extend the Decision Tree and Random Forest classifiers to solve the regression problem.
Point Estimation | Bias & Variance | Consistency | Decomposing Generalization Error | Weight Decay | Validation
In this lab, we will guide you through some common regularization techniques such as weight decay, sparse weight, and validation.
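To make the effect of weight decay concrete, the following sketch fits a linear model with an L2 penalty in closed form and prints the weight norm for several penalty strengths. The synthetic data, the penalty values, and the name `fit_weight_decay` are assumptions for illustration only.

```python
import numpy as np

def fit_weight_decay(X, y, lam):
    """Minimize ||Xw - y||^2 + lam * ||w||^2 in closed form."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
# Hypothetical data: only the first 3 of 10 features carry signal.
X = rng.normal(size=(30, 10))
true_w = np.zeros(10)
true_w[:3] = [2.0, -1.0, 0.5]
y = X @ true_w + 0.5 * rng.normal(size=30)

lams = [0.0, 1.0, 10.0, 100.0]
norms = [np.linalg.norm(fit_weight_decay(X, y, lam)) for lam in lams]
for lam, norm in zip(lams, norms):
    print(f"lambda={lam:6.1f}  ||w||={norm:.3f}")
```

A larger penalty shrinks the weight vector toward zero, trading some bias for lower variance. Choosing the penalty strength is where a validation set comes in.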
Maximum Likelihood Estimation | Maximum A Posteriori Estimation | Bayesian Estimation
In this lab, we will guide you through the practice of Logistic Regression. We will also introduce some common evaluation metrics other than the "accuracy" that we have been using so far.
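As a small illustration of why accuracy alone can mislead on imbalanced data, the sketch below computes precision, recall, and F1 by hand. The labels are made up for this example, and these three metrics are an assumption about what "other metrics" covers; the lab may discuss others.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical labels: only 2 of 10 examples are positive.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(accuracy, precision, recall, f1)  # 0.8 0.5 0.5 0.5
```

The classifier looks good at 80% accuracy, yet it finds only half of the positive examples and half of its positive predictions are wrong.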
KNNs | Parzen Windows | Local Models | Support Vector Classification (SVC) | Nonlinear SVC | Kernel Trick
In this lab, we will classify nonlinearly separable data using the KNN and SVM classifiers. We will show how to pack multiple data preprocessing steps into a single Pipeline in Scikit-learn to simplify the training workflow.
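A minimal sketch of the Pipeline idea follows: feature scaling and the classifier are bundled into one estimator, so the scaler is fit on the training split only. The two-moons dataset and the hyperparameter values are assumptions for this example and are not necessarily those used in the lab.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical nonlinearly separable data: two interleaving half circles.
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "knn": Pipeline([
        ("scale", StandardScaler()),
        ("clf", KNeighborsClassifier(n_neighbors=5)),
    ]),
    "svm": Pipeline([
        ("scale", StandardScaler()),
        ("clf", SVC(kernel="rbf", C=1.0, gamma="scale")),
    ]),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)  # fits the scaler, then the classifier
    scores[name] = model.score(X_test, y_test)
    print(name, scores[name])
```

Because each pipeline behaves like a single estimator, it can be passed unchanged to model selection utilities, which keeps preprocessing statistics from leaking out of the held-out data.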
CV | How Many Folds? | Voting | Bagging | Boosting | Why AdaBoost Works?
In this lab, we will guide you through the cross-validation technique for hyperparameter selection. We will also practice and compare some ensemble learning techniques.
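The following sketch shows K-fold cross-validation used to pick a hyperparameter, here the degree of a polynomial regressor. The synthetic sine data, the candidate degrees, and the helper names are assumptions for illustration; ensemble methods are not covered by this example.

```python
import numpy as np

def k_fold_indices(n_samples, n_folds, rng):
    """Yield (train_idx, val_idx) pairs from a shuffled partition of the data."""
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for i in range(n_folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, folds[i]

def cv_mse(x, y, degree, n_folds=5, seed=0):
    """Average validation MSE of a degree-`degree` polynomial over the folds."""
    rng = np.random.default_rng(seed)  # same folds for every candidate
    errors = []
    for train_idx, val_idx in k_fold_indices(len(x), n_folds, rng):
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        residual = np.polyval(coeffs, x[val_idx]) - y[val_idx]
        errors.append(np.mean(residual ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(1)
# Hypothetical data: noisy samples of a sine curve.
x = rng.uniform(-3, 3, size=60)
y = np.sin(x) + 0.1 * rng.normal(size=60)

candidates = [1, 3, 5, 7]
scores = {d: cv_mse(x, y, d) for d in candidates}
best_degree = min(scores, key=scores.get)
print(scores)
print("selected degree:", best_degree)
```

Reusing the same seed for every candidate means all degrees are compared on identical folds, so differences in the scores reflect the model rather than the split.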
In this competition, you are provided with raw news articles and the goal is to use everything you have learned so far to predict whether a news article will be intensively shared in online social networking services. Good luck!
When ML Meets Big Data... | Representation Learning | Curse of Dimensionality | Trade-Offs in Large-Scale Learning | SGD-Based Optimization
NN Basics | Learning the XOR | Back Propagation | Cost Function & Output Neurons | Hidden Neurons | Architecture Design
In this lab, we will show how to train a neural network (NN) for text classification using the Keras library. Then we train another neural network, called word2vec, that embeds words into a dense vector space where semantically similar words are mapped to nearby points.
Momentum & Nesterov Momentum | AdaGrad & RMSProp | Batch Normalization | Continuation Methods & Curriculum Learning | Weight Decay | Data Augmentation | Dropout | Manifold Regularization | Domain-Specific Model Design
In this lab, we will apply some regularization techniques to neural networks on the CIFAR-10 dataset and see how they improve generalization.
ConvNet Architecture | Kernel/Filter & Stride & Padding | Pooling | Dilated Convolutions | LeNet | AlexNet | VGGNet | GoogLeNet & Inception Modules | Residual Networks | DenseNets | Stacked Hourglass Networks | Deep Compression
Guest Lecture by Prof. Hwann-Tzong Chen
TBA
Vanilla RNNs | Design Alternatives | Backprop through Time (BPTT) | LSTM | Parallelism & Teacher Forcing | Attention | Explicit Memory | Adaptive Computation Time (ACT) | Memory Networks | Google Neural Machine Translation
TBA
Clustering | Recommendation & Factorization | Dimension Reduction | Predictive Learning | Autoencoders | Manifold Learning | Synthesis & Generation | Generative Adversarial Networks (GANs)
TBA
Label Propagation | Semisupervised GANs | Semisupervised Clustering | Multitask Learning | Weight Initialization & Fine-Tuning | Domain Adaptation | Zero Shot Learning | Unsupervised Transfer Learning | Future at a Glance
Given the Microsoft COCO dataset, your task is to devise and train a model that generates a suitable sentence describing an image. Here is an example:
The following are links to some useful online resources. If this course starts your ML journey, don't stop here. Enroll in advanced courses (shown below) to learn more.
For more course materials (such as assignments and score sheets) and the online forum, please refer to the iLMS system.
Ian Goodfellow, Yoshua Bengio, Aaron Courville, Deep Learning, MIT Press, 2016, ISBN: 0262035618
Trevor Hastie, Robert Tibshirani, Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition, Springer, 2009, ISBN: 0387848576
Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006, ISBN: 0387310738
Sebastian Raschka, Python Machine Learning, Packt Publishing, 2015, ISBN: 1783555130