Foundations of Machine Learning 2018/19

African Masters in Machine Intelligence (AMMI) at AIMS Rwanda

Syllabus

Part 1: Mathematical Foundations

Part 2: Machine Learning

  • Graphical Models (slides, Chris Bishop's book chapter)
    • Directed graphical models
    • Undirected graphical models
    • D-separation
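The topics above can be illustrated with a small numerical check (a sketch only; the binary distributions below are made up, not course material): in the directed chain a → b → c, the joint factorizes as p(a)p(b|a)p(c|b), and d-separation predicts a ⟂ c | b, which we can verify directly.

```python
import numpy as np

# Directed chain a -> b -> c with made-up binary conditional tables.
p_a = np.array([0.6, 0.4])
p_b_given_a = np.array([[0.7, 0.3],    # p(b | a=0)
                        [0.2, 0.8]])   # p(b | a=1)
p_c_given_b = np.array([[0.9, 0.1],    # p(c | b=0)
                        [0.4, 0.6]])   # p(c | b=1)

# Joint from the factorization p(a, b, c) = p(a) p(b | a) p(c | b)
joint = p_a[:, None, None] * p_b_given_a[:, :, None] * p_c_given_b[None, :, :]

def cond_indep_given_b(joint):
    """Check the d-separation claim a is independent of c given b
    by comparing p(a, c | b) with p(a | b) p(c | b) for every b."""
    for b in range(joint.shape[1]):
        slab = joint[:, b, :]
        p_b = slab.sum()
        p_acb = slab / p_b                    # p(a, c | b)
        p_a_b = slab.sum(axis=1) / p_b        # p(a | b)
        p_c_b = slab.sum(axis=0) / p_b        # p(c | b)
        if not np.allclose(p_acb, np.outer(p_a_b, p_c_b)):
            return False
    return True
```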
  • Dimensionality Reduction with Principal Component Analysis (slides, MML book chapter)
    • Maximum variance perspective
    • Projection perspective
    • Key steps of PCA in practice
    • Probabilistic PCA
    • Other perspectives of PCA
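The key steps of PCA in practice (center the data, estimate the covariance, eigendecompose, project) can be sketched as follows; this is a minimal illustration on made-up data, not the course implementation:

```python
import numpy as np

def pca(X, num_components):
    """Minimal PCA sketch: center the data, eigendecompose the sample
    covariance, and project onto the leading eigenvectors."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)           # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                # sort by explained variance
    components = eigvecs[:, order[:num_components]]  # top-k eigenvectors
    return X_centered @ components

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # made-up data for illustration
Z = pca(X, 2)
```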
  • Linear Regression (slides, MML book chapter)
    • Maximum likelihood estimation
    • Maximum a posteriori estimation
    • Bayesian linear regression
    • Distribution over functions
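Maximum likelihood estimation for linear regression reduces to the normal equations, theta_ML = (Phi^T Phi)^(-1) Phi^T y. A small sketch with a made-up toy dataset (the feature matrix and variable names are ours):

```python
import numpy as np

def mle_weights(Phi, y):
    """Maximum likelihood weights via the normal equations:
    theta_ML = (Phi^T Phi)^{-1} Phi^T y."""
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# toy data: y = 2x + 1 plus a little noise; features [1, x]
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=50)
y = 2.0 * x + 1.0 + 0.01 * rng.normal(size=50)
Phi = np.stack([np.ones_like(x), x], axis=1)
theta = mle_weights(Phi, y)
```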
  • Model Selection (slides, MML book chapter)
    • Cross validation
    • Information criteria
    • Bayesian model selection
    • Occam's razor and the marginal likelihood
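Cross validation, the first item above, can be sketched in a few lines: hold out each fold in turn, fit on the rest, and average the held-out error. The polynomial example below uses made-up data and is an illustration only:

```python
import numpy as np

def kfold_mse(x, y, degree, k=5, seed=0):
    """Held-out mean squared error of a polynomial fit of the given
    degree, estimated by k-fold cross validation (a minimal sketch)."""
    idx = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        coeffs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coeffs, x[test])
        errors.append(np.mean((pred - y[test]) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(2)
x = np.linspace(-1.0, 1.0, 60)
y = x**2 + 0.05 * rng.normal(size=60)   # made-up quadratic data
```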
  • Gaussian Process Regression (slides, GPML book)
    • Model
    • Inference with Gaussian processes
    • Training via evidence maximization
    • Model selection
    • Interpreting the hyper-parameters
    • Practical tips and tricks when working with Gaussian processes
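The GP predictive mean, k(X*, X)(K + sigma^2 I)^(-1) y with a squared-exponential covariance, can be sketched as below. The hyper-parameter values are made up for illustration, not obtained by the evidence maximization discussed above:

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential covariance k(a, b) = exp(-(a - b)^2 / (2 l^2))."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)

def gp_posterior_mean(x_train, y_train, x_test, noise_var=1e-4):
    """GP regression predictive mean: k(X*, X) (K + sigma^2 I)^{-1} y."""
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    return rbf_kernel(x_test, x_train) @ np.linalg.solve(K, y_train)

x_train = np.linspace(0.0, 1.0, 20)
y_train = np.sin(2 * np.pi * x_train)   # made-up, noise-free data
mean = gp_posterior_mean(x_train, y_train, x_train)
```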
  • Bayesian Optimization (slides)
    • Optimization of meta-parameters in machine learning systems
    • Acquisition functions
    • Practicalities
    • Applications
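Among acquisition functions, expected improvement has a closed form under a Gaussian predictive distribution. A sketch (minimization convention; function and variable names are ours):

```python
import math
import numpy as np

def expected_improvement(mu, sigma, best):
    """Expected improvement for minimization under a Gaussian predictive
    distribution N(mu, sigma^2): EI = (best - mu) Phi(z) + sigma phi(z),
    with z = (best - mu) / sigma."""
    sigma = np.maximum(sigma, 1e-12)                 # guard against sigma = 0
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf
```

Points whose predicted mean is below the incumbent best (or whose predictive uncertainty is large) receive higher acquisition values.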
  • Sampling (slides)
    • Monte Carlo estimation
    • Importance sampling
    • Rejection sampling
    • Markov chain Monte Carlo
    • Metropolis-Hastings
    • Slice sampling
    • Gibbs sampling
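The Metropolis-Hastings algorithm listed above needs only an unnormalized target density. A minimal random-walk sketch targeting a standard normal (step size and names are ours):

```python
import numpy as np

def metropolis_hastings(log_p, x0, num_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings: propose x' ~ N(x, step^2) and
    accept with probability min(1, p(x') / p(x))."""
    rng = np.random.default_rng(seed)
    samples = np.empty(num_samples)
    x = x0
    for i in range(num_samples):
        proposal = x + step * rng.normal()
        if np.log(rng.uniform()) < log_p(proposal) - log_p(x):
            x = proposal
        samples[i] = x
    return samples

# target: standard normal, known only up to its normalization constant
samples = metropolis_hastings(lambda x: -0.5 * x**2, 0.0, 30000)
```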
  • Density Estimation with Gaussian Mixture Models (slides, MML book chapter)
    • Mixture models
    • Parameter estimation
    • Implementation
    • Latent variable perspective
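Parameter estimation in mixture models is typically done with EM. The sketch below is deliberately simplified (two 1-D components, fixed unit variances, equal mixing weights, made-up data), showing only the E-step/M-step alternation:

```python
import numpy as np

def em_gmm_1d(x, num_iters=50):
    """Simplified EM for a two-component 1-D Gaussian mixture with fixed
    unit variances and equal mixing weights (only the means are learned)."""
    mu = np.array([x.min(), x.max()])                       # crude initialization
    for _ in range(num_iters):
        log_lik = -0.5 * (x[:, None] - mu[None, :]) ** 2
        r = np.exp(log_lik - log_lik.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)                   # E-step: responsibilities
        mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)   # M-step: weighted means
    return np.sort(mu)

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(-3.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])
mu = em_gmm_1d(x)
```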
  • Classification with Logistic Regression (slides)
    • Logistic sigmoid as a posterior class probability
    • Implicit modeling assumptions
    • Maximum likelihood estimation
    • MAP estimation
    • Probabilistic model
    • Laplace approximation
    • Bayesian logistic regression
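Maximum likelihood estimation for logistic regression has no closed form; a common approach is gradient ascent on the log-likelihood. A sketch on made-up 1-D data (names and step size are ours):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fit_logistic(Phi, y, lr=0.1, num_iters=2000):
    """Maximum likelihood logistic regression by gradient ascent;
    the log-likelihood gradient is Phi^T (y - sigmoid(Phi theta))."""
    theta = np.zeros(Phi.shape[1])
    for _ in range(num_iters):
        theta += lr * Phi.T @ (y - sigmoid(Phi @ theta)) / len(y)
    return theta

rng = np.random.default_rng(4)
x = rng.uniform(-2.0, 2.0, size=200)           # made-up 1-D inputs
y = (x + 0.1 * rng.normal(size=200) > 0).astype(float)
Phi = np.stack([np.ones_like(x), x], axis=1)   # bias + input feature
theta = fit_logistic(Phi, y)
accuracy = np.mean((sigmoid(Phi @ theta) > 0.5) == (y == 1.0))
```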
  • Information Theory (slides by Pedro Mediano)
    • Entropy
    • KL divergence
    • Mutual information
    • Coding theory
    • Information theory and statistical inference
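Entropy and the KL divergence, the first two quantities above, are one-liners for discrete distributions; a sketch (in bits, i.e. base-2 logarithms):

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i log2 p_i, in bits."""
    p = p[p > 0]               # 0 log 0 is taken to be 0
    return float(-np.sum(p * np.log2(p)))

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i log2(p_i / q_i); assumes q_i > 0
    wherever p_i > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))
```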
  • Variational Inference (slides)
    • Inference as optimization
    • Evidence lower bound
    • Conditionally conjugate models
    • Mean-field variational inference in conditionally conjugate models
    • Stochastic variational inference
    • Black-box variational inference for hierarchical Bayesian models
    • Gradient estimators
    • Amortized inference
    • Richer posteriors

Team

  • Marc Deisenroth (Lecturer)
  • Kossi Amouzouvi (Tutor, AIMS Rwanda)
  • Oluwafemi Azeez (Tutor, CMU Africa)
  • Steindór Sæmundsson (Tutor, Imperial College London)
  • Pedro Martinez Mediano (Tutor, Imperial College London)