Christopher M. Bishop, Pattern Recognition and Machine Learning
----------------------------------------
Reading categories (given in parentheses after each entry):
1 - lecture
2 - further reading belonging to the lecture
3 - background reading to improve on the basics
--------------------------------------------
1 Introduction (2)
  1.1 Example: Polynomial Curve Fitting (1)
  1.2 Probability Theory (3)
    1.2.1 Probability densities (3)
    1.2.2 Expectations and covariances (3)
    1.2.3 Bayesian probabilities (3)
    1.2.4 The Gaussian distribution (3)
    1.2.5 Curve fitting re-visited (2)
    1.2.6 Bayesian curve fitting (2)
  1.5 Decision Theory (1)
    1.5.1 Minimizing the misclassification rate (1)
    1.5.2 Minimizing the expected loss (1)
    1.5.3 The reject option (1)
    1.5.4 Inference and decision (2)

2 Probability Distributions (3)
  2.1 Binary Variables (3)
    2.1.1 The beta distribution (3)
  2.3 The Gaussian Distribution (3)
    2.3.1 Conditional Gaussian distributions (3)
    2.3.2 Marginal Gaussian distributions (3)
    2.3.3 Bayes’ theorem for Gaussian variables (3)
    2.3.4 Maximum likelihood for the Gaussian (3)
    2.3.5 Sequential estimation (3)
    2.3.6 Bayesian inference for the Gaussian (3)
    2.3.9 Mixtures of Gaussians (3)

3 Linear Models for Regression (1)
  3.1 Linear Basis Function Models (1)
    3.1.1 Maximum likelihood and least squares (1)
    3.1.2 Geometry of least squares (1)
    3.1.3 Sequential learning (1)
    3.1.4 Regularized least squares (1)
  3.2 The Bias-Variance Decomposition (2)
  3.3 Bayesian Linear Regression (1)
    3.3.1 Parameter distribution (1)
    3.3.2 Predictive distribution (2)

4 Linear Models for Classification (1)
  4.1 Discriminant Functions (1)
    4.1.1 Two classes (1)
    4.1.2 Multiple classes (1)
    4.1.3 Least squares for classification (2)
    4.1.4 Fisher’s linear discriminant (1)
    4.1.5 Relation to least squares (2)
    4.1.6 Fisher’s discriminant for multiple classes (2)
    4.1.7 The perceptron algorithm (1)
  4.2 Probabilistic Generative Models (1)
    4.2.1 Continuous inputs (1)
    4.2.2 Maximum likelihood solution (1)
  4.3 Probabilistic Discriminative Models (1)
    4.3.1 Fixed basis functions (2)
    4.3.2 Logistic regression (1)
    4.3.3 Iterative reweighted least squares (2)
  4.4 The Laplace Approximation (1)
  4.5 Bayesian Logistic Regression (3)
    4.5.1 Laplace approximation (3)
    4.5.2 Predictive distribution (3)

6 Kernel Methods (1)
  6.1 Dual Representations (1)
  6.2 Constructing Kernels (1)
  6.3 Radial Basis Function Networks (2)

7 Sparse Kernel Machines (1)
  7.1 Maximum Margin Classifiers (1)
    7.1.1 Overlapping class distributions (1)
    7.1.3 Multiclass SVMs (2)
    7.1.4 SVMs for regression (1)

9 Mixture Models and EM (1)
  9.1 K-means Clustering (1)
    9.1.1 Image segmentation and compression (3)
  9.2 Mixtures of Gaussians (1)
    9.2.1 Maximum likelihood (1)
    9.2.2 EM for Gaussian mixtures (1)
  9.4 The EM Algorithm in General (2)

5 Neural Networks (1)
  5.1 Feed-forward Network Functions (1)
    5.1.1 Weight-space symmetries (2)
  5.2 Network Training (1)
    5.2.1 Parameter optimization (1)
    5.2.2 Local quadratic approximation (1)
    5.2.3 Use of gradient information (1)
    5.2.4 Gradient descent optimization (1)
  5.3 Error Backpropagation (1)
    5.3.1 Evaluation of error-function derivatives (1)
    5.3.2 A simple example (1)
    5.3.3 Efficiency of backpropagation (2)
    5.3.4 The Jacobian matrix (3)
  5.4 The Hessian Matrix (3)
    5.4.1 Diagonal approximation (3)
    5.4.2 Outer product approximation (3)
    5.4.3 Inverse Hessian (3)
    5.4.4 Finite differences (3)
    5.4.5 Exact evaluation of the Hessian (3)
    5.4.6 Fast multiplication by the Hessian (3)
  5.5 Regularization in Neural Networks (1)
    5.5.1 Consistent Gaussian priors (2)
    5.5.2 Early stopping (1)
    5.5.6 Convolutional networks (1)