This page contains resources about Online Learning and Sequential Prediction.
Subfields and Concepts
- Recursive Least Squares
- Mini-Batch Learning
- Mini-Batch Gradient Descent Methods
- Decision Theory
- Information Theory
- Entropy
- Kullback-Leibler (KL) Divergence
- Game Theory
- Minimax Theorem
- Blackwell's Approachability
- Online Dictionary Learning
- Online Algorithms
- Wake-Sleep Algorithm
- Auto-Encoding Variational Bayes (AEVB) Algorithm
- Online Convex Optimization
- Regret Bound
- Bregman Divergence
- No-regret Learning
- Online Gradient Descent
- Online Subgradient Descent
- Mirror Descent
- Stochastic Gradient Descent (SGD)
- Mini-batch Gradient Descent Methods
- Follow The Regularized Leader (FTRL)
- Multi-Armed Bandit (MAB)
- Regularization
- L2-regularization / Tikhonov regularization / Ridge regression
- L1-regularization / Least absolute shrinkage and selection operator (LASSO)
- Matrix Regularization
Online Courses
Video Lectures
- Online Learning with a Memory Harness by Shai Shalev-Shwartz - VideoLectures.NET
- Trading Regret Rate for Computational Efficiency in Online Learning with Limited Feedback by Shai Shalev-Shwartz - VideoLectures.NET
Lecture Notes
- Statistical Learning Theory and Sequential Prediction by Alexander Rakhlin and Karthik Sridharan
- Machine Learning Theory by Karthik Sridharan
- Prediction and Learning: It's Only a Game by Jacob Abernethy
- Learning Theory by Sham Kakade and Ambuj Tewari
- Statistical Learning Theory by Dmitry Panchenko
- Introduction to Machine Learning by Shai Shalev-Shwartz
- Statistical Learning Theory by Maxim Raginsky
- Introduction to Online Optimization by Sebastien Bubeck
Books and Book Chapters
- Hazan, E. (2015). Introduction to online convex optimization. Foundations and Trends® in Optimization, 2(3-4), 157-325.
- Theodoridis, S. (2015). "Chapter 8: Parameter Learning: A Convex Analytic Path". Machine Learning: A Bayesian and Optimization Perspective. Academic Press.
- Shalev-Shwartz, S., & Ben-David, S. (2014). Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press.
- Sra, S., Nowozin, S., & Wright, S. J. (2012). Optimization for machine learning. MIT Press.
- Hazan, E. (2011). "Chapter 10: The Convex Optimization Approach to Regret Minimization". Optimization for machine learning. MIT Press.
- Shalev-Shwartz, S. (2011). Online Learning and Online Convex Optimization. Foundations and Trends® in Machine Learning, 4(2), 107-194.
Scholarly Articles
- Villa, S., Rosasco, L. & Poggio, T. (2013). On Learning, Complexity and Stability. arXiv preprint arXiv:1303.5976.
- Arora, S., Hazan, E., & Kale, S. (2012). The Multiplicative Weights Update Method: A Meta-Algorithm and Applications. Theory of Computing, 8(1), 121-164.
- Shalev-Shwartz, S. (2011). Online learning and online convex optimization. Foundations and Trends® in Machine Learning, 4(2), 107-194.
- Abernethy, J., Bartlett, P. L., & Hazan, E. (2011). Blackwell Approachability and No-Regret Learning are Equivalent. In COLT (pp. 27-46).
- Mairal, J., Bach, F., Ponce, J., & Sapiro, G. (2009). Online dictionary learning for sparse coding. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 689-696). ACM.
- Ying, Y., & Pontil, M. (2008). Online gradient descent learning algorithms. Foundations of Computational Mathematics, 8(5), 561-596.
- Shalev-Shwartz, S. (2007). Online Learning: Theory, Algorithms, and Applications. PhD Dissertation, Hebrew University.
- Cesa-Bianchi, N., & Lugosi, G. (2006). Prediction, Learning, and Games. Cambridge University Press.
- Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning (pp. 928–936).
- Dietterich, T. G. (2002). Machine learning for sequential data: A review. In Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR) (pp. 15-30). Springer Berlin Heidelberg.
See also
Other resources
- Wiki for research in Online Prediction
- How large should the batch size be for stochastic gradient descent? - Cross Validated (Stack Exchange)
- Should training samples randomly drawn for mini-batch training neural nets be drawn without replacement? - Cross Validated (Stack Exchange)