Wednesday, February 21, 2018

Video series relevant to ML

You can go top-down from code to theory (the fast.ai way) or bottom-up from theory to code (the academic way). I'm going to assume you're in it for the long haul and want to understand the principles (to the degree that there are principles) first; as such, you'd take a course like Stanford's CS 229 before courses on deep learning. (You can actually take EE 263, CS 229, CS 224N, CS 231N, CS 234 remotely as a non-degree student, trading ~$1000/unit for graded assessment.)

Videos give intuition, but won't teach you to do anything useful. You need to do exercises. If the book/psets used in the courses don't have solutions, find another book that does and work some of its exercises as warm-up. (Reading two books at once is a common math study tactic.)

Background for intro ML classes
  • Harvard Stats 110, Probability Theory: YouTube
    • A first course in calculus-based probability theory is pretty much assumed in any serious discussion of ML. This one seems typical, based on Blitzstein's book.
    • For CS 229, you might be able to get away with 1-2 weeks of probability theory covered in a typical discrete math course, but you'd have to do some additional work since facility with continuous distributions (such as the univariate and multivariate normal distributions) is assumed all over the place in ML.
  • Mathematical Statistics: ?
    • I don't know of a good video course on mathematical statistics. 
    • For CS 229, it's helpful to have this background at the Casella & Berger level (sufficient statistics, the significance of exponential families, etc.) for your own understanding, but you can survive without it since this level of rigor is never actually required in psets.
  • MIT 18.06, Linear Algebra: YouTube (1999 lectures), YouTube (2011 sections)
    • This is a first course in applied linear algebra, at a lower-division level of rigor but working up through valuable applied topics like SVD, basic linear dynamical systems, least squares and pseudoinverses, etc.
    • For CS 229, you might be able to get away with the linear algebra covered in the usual ABET combined linear algebra / differential equations course as taught to engineers in sophomore year (i.e., maybe 5-7 weeks of instruction in linear algebra). But for your own sanity, you should probably buy Strang's Introduction to Linear Algebra and refresh/extend your understanding using this course.
  • Stanford EE 263, Linear Dynamical Systems (2008): YouTube
    • The first 1/3 of this course reviews the material of a course like MIT 18.06 or Stanford EE 103 (based on Boyd's undergraduate book). (You could also use EE 263 as a bootcamp version of its own prerequisites, but since most students will have actually taken a linear algebra course, it will be tough to compete in the course itself.) The middle 1/3 or so covers topics like least norm solutions, SVD, etc.: a quick introduction to a lot of relevant methods for fitting, optimization, and so on. The final 1/3 or so covers linear dynamical systems (which are relevant to reinforcement learning) and a few additional relevant linear algebra topics such as the Cayley-Hamilton theorem. You should be ok with sophomore-level differential equations; for the latter third of the course, it is assumed that you can (or can learn to) apply the Laplace transform, so an upper-division diffeq course is helpful (but not necessary if you're ok with applying it as a black box).
    • For CS 229, certain bits relating to control in reinforcement learning will make a lot more sense with this background. Also, most of the linear algebra proofs in CS 229 will be at exactly this level, so it's very good practice.
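A few of the workhorse ideas from these prerequisite courses (least squares, pseudoinverses, SVD) are easy to poke at numerically before you ever watch a lecture. Here's a minimal sketch of my own (the variable names are mine, not from any of these courses) showing three equivalent ways to solve an overdetermined least-squares problem in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))   # overdetermined system: 10 equations, 3 unknowns
b = rng.standard_normal(10)

# 1. Least-squares solution via the Moore-Penrose pseudoinverse: x = A^+ b
x_pinv = np.linalg.pinv(A) @ b

# 2. The same solution built by hand from the SVD: A = U S V^T, so A^+ = V S^+ U^T
U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_svd = Vt.T @ ((U.T @ b) / s)

# 3. And via the normal equations: (A^T A) x = A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

print(np.allclose(x_pinv, x_svd), np.allclose(x_pinv, x_normal))
```

Seeing that these three routes agree (up to floating-point error) is exactly the kind of intuition 18.06 and the first third of EE 263 are trying to build.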
Survey / Introduction
  • Stanford CS 229, Machine Learning (2009): YouTube
    • CS 229 is not an applied course on ML, and its emphasis is very different from Andrew Ng's Coursera ML course, which focuses on a rudimentary understanding of a few ML algorithms. CS 229 is a quick overview of the mathematics (mostly at a lower-division level of rigor) behind regression, SVMs, EM, basic RL, and so on. If your linear algebra background is weaker than 18.06 (and, in a few places, EE 263), this will severely limit your understanding; you have to be able to absorb the material from the lecture notes and not expect much more than intuition from the lectures.
    • While the level of rigor is lower-division (meaning you use calculus rather than analysis), you had better be very facile with reading and writing proofs. Most pset and exam proofs are calculations (i.e., "show that" by rewriting one expression into another), but if you don't feel pretty comfortable with proof by counterexample, proof by contradiction, etc., you are going to have issues with 1-3 problems per pset.
    • If you take this course for credit, be aware that an awful lot of people will already be familiar with all of the material (e.g., from watching all of the 2009 videos and doing the psets). Moreover, many seem to have access to previous pset solutions, so it's a real grind to do this honestly. Also, many students will already have teams set up and semi-started on their projects before the quarter starts (e.g., collecting data), so do not measure progress by the stated milestone dates.
  • UCL, Introduction to RL (2015): YouTube
Deep Learning
  • UC Berkeley CS 294-129 (2016), Deep Learning: YouTube
  • Stanford CS 231N, Convolutional Neural Networks for Visual Recognition (2017): YouTube
  • Stanford CS 224N, Deep Learning for NLP (2017): YouTube
  • UC Berkeley CS 294-112 (2017), Deep RL: YouTube (Spring), YouTube (Fall) 
Less directly relevant stuff:
  • Stanford EE 364AB, Convex Optimization (2008): YouTube
    • All of ML involves optimization in some form or another, so many people recommend this course (based on Boyd's book) if you're interested in statistical learning theory.
    • You can definitely get by in CS 229 without this, and convex optimization has little direct relevance to deep learning (for example).  But I think the serious ML students take it for culture (e.g., as prep for Stats 231/CS 229T).
    • If you take this course for credit, you should take EE 263 first, if only to gird yourself for the 24-hour take-home exams.
  • UCCS Math 535, Applied Functional Analysis: YouTube
    • A course based on Kreyszig. Caveat: I can't vouch for this as I didn't go through this series; I took a course on Fourier analysis that used the Hilbert space method.
    • You don't need this for CS 229 (for example, uniform convergence is mentioned exactly once, and only in a hand-wavy way; Hilbert spaces are never mentioned). But deeper discussions in topics like kernel methods and statistical learning theory are often framed in the language of functional analysis.
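To make that connection slightly concrete (my own illustrative snippet, not from the course): a kernel corresponds to an inner product in a Hilbert space exactly when its Gram matrices are positive semidefinite, and that is easy to check numerically for, say, the RBF kernel:

```python
import numpy as np

# A kernel k(x, y) implicitly defines an inner product in a Hilbert space
# (the RKHS); Mercer's condition requires every Gram matrix to be positive
# semidefinite. A quick numerical check with the RBF (Gaussian) kernel:
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 4))          # 20 points in R^4

gamma = 0.5
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-gamma * sq_dists)             # Gram matrix K_ij = k(x_i, x_j)

eigvals = np.linalg.eigvalsh(K)
print(eigvals.min())                      # nonnegative up to floating-point error
```

Functional analysis is what turns this empirical observation into the general statement (Mercer's theorem, the representer theorem, and so on) that the kernel-methods literature leans on.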