The concept of distance, asynchronous learning is not an invention of the digital age. Correspondence and radio courses already existed in the last century, offering value to lifelong learners and to people who otherwise had no possibility of attending traditional schools. With the spread of the Internet over the last decade, though, a market for massive open online courses (MOOCs) has emerged. Coursera and edX are probably two of the best-known providers. Incidentally, Andrew Ng, a co-founder of the former, is a professor of computer science and a researcher in the field of artificial intelligence. Given that data science is very popular at the moment, and that few higher education institutions yet offer comprehensive data science degrees, it is perhaps no surprise that some of the best-attended online courses are on machine learning and related data analysis topics.
Below is a brief review of some of my favorite courses.
Machine Learning by Andrew Ng, Stanford (via Coursera)
Ng’s course has become something of a classic for machine learning beginners and is a good introduction to the topic. Over 10 weeks, the Stanford professor covers single- and multi-variable linear regression, logistic regression, regularization, neural networks, support vector machines, clustering, and recommender systems, and finishes with general advice for real applications of machine learning. The review quizzes, which can be repeated multiple times, allow the material to sink in, and the programming exercises, which must be completed in Octave/MATLAB, provide the opportunity to see the methods in action. Most of the gory mathematical details are skipped, which arguably appeals to students seeking a hands-on approach. The programming exercises are designed as supervised step-by-step milestones, which somewhat constrains the room for creativity. On the other hand, the provided scripts work flawlessly and help you understand and visualize what you are doing.
Learning From Data by Yaser S. Abu-Mostafa, Caltech (via edX)
Prof. Abu-Mostafa’s introductory machine learning course is a real gem. He’s an excellent educator who, not surprisingly, won the Feynman Prize for Excellence in Teaching in 1996. The course covers material similar to Ng’s, but the focus is more on the underlying mathematical notions, so it is more formal. Each week two one-hour lectures are presented, followed by a homework set containing ten questions. At the end of the ten weeks a final exam is given. Participants are free to choose their favorite language to solve the exercises; only the answers, not the code, are submitted. The questions in the quizzes can be answered only once, which makes the course more challenging than its Coursera equivalent. I particularly enjoyed the rigorous explanation of the theory of generalization, which really makes you understand when and why machine learning is possible. Here is an interesting interview with Abu-Mostafa.
Introduction to Data Science by Bill Howe, University of Washington (via Coursera)
This course, taught by Prof. Howe, has an impressive syllabus to begin with. From relational algebra to parallel databases, including Hadoop, MapReduce, (No)SQL, text analysis, and visualization, it seems to cover everything a data scientist needs. In practice, the workload is somewhat unbalanced, with some assignments clearly more challenging than others. The assignments offer the chance to get familiar with popular topics such as sentiment analysis of tweets, Kaggle competitions, MapReduce, Amazon's cloud computing services, and the popular visualization software Tableau. The homework comprises both automatically graded and peer-graded exercises. The 8-week duration feels a bit rushed, especially towards the end, but overall it was a fun compilation of assignments.
Natural Language Processing by Michael Collins, Columbia University (via Coursera)
Michael Collins does a great job in this NLP course, which covers very interesting topics in a pleasantly formal and rigorous way. His notes are superb, and the topics chosen provide a solid basis for computational linguistics. There are bi-weekly quizzes, plus three mandatory programming assignments that are challenging enough to keep students occupied for the 10-week course duration. Coding can be done in any language; only the result files, which must comply with a specific format, are submitted. I really enjoyed implementing three of the algorithms that Collins presents very clearly in his videos: hidden Markov models for classification, the CKY decoder for parsing, and the translation alignments for the IBM models. NLP skills are certainly a nice addition to any data scientist’s set of tools.
Data Analysis by Jeff Leek, Johns Hopkins University (via Coursera)
This is a course on applied statistics focused on data analysis. It emphasizes teaching students how to organize and carry out a data analysis and how to write up a report end-to-end. In addition to the weekly review quizzes, there are two peer-reviewed data analysis assignments, which involve the submission of a full report plus figures and references. That in itself is a nice touch, since it forces you to explain clearly what you learned. The presentation of the statistical concepts and the weekly quizzes rely quite heavily on the R language. The course is not loaded with mathematical derivations, but it does provide a good introduction to the use of R in data analysis, arguably the most widespread tool among statisticians.
This is by no means a comprehensive list of online courses for machine learning. New programs are continuously added to the stack of materials of interest to data scientists. Don’t be afraid of enrolling and trying out some of the courses; you might find that the level and focus are exactly right for you, or else you can always let a course sit for a semester and come back in a future session, since many courses are offered yearly.