As already discussed at The Data Science Lab, massive open online courses are shaking up many concepts in the traditional higher-education landscape. The mere fact that thousands of students around the globe can now attend graduate-level lectures simultaneously, without ever setting foot on campus, might lead to a redefinition of the “college experience”.
Thus far, the courses offered by Coursera and edX could be understood as an extension of otherwise regular university courses, simply made available to students outside the classroom by means of technology. The materials already existed in a similar form before being offered online. Transferring them to a MOOC-appropriate format certainly involves extra work for the lecturer, but the structure of the syllabus essentially remains unchanged.
But as an article in Forbes reports, Udacity, an audacious player among MOOC providers, is set to disrupt the market by offering coaching for students as well as verified certification of their final projects. As Michael Horn bluntly puts it:
But the real disruption in U.S. higher education was never going to come from slapping traditional courses online for free. That is mostly glorified edutainment—not a bad thing for humanity by any means and potentially a useful upgrade over a traditional textbook, but not disruptive to the higher education sector writ large in and of itself. The real disruption in higher education was always going to come from a new system that looks quite different from the current one, begins by serving nonconsumers of traditional higher education, and integrates with employer needs to help students make progress in their lives because of an understanding that employers are ultimately—like it or not—the end customers for higher education because they ultimately finance much of the system for students.
Browsing Udacity’s offerings in the Data Science track, one finds interesting videos and catchy trailers. We can choose among Introduction to Data Science, Data Wrangling with MongoDB, Introduction to Hadoop and MapReduce, and several other courses, priced at around $100 to $150 per month for a duration of one to two months.
I do not doubt that there is a market for Udacity’s offerings, and I am sure that the quality of their mentoring and materials is worth the investment. However, let us not forget that there are other approaches to MOOCs, as this interview with Prof. Abu-Mostafa illustrates. He makes a very interesting and profound point, with which we at The Data Science Lab thoroughly agree:
Stick to your guns. Don’t water down the course to increase the numbers. Make the course as interesting as possible WITHOUT compromising the rigor and the content. What matters is what the students actually learn and retain. This is real education not a video game or a popularity contest.
What is, in your opinion, the best way to organize MOOCs in hot topics, such as Data Science, that attract tons of attention from media and aspiring practitioners alike?
The concept of distance, asynchronous learning is not an invention of the digital age. Correspondence and radio courses already existed in the last century, offering value to lifelong learners and to people with no other way of attending traditional schools. With the spread of the Internet over the last decade, however, a market for massive open online courses has emerged. Coursera and edX are probably the two best-known providers. Incidentally, Andrew Ng, a co-founder of the former, is a professor of Computer Science and a researcher in the field of artificial intelligence. Given that data science is currently very popular, and that few higher education institutions yet offer comprehensive data science degrees, it is perhaps no surprise that some of the best-attended online courses are on machine learning and related data analysis topics.
Below is a brief review of some of my favorite courses.
Machine Learning by Andrew Ng, Stanford (via Coursera)
Ng’s course has become something of a classic for machine learning beginners and is a good introduction to the topic. Over 10 weeks, the Stanford professor covers single- and multi-variable linear regression, logistic regression, regularization, neural networks, support vector machines, clustering, and recommender systems, and finishes with general advice for real applications of machine learning. The review quizzes, which can be repeated multiple times, allow the material to sink in, and the programming exercises, which must be completed in Octave/MATLAB, provide the opportunity to see the methods in action. Most of the gory mathematical details are skipped, which arguably appeals to students seeking a hands-on approach. The programming exercises are designed as supervised step-by-step milestones, which somewhat constrains the room for creativity. On the other hand, the provided scripts work flawlessly and help you understand and visualize what you are doing.
Learning From Data by Yaser S. Abu-Mostafa, Caltech (via edX)
Prof. Abu-Mostafa’s introductory machine learning course is a real gem. He is an excellent educator who, not surprisingly, won the Feynman Prize for Excellence in Teaching in 1996. The course covers material similar to Ng’s, but the focus is more on the underlying mathematical notions, which makes it more formal. Each week two one-hour lectures are presented, followed by a homework set of ten questions. At the end of the ten weeks a final exam is given. Participants are free to choose their favorite language to solve the exercises; only the answers, not the code, are submitted. Each quiz question can be answered only once, which makes the course more challenging than its Coursera equivalent. I particularly enjoyed the rigorous treatment of the theory of generalization, which really makes you understand when and why machine learning is possible. Here is an interesting interview with Abu-Mostafa.
Introduction to Data Science by Bill Howe, University of Washington (via Coursera)
This course, taught by Prof. Howe, has an impressive syllabus to begin with. From relational algebra to parallel databases, including Hadoop, MapReduce, (No)SQL, text analysis, and visualization, it seems to cover everything a data scientist needs. In practice, the workload is somewhat uneven, with some assignments clearly more challenging than others. The assignments offer the chance to get familiar with catchy topics such as sentiment analysis of tweets, Kaggle competitions, MapReduce, Amazon Elastic Compute Cloud (EC2), and the popular visualization software Tableau. The homework comprises both automatically graded and peer-graded exercises. The 8-week duration feels a bit rushed, especially towards the end, but overall it was a fun collection of assignments.
Natural Language Processing by Michael Collins, Columbia University (via Coursera)
Michael Collins does a great job in this NLP course, which covers very interesting topics in a formal and rigorous way. His notes are superb, and the topics chosen provide a solid basis for computational linguistics. There are bi-weekly quizzes, plus three mandatory programming assignments that are challenging enough to keep students occupied for the 10-week course duration. Coding can be done in any language; only the result files, which must comply with a specific format, are submitted. I really enjoyed implementing three of the algorithms that Collins presents very clearly in his videos: hidden Markov models for tagging, the CKY algorithm for parsing, and the IBM models for translation alignment. NLP skills are certainly a nice addition to any data scientist’s toolset.
Data Analysis by Jeff Leek, Johns Hopkins University (via Coursera)
This is a course on applied statistics focused on data analysis. It emphasizes how to organize and carry out a data analysis, and how to write it up end-to-end as a report. In addition to the weekly review quizzes, there are two peer-reviewed data analysis assignments, which involve submitting a full report with figures and references. That in itself is a nice touch, since it forces you to explain clearly what you learned. The presentation of the statistical concepts and the weekly quizzes rely quite heavily on the R language. The course is not loaded with mathematical derivations, but it does provide a good introduction to using R for data analysis, R being arguably the most widespread tool among statisticians.
This is by no means a comprehensive list of online courses on machine learning. New programs are continuously being added to the pool of material of interest to data scientists. Don’t be afraid to enroll and try out some of the courses; you might find that the level and focus are exactly right for you, or else you can always let a course sit for a semester and come back in a future session, since many courses are offered yearly.