Follow the Data

A data driven blog

Archive for the tag “MOOC”

Book and MOOC

As of today, a book to which I have contributed, RNA-seq Data Analysis: A Practical Approach, is available for purchase. I realize the title might sound obscure to readers who are unfamiliar with genomics and bioinformatics. Simply put, RNA-seq is short for RNA sequencing, a method for measuring what we call gene expression. While the DNA contained in each cell is (to a first approximation) identical, different tissues and cell types turn their genes on and off in different ways in response to different conditions. The process by which DNA is transcribed into RNA is called gene expression. RNA-seq has become a rather important experimental method, and the lead author of our book, Eija Korpelainen, wanted to put together a user-friendly, practical and hopefully unbiased compendium of the existing RNA-seq data analysis methods and toolkits, without neglecting the underlying theory. I contributed one chapter, the one about differential expression analysis, which basically means statistical testing for significant gene expression differences between groups of samples.
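To make the idea of differential expression testing a bit more concrete, here is a minimal sketch in Python. It is not the approach described in the book: dedicated RNA-seq tools typically fit count-based statistical models and correct for multiple testing, whereas this toy example just runs an exact permutation test on log-transformed counts. The gene names, the read counts and the `permutation_p_value` helper are all made up for illustration.

```python
from itertools import combinations
from math import log2
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference of group means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))
    hits = 0
    splits = 0
    # Enumerate every way of assigning n_a of the pooled samples to group A.
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        difference = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance so floating-point noise does not hide exact ties.
        if difference >= observed - 1e-9:
            hits += 1
        splits += 1
    return hits / splits


# Hypothetical read counts: gene -> (control samples, treated samples)
counts = {
    "geneA": ([10, 12, 9, 11], [95, 110, 102, 99]),
    "geneB": ([50, 48, 53, 51], [49, 52, 50, 47]),
}

p_values = {}
for gene, (control, treated) in counts.items():
    # log2(count + 1) is a simple, commonly used transform for count data.
    log_control = [log2(c + 1) for c in control]
    log_treated = [log2(c + 1) for c in treated]
    p_values[gene] = permutation_p_value(log_control, log_treated)

for gene, p in p_values.items():
    print(f"{gene}: p = {p:.3f}")
```

With four samples per group there are only 70 possible splits, so the smallest attainable p-value is 2/70 (about 0.029). That limitation is one reason real analyses rely on model-based tests rather than a sketch like this one.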

I am also currently involved as an assistant teacher in the Explore Statistics with R course given by Karolinska Institutet through the edX MOOC platform. Specifically, I have contributed material to the final week (week 5) which will start next Tuesday (October 7th). That material is also about RNA-seq analysis – I try to show a range of tools available in R which allow you to perform a complete analysis workflow for a typical scenario. Until the fifth week starts, I am helping out with answering student questions in the forums. It’s been a positive experience so far, but it is clear that one can never prepare enough for a MOOC – errors in phrasing, grading, etc. are bound to pop up. Luckily, several gifted students are doing an amazing job of answering the questions from other students, while teaching us teachers a thing or two about the finer points of R.

Speaking of MOOCs, Coursera’s Mining Massive Datasets course featuring Jure Leskovec, Anand Rajaraman and Jeff Ullman started today. My plan is to try to follow it – we shall see if I have time.


Coursera Introduction to Data Science Course

I promised in an earlier blog post to report back on the Introduction to Data Science course given by Coursera. Paradoxically, I didn’t finish it although I think it was the best of the three online data-related courses that I started this year (the others were Data Analysis on Coursera and Introduction to Data Science at Syracuse University). I think my non-finishing was related to some sort of MOOC fatigue and just the fact that I had too much going on. Also, the descriptions of the last mandatory assignments (Tableau and Kaggle) were a bit too vague for my schedule (I finished all the quizzes and the programming assignments). Enough excuses – I think this was an excellent course which covered a lot of ground, from SQL via Map/Reduce to machine learning. In particular, the Map/Reduce programming assignments (which used a Python Map/Reduce library) were helpful to me. Highly recommended (and yes, I’ll try to finish it up next year).
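For readers who have not met Map/Reduce before, the basic pattern can be sketched in a few lines of plain Python. This is not the library used in the course assignments; `run_map_reduce`, `map_words` and `reduce_counts` are hypothetical names for a tiny in-memory imitation, and the example documents are invented.

```python
from collections import defaultdict


def map_words(doc_id, text):
    """Map step: emit a (word, 1) pair for every word in a document."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: add up all the values emitted for one word."""
    yield word, sum(counts)


def run_map_reduce(records, mapper, reducer):
    """Run mapper and reducer in a single process, grouping by key in between."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # the "shuffle" phase
    results = []
    for key in sorted(groups):
        results.extend(reducer(key, groups[key]))
    return results


documents = [
    ("doc1", "big data is big"),
    ("doc2", "data science"),
]
word_counts = dict(run_map_reduce(documents, map_words, reduce_counts))
print(word_counts)
```

A real Map/Reduce framework distributes the map and reduce calls over many machines, but the contract is the same: the mapper emits key-value pairs, the framework groups them by key, and the reducer sees one key at a time together with all of its values.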

Online course experiences: Coursera Data Analysis and Syracuse U. Data Science

I’ve been following two online data analysis courses during the past few months: the Data Analysis course given by Johns Hopkins U. through Coursera and the Introduction to Data Science course given by Syracuse U. through Coursesites.

The Data Analysis course is the third one that I have enrolled in on Coursera, and the first one where I have completed all the coursework (I received my statement of accomplishment this past weekend, yippee!). Of the two previous courses I had enrolled in, I had tried to follow one but given up because of problems with the platform incorrectly grading the quizzes – a childish thing to quit a course over, because it’s the things you learn that should matter, but I felt that the weird grading made me uncertain about what parts of the material I had really understood.

I think the Data Analysis course was quite good, because it focused not only on R and statistics (which is great) but also on more practical aspects of data analysis, like how you might organize your files and write up a good analysis report. It introduced me to things like R markdown and knitr, which I had heard about but not used until now. The course contents were also surprisingly up to date, with things like the medley R package being included in the video lectures. This package, which was developed by a Kaggle competitor for constructing ensemble models more easily, was first mentioned in January 2013 on a Kaggle forum and doesn’t yet exist as an R package, but it was covered in the course with nice examples of how to run it!

There is a “post-mortem” podcast at Simply Statistics where Jeff Leek (the main instructor of the course) and Roger Peng discuss what went right and what went wrong.

The course videos are on YouTube and course lectures are available on GitHub; both videos and lectures are tagged by week. Some numbers on participation given by Jeff Leek:

There were approximately 102,000 students enrolled in the course, about 51,000 watched videos, 20,000 did quizzes, and 5,500 did/graded the data analysis assignments.

Personally, I would perhaps have liked the contents to be slightly more difficult (because I came in with a fair amount of subject knowledge), but on the other hand the given level of difficulty let me get away with spending 3-5 hours per week on average on the course, as advertised. I think many students spent a lot more.

The other course that I participated in, Introduction to Data Science from Syracuse University, was similar to the Data Analysis course in that it used R as the vehicle for introducing statistical concepts. However, this course was much more limited in scope and basically did not assume that the students had had any prior exposure to statistics or programming. I felt that this was a mismatch for me and in the end did not finish all of the coursework. I did read the accompanying textbook which, in parts, did a very good job of explaining the value of data analysis in real-world scenarios. I felt that the course would be most useful for people who are curious about “big data” and “data science” and want to dip their toes into it a little bit but not necessarily work with data analysis. Maybe this was the intention.

Foolhardy as I am, I plan to take another MOOC data science course beginning in May, namely Introduction to Data Science. I’ll report back here afterwards!

