## Online course experiences: Coursera Data Analysis and Syracuse U. Data Science

I’ve been following two online courses related to data analysis during the past few months: the Data Analysis course given by Johns Hopkins U. through Coursera and the Introduction to Data Science course given by Syracuse U. through Coursesites.

The Data Analysis course is the third one that I have enrolled in on Coursera, and the first one where I have completed all the coursework (I received my statement of accomplishment this past weekend, yippee!). Of the two previous courses I had enrolled in, I had tried to follow one but given up because of problems with the platform incorrectly grading the quizzes – a childish thing to quit a course over, because it’s the things you learn that should matter, but I felt that the weird grading made me uncertain about what parts of the material I had really understood.

I think the Data Analysis course was quite good, because it focused not only on R and statistics (which is great) but also on more practical aspects of data analysis, like how you might organize your files and write up a good analysis report. It introduced me to things like R markdown and knitr, which I had heard about but not used until now. The course contents were also surprisingly up to date, with things like the *medley* R package being included in the video lectures. This package, which was developed by a Kaggle competitor for constructing ensemble models more easily, was first mentioned in January 2013 on a Kaggle forum and doesn’t yet exist as an R package, but it was nevertheless covered in the course with nice examples of how to run it!

There is a “post-mortem” podcast at Simply Statistics where Jeff Leek (the main instructor of the course) and Roger Peng discuss what went right and what went wrong.

The course videos are on YouTube and course lectures are available on GitHub; both videos and lectures are tagged by week. Some numbers on participation given by Jeff Leek:

There were approximately 102,000 students enrolled in the course, about 51,000 watched videos, 20,000 did quizzes, and 5,500 did/graded the data analysis assignments.

Personally, I would perhaps have liked the contents to be slightly more difficult (because I came in with a fair amount of subject knowledge), but on the other hand the given level of difficulty let me get away with spending 3-5 hours per week on average on the course, as advertised. I think many students spent a lot more time than that.

The other course that I participated in, Introduction to Data Science from Syracuse University, was similar to the Data Analysis course in that it used R as the vehicle for introducing statistical concepts. However, this course was much more limited in scope and basically did not assume that the students had had any prior exposure to statistics or programming. I felt that this was a mismatch for me and in the end did not finish all of the coursework. I did read the accompanying textbook which, in parts, did a very good job of explaining the value of data analysis in real-world scenarios. I felt that the course would be most useful for people who are curious about “big data” and “data science” and want to dip their toes into it a little bit but not necessarily work with data analysis. Maybe this was the intention.

Foolhardy as I am, I plan to take another MOOC data science course beginning in May, namely Introduction to Data Science. I’ll report back here afterwards!

Are you taking the intro to data science course now? I’m new to Python and it took me a while to get through the first set of assignments.

Yes, I am, though I’m not sure I will finish it due to real-world interference 🙂 I agree that the first set of assignments was pretty tough if you didn’t have previous exposure to Python, JSON etc. Since I use Python on a regular basis, I was able to do it without too much trouble, but I thought it might be a bit of a mismatch for a lot of students. The Data Analysis course had better pacing, I guess.

