Follow the Data

A data driven blog

Machine learning competitions and algorithm comparisons

Tomorrow, 29 May 2010, a lot of (European) people will be watching the Eurovision Song Contest to see which country will take home the prize. Personally, I don’t really care about who wins the contest itself, but I do care (somewhat) about which predictor will win the Eurovision Voting Forecast competition arranged by kaggle.com. Kaggle describes itself as “a platform for data mining, bioinformatics and forecasting competitions“. It provides an objective framework for comparing techniques and “allows organizations to have their data scrutinized by the world’s best statisticians.”

Contests like this are fun, but they can also have more serious aims. For instance, Kaggle also hosts a competition about predicting HIV progression based on the virus’ DNA sequence. The currently leading submission has already improved on the best methods reported in literature, and so a post at Kaggle’s No Free Hunch blog asks whether competitions might be the future of research. I think they may well be, at least in some domains. A few months back, I mentioned an interesting challenge at Innocentive which is essentially a very difficult pure research problem, and it will be interesting to learn how the winning team there did it (if any details are disclosed). (I signed up for this competition myself, but haven’t been able to devote more than one or two hours to it so far, unfortunately.)

There are other platforms for prediction competitions as well, for instance TunedITs challenge platform, which allows university teachers to “make their courses more attractive and valuable, through organization of on-line student competitions instead of traditional assignments.” TunedIT also has a research platform where you can run automated tests on machine learning algorithms and get reproducible experimental results. You can also benchmark results against a knowledge base or contribute to and use a repository of various data sets and algorithms.

Another initiative for serious evaluation of machine learning algorithms in various problem domains is MLcomp. Here, you can upload your own datasets (or use pre-loaded ones) and run existing algorithms on them through a web interface. MLcomp then reports various metrics that allow you to compare different methods.

By the way, 22 teams participated in Kaggle’s Eurovision challenge, and Azerbaijan is the clear favorite, having been picked as the winner by 14 teams. Let’s see how it goes tomorrow.

Advertisements

Single Post Navigation

One thought on “Machine learning competitions and algorithm comparisons

  1. Pingback: Tweets that mention Machine learning competitions and algorithm comparisons « Follow the Data -- Topsy.com

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: