Some interesting competitions in data analysis / prediction:
Kaggle is managing this year’s KDD Cup, which will be about Tencent Weibo, China’s rough equivalent to Twitter (with more support for adding pictures and comments on posts, it is perhaps closer to a hybrid between Twitter and Facebook). There will be two tasks: (1) predicting which users a given user will follow (all data being anonymized, of course), and (2) predicting click-through rates in online computational advertising systems. According to Gordon Sun, chief scientist at Tencent (the company behind Weibo), the data set is the largest ever released for competitive purposes.
CrowdAnalytix, an India-based company with a business idea similar to Kaggle’s, has started a fun, quick competition about sentiment mining. The competition may in fact already be over, as it ran for just 9 days starting on 16 February. The input consists of comments left by visitors to a major airport in India, and the goal is to identify and compile actionable and/or interesting information, such as which services visitors think are missing.
The Clarity challenge is, for me, easily the most interesting of the three, because it concerns the use of genomic information in healthcare. This challenge (with a prize sum of $25,000) is, in effect, crowdsourcing genomic/medical research (although only 20 teams will be selected to participate). The goal is to identify and report on potential genetic features underlying medical disorders in three children, given the genome sequences of the children and their parents. These genetic features are presently unknown, which is why this competition really represents something new in medical research. I think this is a very nice initiative. In fact, I had thought of initiating something similar at my own institute, but this challenge is much better than what I had in mind. It will be very interesting to see what comes out of it.