IMPROVER, a disease-related predictive analytics contest
As I have said before, I think scientific prediction competitions (a form of crowdsourced research) are an interesting way to attack problems in science. The recently launched IMPROVER Systems Biology Verification is such a competition, and it’s especially nice in that it asks a very general question: Is it possible to extract reliable gene expression signatures for common diseases? The diseases selected for this challenge are psoriasis, multiple sclerosis, chronic obstructive pulmonary disease (COPD), and lung cancer, and contestants are allowed to use any public data to construct their predictors. We often read scientific publications with supposed gene expression signatures for various diseases, but a competition framework will better allow us to assess how sensitive and specific those signatures really are.
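To make "sensitive and specific" concrete, here is a minimal sketch of how one might score a disease-versus-control classifier on held-out samples. The function name, the label encoding (1 = disease, 0 = control) and the sample labels are all made up for illustration; this is not the scoring method used by the IMPROVER organizers, which the post does not describe.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels: 1 = disease, 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # diseased, called diseased
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # diseased, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # control, called control
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # control, false alarm
    # Guard against a class being absent from the test set.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical labels for eight held-out samples (invented for this example).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sensitivity is the fraction of true disease samples the signature catches, and specificity is the fraction of controls it correctly leaves alone; a signature that looks good in a paper can still do poorly on either when applied to samples it has never seen.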
I see a few problems with the competition (although I should stress that I think it’s a very good initiative; we should have more of these!). (1) Competitors must submit entries for all four diseases to be eligible for the prize (in fact, five classifiers are required, as the MS challenge is divided into two parts). That is very tough to manage, as each problem is likely to be extremely difficult and the deadline is May 30, 2012. Of course, it may be possible to run the same model on all diseases, but somehow I doubt that will be very successful. (2) I suspect that the open-ended approach of allowing all public data to be used will lead to less successful models than in the typically tightly defined Kaggle competitions. (3) There is too little time to spread the word about the competition so that people can build something that works before the May 30 deadline. I am hoping to be wrong about point (2); it would be great if this competition could lead to some insights about how best to leverage diverse data from places like the Gene Expression Omnibus and ArrayExpress.
In view of points (1)-(3), I predict that not many teams will submit predictions, which of course implies that it would be a good idea for anyone who reads this to participate: you will have a shot at the $50,000 prize (which, by the way, has to be used for research).