Follow the Data

A data driven blog


Three angles on crowd science

Here are some recent news items that illuminate crowd science (advancing science by leveraging a community in some way) from three different angles.

  • The Harvard Clinical and Translational Science Center (or Harvard Catalyst) has “launched a pilot service through which researchers at the university can submit computational problems in areas such as genomics, proteomics, radiology, pathology, and epidemiology” via TopCoder, the online competitive community for software development and digital creation. One recently started Harvard Catalyst challenge is called FitnessEstimator. The aim of the project is to “use next-generation sequencing data to determine the abundance of specific DNA sequences at multiple time points in order to determine the fitness of specific sequences in the presence of selective pressure. As an example, the project abstract notes that such an approach might be used to measure how certain bacterial sequences become enriched or depleted in the presence of antibiotics.” (The quotes are from a GenomeWeb article that is behind a paywall.) I think it’s very interesting to use online software development contests for scientific purposes; they are a useful complement to Kaggle competitions, where the focus is more on data analysis. Sometimes, really good code is important too!
  • This press release describes the idea of connectomics (which is very big in neuroscience circles now) and how the connectomics researcher Sebastian Seung and colleagues have launched EyeWire, where players trace neural branches “through images of mouse brain scans by playing a simple online game, helping the computer to color a neuron as if the images were part of a three-dimensional coloring book.” The images are actual data from the lab of Professor Winfried Denk. “Humans collectively spend 600 years each day playing Angry Birds. We harness this love of gaming for connectome analysis,” says Prof. Seung in the press release. (For similar online games that benefit research, see e.g. Phylo, Foldit and EteRNA.)
  • Wisdom of Crowds for Robust Gene Network Inference is a newly published paper in Nature Methods in which the authors study a community-based ensemble prediction method. Let’s back-track a bit. The Dialogue on Reverse Engineering Assessment and Methods (DREAM) initiative is a yearly challenge where contestants try to reverse engineer various kinds of biological networks and/or predict the output of some or all nodes in the network under various conditions. (If that sounds too abstract, go to the link above and check out what the actual challenges have been like.) The DREAM initiative is a nice way to assess the performance of currently touted methods in an unbiased way. In the Nature Methods paper, the authors show that “no single inference method performs optimally across all data sets. In contrast, integration of predictions from multiple inference methods shows robust and high performance across diverse data sets” and that “Our results establish community-based methods as a powerful and robust tool for the inference of transcriptional gene regulatory networks.” So, in true wisdom-of-crowds fashion (as the paper title suggests), it’s better to combine the predictions of all the contestants than to use just the best ones. It would be like taking a composite of the predictions of all Kaggle competitors in a certain contest and observing that this composite was superior to every individual team’s prediction. I suspect Kaggle has already done this kind of experiment; does anyone know?
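To make the FitnessEstimator aim in the first item a bit more concrete, here is a minimal Python sketch of one common way to turn abundances at several time points into a fitness estimate: fit a straight line to the log ratio of a sequence's read count to that of a reference sequence and take the slope. This is my own illustration, not the challenge's actual method; the function name `relative_fitness`, the assumption that the reference sequence is neutral, and the pseudocount are all hypothetical choices.

```python
import math

def relative_fitness(times, counts, ref_counts, pseudocount=0.5):
    """Least-squares slope of ln(count / ref_count) versus time.

    Hypothetical sketch: assumes ref_counts come from a neutral reference
    sequence sequenced alongside the sequence of interest. A positive slope
    means the sequence is enriched over time, a negative one that it is
    depleted. The pseudocount guards against log(0) for zero counts.
    """
    if not (len(times) == len(counts) == len(ref_counts)) or len(times) < 2:
        raise ValueError("need at least two matching time points")
    ys = [math.log((c + pseudocount) / (r + pseudocount))
          for c, r in zip(counts, ref_counts)]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    var = sum((t - mean_t) ** 2 for t in times)
    if var == 0:
        raise ValueError("time points must not all be equal")
    return cov / var

# A sequence whose reads double every time step relative to the reference
# gets a slope of ln 2 (about 0.693) per time step.
print(relative_fitness([0, 1, 2, 3], [100, 200, 400, 800], [1000] * 4, pseudocount=0))
```

In a real antibiotic-selection experiment you would also have to handle sequencing noise, PCR bias and changing read depth; the sketch ignores all of that and only shows the core "slope of log abundance" idea.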
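The “integration of predictions from multiple inference methods” in the DREAM item can also be sketched in a few lines. Below is a minimal Python sketch of rank averaging (a Borda-count-style scheme): each method submits a list of (regulator, target) edges ordered from most to least confident, and the edges are re-ranked by their average rank across methods. This is a simplified illustration, not necessarily the exact integration procedure used in the paper; the function name `community_ranking` and the rule for edges a method did not list are my own assumptions.

```python
from collections import defaultdict

def community_ranking(predictions):
    """Combine ranked edge lists from several methods by average rank.

    predictions: list of rankings; each ranking is a list of
    (regulator, target) tuples ordered from most to least confident.
    Assumption: an edge that a method did not list gets rank
    len(ranking) + 1 for that method, i.e. just below its last entry.
    """
    all_edges = set()
    for ranking in predictions:
        all_edges.update(ranking)
    rank_sum = defaultdict(int)
    for ranking in predictions:
        rank_of = {edge: i + 1 for i, edge in enumerate(ranking)}
        missing_rank = len(ranking) + 1
        for edge in all_edges:
            rank_sum[edge] += rank_of.get(edge, missing_rank)
    # Every edge is scored by the same number of methods, so sorting by the
    # rank sum gives the same order as sorting by the average rank. Ties are
    # broken by the edge itself so the output is deterministic.
    return sorted(all_edges, key=lambda edge: (rank_sum[edge], edge))

method_a = [("A", "B"), ("B", "C"), ("A", "C")]
method_b = [("B", "C"), ("A", "B"), ("C", "A")]
method_c = [("A", "B"), ("A", "C"), ("B", "C")]
print(community_ranking([method_a, method_b, method_c]))
# [('A', 'B'), ('B', 'C'), ('A', 'C'), ('C', 'A')]
```

Working with ranks rather than raw scores is a deliberate choice here: different inference methods output confidence values on incomparable scales, and ranks put them on a common footing. Whether the combined list actually beats every individual method is an empirical question that this toy example says nothing about.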
