Follow the Data

A data driven blog

Archive for the tag “science”

Phylo – an alignment game

I’ve been playing some Phylo while snowed in this weekend. This nifty game, developed by a group at McGill University in Canada, reminds me a lot of FoldIt, which I’ve mentioned several times on this blog. Like FoldIt, Phylo works well just as a logic/pattern-recognition game, but it also has a hidden (well, actually not hidden at all) agenda: it tries to apply the strategies used by the most skillful players to actual scientific problems. In the case of Phylo, the problem you are trying to solve is multiple sequence alignment or, described more simply, matching up DNA sequences from different species with each other. Multiple sequence alignment is one of the truly classic problems in bioinformatics, and there are many good algorithms for it, but their results can still be improved. The idea of Phylo is to leverage human beings’ superior pattern recognition capabilities to solve really tricky multiple alignment problems. Related (or presumably related) DNA sequences from various organisms have already been matched up against each other (aligned) by an existing algorithm, and the hope is that human players can further optimize the alignments “by eye”.

I think there are two things that are really cool about this game. The first thing is that the creators are actually picking the problems from a public resource, the UCSC Genome Browser, where they have located a number of poorly aligned stretches of DNA close to genes (stretches in so-called “promoter regions”). These are regions for which one might suspect that the best alignment hasn’t been found. Also, these regions are interesting from a disease perspective, and each task in Phylo has to do with a certain disease or type of disease.

The second thing that I like is the educational aspect of the game. I’ve studied alignment algorithms (a long time ago), and even though I knew about the scoring schemes on a theoretical level, I hadn’t really understood them in a tangible way before I played Phylo. It’s funny how a game with scores motivates you to understand how something works. If I were teaching a bioinformatics course, I would not hesitate to have the students play Phylo in conjunction with the material on sequence alignment. Never mind the exam, just solve level 9 and you’ve passed the course!
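To make the idea of a scoring scheme a bit more concrete, here is a minimal Python sketch of a sum-of-pairs score for a small multiple alignment. The point values (`MATCH`, `MISMATCH`, `GAP`) and the toy sequences are made up for illustration; they are not Phylo’s actual rules, and real alignment tools typically use more refined gap penalties.

```python
from itertools import combinations

# Hypothetical point values, chosen only for illustration (not Phylo's scheme).
MATCH = 1
MISMATCH = -1
GAP = -2


def column_score(column):
    """Sum-of-pairs score for a single alignment column."""
    score = 0
    for a, b in combinations(column, 2):
        if a == "-" and b == "-":
            continue  # two gaps facing each other are ignored
        if a == "-" or b == "-":
            score += GAP
        elif a == b:
            score += MATCH
        else:
            score += MISMATCH
    return score


def alignment_score(rows):
    """Total score of an alignment given as equal-length strings ('-' is a gap)."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows must have the same length")
    return sum(column_score(column) for column in zip(*rows))


# Toy example: sliding the last T of the third row one step to the left
# lines it up with the T's above it.
before = ["ACGT-", "A-GTT", "ACG-T"]
after = ["ACGT-", "A-GTT", "ACGT-"]

print(alignment_score(before))  # -3
print(alignment_score(after))   # 2
```

Moving a single letter turns one gap-heavy column into a column of three matches, which is roughly the kind of trade-off you end up weighing when shuffling blocks around in the game.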

The fourth paradigm

A new book about science in the age of big data, The Fourth Paradigm: Data-Intensive Scientific Discovery, is available as a free download. The book was reviewed in Nature today. It’s written by people from Microsoft Research and has a foreword by Gordon Bell, one of the authors of Total Recall: How the E-memory Revolution Will Change Everything.

Crowdsourcing dinosaur science

The recently initiated Open Dinosaur project is an excellent example of crowdsourcing in science. The people behind the project are enlisting volunteers to find skeletal measurements of dinosaurs in published articles and submit them to a common database. Or as they put it, “Essentially, we aim to construct a giant spreadsheet with as many measurements for ornithischian dinosaur limb bones as possible.” All contributors (anyone can participate) get to be co-authors on the paper that will be submitted at the end of the project.

One good thing about the project is that its originators have obviously taken pains to help the participants get going. They’ve put up comprehensive tutorials about dinosaur bone structure (!) and about how to locate relevant references and find the correct information in them.

As of yesterday, they had over 300 verified entries, after just ten days. It will be interesting to see other similar efforts in the future.
