BioFrontiers Symposium presentation

I just returned from Boulder, Colorado (lovely place!) where I was one of the speakers at the BioFrontiers Symposium on Big Data, Genomics and Molecular Networks. Here are the slides for the presentation, in the form they were supposed to have (on stage, I actually ended up working from a slightly outdated version).

The other talks were all good. Some themes that came up a few times were the importance of collaboration, getting data out of silos and into the open, and improved software development standards. (Maybe I just remember those because I was talking about the first two myself.) I appreciated all the talks, but some nuggets that have stayed with me are Sean Eddy’s flashbacks to 1980s sequence analysis; David Haussler’s talk about why we have a chance to beat cancer (“Cancer isn’t smart, it dies with the patient”; “A tumor genome is an evolving metagenome, a mixture of genomes of subclones”) and about how to set up a ~100 PB repository of cancer-related sequence data; and Michael Snyder’s story of obsessively characterizing not only his own genome, proteome and transcriptome but also his antibody-ome, microbiome, methylome, etc., to which he is now also adding sensors that measure sleep, number of steps per day and so on.

EDIT: I have a question for my readers regarding the estimates for “Tb processed per day” and “pb stored” for different companies and organizations in the linked presentation. One of the big surprises was that eBay claims to process so much data (100 PB per day), much more than Google (20 PB per day). My sources for the eBay figure are this PDF and this interview; the number pops up in many other places. Is this because of the “query rewriting” mentioned in this blog post, or is there some other explanation?


