New “big DNA data” cloud and genome interpretation companies
It seems like cloud computing platforms for what I call “big DNA data” (mostly data derived from high-throughput sequencing experiments) have really started to take off. About a year ago, I blogged about companies based on these ideas, but in the past month or so I feel like I keep coming across new companies in this space. A related category that has emerged is what I call “genome interpretation companies”: companies that want to help you make sense of, for example, big sequence data sets and arrive at some more or less actionable medical information. The cloud infrastructure and genome interpretation parts of big DNA data analysis can’t be cleanly separated, and many companies offer some combination of both. That makes sense: if you have already built up the infrastructure, you might as well provide some tools.
DNAnexus was already mentioned in the blog post from over a year ago, and it’s the company in this space that has received the most ecstatic press coverage. It has built up an impressive set of services since last year, but the most interesting thing for me at this point is that they have promised to launch “DNAnexus X”, a “community-inspired collaborative and scalable data technology platform”, in the near future.
Among new players, I have looked briefly at the following (I try to classify them roughly into “infrastructure-oriented” or “interpretation-oriented”, although some are a mix of the two; I also mention two other companies that don’t fit into either category):
GeneStack promises a “Genomic Operating System”, which will be launched sometime during this year and is described as follows:
Access well-curated genomics and transcriptomics public data from major repositories worldwide. Store and share securely NGS data sets with your colleagues. Run high-performance computations on public and proprietary data in the cloud. Develop and sell genomics apps.
Appistry seems to be a general high-performance analytics company with life science as one of its specializations (meaning, in this case, "high performance sequencing"). They seem to offer mainly infrastructure, including analytics pipelines, which I think probably don't extend into what I am calling "interpretation" in this blog post.
Seven Bridges Genomics offers a cloud platform with open-source tools for genomics for hospitals, smaller labs and other organizations that don't have their own computing infrastructure. They are also the first company I've seen that employs a zombie.
Bina Technologies offers an interesting "hybrid" approach to cloud genome analytics. Realizing that many customers are deterred by the long upload times to, for instance, Amazon EC2, they have something called the "Bina Box" that processes the raw sequence data locally, after which the pre-processed and compressed (and thus much smaller) version of the data is uploaded to the "Bina Cloud."
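To get a feel for why shrinking the data before upload matters, here is a back-of-the-envelope sketch of the arithmetic. All the numbers in it (raw data size, reduction factor, link speed) are made-up assumptions for illustration, not figures from Bina or Amazon, and `upload_hours` is just a hypothetical helper name.

```python
def upload_hours(size_gb: float, link_mbit_per_s: float) -> float:
    """Hours needed to move size_gb gigabytes over a link of link_mbit_per_s megabits/s."""
    megabits = size_gb * 8 * 1000  # decimal units: 1 GB = 8000 megabits
    seconds = megabits / link_mbit_per_s
    return seconds / 3600


# Illustrative assumptions only, not vendor figures.
RAW_GB = 100.0      # assumed size of the raw sequence data from one run
REDUCTION = 10.0    # assumed shrink factor after local pre-processing and compression
LINK_MBIT = 100.0   # assumed sustained uplink speed in megabits per second

raw_hours = upload_hours(RAW_GB, LINK_MBIT)
reduced_hours = upload_hours(RAW_GB / REDUCTION, LINK_MBIT)

print(f"raw upload:     {raw_hours:.2f} h")
print(f"reduced upload: {reduced_hours:.2f} h")
```

Upload time scales linearly with data size, so whatever reduction factor the local box actually achieves translates directly into the same factor of time saved on the wire.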
Personalis' tagline is "Founded by global leaders in human genome interpretation" and indeed, their team of founders would be very hard to beat. About a month ago, some details on the DNA variant detection engine used by the company were published. (The engine, called HugeSeq, is also freely available in an academic version which is not supposed to be quite as cutting-edge as the one used by Personalis.)
SolveBio is still in private beta, but a recent and rather visionary article by founder Kaganovich, titled "The Cloud will Cure Cancer", talks about the birth of "Big Bio" and about calculating correlations in the cloud to get a handle on the molecular profiles of tumors, predict drug targets and design treatment regimens.
SVBio or Silicon Valley Biosystems is being very secretive so far but is said to offer "interpretive software for the human genome."
DNA Guide (visualization), not to be confused with the Swedish 23andMe clone DNA-Guide, has a technical solution for visualizing and navigating personal DNA data on the web safely while adhering to privacy regulations. (See their SlideShare presentation.)
Metaome (concept/knowledge search?) has developed DistilBio, a semantic search and data integration platform with a dynamic interface for navigating among biological concepts. It's a bit hard to explain but kind of cool if you are into life science research. There is a demo on the site.
Since so much of this blog post has been about cloud computing and personal genomics, I should mention that Amazon has recently put up sequence data from the 1000 Genomes Project in their cloud. There are instructions and a tutorial here for those who would like to play around with the data.
Also, on the topic of computing infrastructure for genomics, Chris Dagdigian's slides from Bio-IT World Expo 2012 are pretty interesting. Among other things, he suggests that uploading genomic data into the cloud is now becoming feasible (using Aspera software).