Sequencing data storm
Today, I attended a talk given by Wang Jun, a humorous, t-shirt-clad whiz kid who set up the bioinformatics arm of the Beijing Genomics Institute (BGI) as a 23-year-old PhD student, became a professor at 27, and is now the director of BGI's facility in Shenzhen, near Hong Kong. Although I work with bioinformatics at a genome institute myself, this presentation really drove home how much storage, computing power and know-how biology requires now and will require in the near future.
BGI does staggering amounts of genome sequencing – "If it tastes good, sequence it! If it is useful, sequence it!" as Wang Jun joked – on everything from indigenous Chinese plants to rice, pandas and humans. They have a very interesting individual genome project in which they apply many different techniques to samples from the same person and compare the results against known references. One of many interesting results from this project was the finding that human genomes vary not only in single "DNA letter" variants (so-called SNPs, single nucleotide polymorphisms) or in the number of times certain stretches of DNA are repeated ("copy number variations"); it now turns out there are DNA snippets that, in largely binary fashion, some people have and some don't.
Although the existing projects demand a lot of resources and manpower – BGI has 250 bioinformaticians (!), which is still too few; according to Wang, they want to quickly increase this number to 500 – this is nothing compared to what will happen after the next wave of sequencing technologies, when we will start to sequence single cells from different (or the same) tissues in an individual. Already, the data sets generated are so vast that they cannot be distributed over the internet. Wang recounted how he had to carry ten terabyte drives to Europe himself in order to share his data with researchers at the EBI (European Bioinformatics Institute). Now, they are trying out cloud computing as a way to avoid moving the data around.
Wang attributed a lot of BGI's success to young, hardworking programmers and scientists, many of them university dropouts, who have no preconceptions about science and are therefore prepared to try anything. "These are teenagers that publish in Nature," said Wang, apparently feeling that he was (at 33) already over the hill. "They don't run on a 24-hour cycle like us, they run on 36-hour cycles and bring sleeping bags to the lab."
All in all, good fun.