Follow the Data

A data-driven blog

Archive for the tag “personalized-medicine”

Sage commons and personalized medicine

Today I had the chance to talk to Stephen Friend, who started Sage Bionetworks, which I must have blogged about at some point but can’t find any entries for at the moment.

Sage was created to facilitate research by opening up genetic, clinical and other sorts of data as much as possible, or, as the web site puts it, to address “the acute need for a new approach to using complex genetic information for drug development.” Friend was previously at Rosetta and Merck, and among the current data sets in the Sage Repository there are several interesting ones, containing both genetic and phenotypic information, that have been used in high-profile Merck/Rosetta-related papers by Eric Schadt and others.

However, the Sage project is not only about providing data; it’s also about disease network modeling (I’m guessing at both the gene and protein levels), a goal that Friend is clearly serious about. Another interesting thing is that Sage has Jeff Hammerbacher – the whiz-kid who built up Facebook’s IT platform – on its board of directors. And he’s not there as a token big data guy – Friend told me that Hammerbacher is actively involved in developing Sage’s IT infrastructure. I think it’s great to have data scientists working on biological problems. As Hilary Mason said at the Strata conference, we have enough ad optimization solutions now – let’s do something different!

I hadn’t looked at Sage for a while and was pleasantly surprised to learn they have a nice Tumblr site which linked to interesting content such as this Scientific American Pathways compilation of good articles about data, medicine and personalization, and the RNA game EteRNA, which is something along the lines of the Phylo and FoldIt games previously covered on this blog. I also found the Personalized Health Manifesto, written by David Ewing Duncan and underwritten by people like Stephen Friend, George Church, Eric Schadt, Atul Butte, Misha Angrist and many others. I haven’t read it yet but certainly aim to do so as soon as possible.

TR personalized medicine briefing

MIT’s Technology Review magazine has a briefing on personalized medicine. It’s worth a look, although it’s quite heavily tilted towards DNA sequencing technology (which I am interested in, but there is a lot more to personalized medicine). Not surprisingly, one of the articles in the briefing makes the point that the biggest bottleneck in personalized medicine will be data analysis, the risk being that “…we will end up with a collection of data … unable to predict anything.” (As an aside, I would be moderately wealthy if I had a euro for each time I’d read the phrase “drowning in data”, which appears in the article heading. I think I even rejected that as a name for this blog. It would be nice to see someone come up with a fresh alternative verb to “drowning” …)

Technology Review also has a piece on how IBM has started to put their mathematicians to work in business analytics. They mention a neat technique I hadn’t been aware of: “…they used a technique called high-quantile modeling–which tries to predict, say, the 90th percentile of a distribution rather than the mean–to estimate potential spending by each customer and calculate how much of that demand IBM could fulfill“.
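For the curious: this kind of high-quantile modeling is essentially quantile regression, which the quantreg package in R handles nicely. Here is a toy sketch on made-up customer data (IBM’s actual models are surely far more elaborate):

```r
# Toy example of high-quantile modeling with the quantreg package:
# predict the 90th percentile of customer spending rather than the mean.
# All data below are made up for illustration.
library(quantreg)

set.seed(1)
n <- 500
revenue   <- rlnorm(n, meanlog = 10, sdlog = 1)  # customer's total revenue
employees <- rpois(n, 50)                        # customer's company size
spending  <- 0.01 * revenue + 100 * employees + rexp(n, rate = 1/5000)

# rq() fits a quantile regression; tau = 0.9 targets the 90th percentile
fit_q90  <- rq(spending ~ revenue + employees, tau = 0.9)
fit_mean <- lm(spending ~ revenue + employees)   # mean model, for contrast

# The 90th-percentile predictions sit well above the mean predictions,
# estimating *potential* rather than expected spending.
head(cbind(q90 = predict(fit_q90), mean = predict(fit_mean)))
```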

The last part of the article talks about a very interesting problem: how to model a system where output from the model itself affects the system, or as the article puts it “…situations where a model must incorporate behavioral changes that the model itself has inspired“. I’m surprised the article doesn’t mention the obvious applicability of this to the stock market, where of course thousands of professional and amateur data miners use prediction models (their own and others’) to determine how they buy and sell stocks. Instead, its example comes from traffic control:

For example, […] a traffic congestion system might use messages sent to GPS units to direct drivers away from the site of a highway accident. But the model would also have to calculate how many people would take its advice, lest it end up creating a new traffic jam on an alternate route.
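To make the feedback idea concrete, here is a toy simulation of my own devising (not from the article): the rerouting advice is only stable once the model accounts for how many drivers will actually follow it, i.e. at a fixed point.

```r
# Toy illustration of a model whose own advice changes the system it models.
# The alternate route gets slower as more cars are sent onto it, so we look
# for a stable (fixed-point) fraction of drivers to redirect. Invented numbers.
drivers   <- 1000
time_main <- function() 60              # minutes on the jammed main road
time_alt  <- function(n) 20 + 0.1 * n   # alternate route slows with load

frac <- 0.1   # initial fraction of drivers advised to reroute
for (i in 1:100) {
  n_alt <- drivers * frac * 0.8         # assume 80% of advised drivers comply
  # Nudge the advised fraction toward the point where both routes take equally
  # long; that is the fixed point where the advice stops creating a new jam.
  frac <- min(1, max(0, frac + 0.01 * (time_main() - time_alt(n_alt))))
}
c(advised_fraction = frac, alt_route_minutes = time_alt(drivers * frac * 0.8))
```

The naive one-shot answer (“the alternate route takes 20 minutes, send everyone there”) would produce exactly the new traffic jam the article warns about.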

Video time

Here are a few video clips I’ve enjoyed watching over the past week.

From TEDMED2009, David Agus talks about cancer research and covers quite a lot of territory, from the value of monitoring your habits (he briefly discusses his own Philips DirectLife device) to the need for a molecular rather than tissue-based definition of cancer and his quest to model cancer as a complex system that has to do with a lot more than genetics.

The Argument for Better Health, in 3 Minutes & 53 Seconds is an attempt to summarize the most important arguments of Thomas Goetz’ new book The Decision Tree for a broader audience. In other words, it’s about how individuals can take control of their own health by using what Goetz calls a decision tree approach. The video, although good, is kind of entry-level material; if you want to go a bit deeper, you could download podcasts of the introduction and first chapter of the book. Here are three good reviews of the book.

Finally, a video in four parts explaining the benefits of using the R language for statistical analysis. I use R myself practically daily and think it’s great. These videos make it clear that it has now spread far outside of academia and has become an important part of the data analyst’s toolbox.
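If you haven’t tried R, here is roughly what everyday use feels like: summary statistics, a regression model and a plot in a handful of lines, using the built-in mtcars dataset.

```r
# A taste of R: explore a built-in dataset and fit a model in a few lines.
data(mtcars)
summary(mtcars$mpg)                       # quick descriptive statistics
fit <- lm(mpg ~ wt + hp, data = mtcars)   # fuel economy vs. weight and power
summary(fit)                              # coefficients, p-values, R-squared
plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "Miles per gallon")
abline(lm(mpg ~ wt, data = mtcars))       # add a simple regression line
```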

Link roundup

Here are some interesting links from the past few weeks (or in some cases, months). I’m toying with the idea of just tweeting most of the links I find in the future and reserving the blog for more in-depth ruminations. We’ll see how it turns out. Anyway … here are some links!

Open Data

The collaborative filtering news site Reddit has introduced a new Open Data category.

Following the example of New York and San Francisco (among others), London will launch an open data platform, the London Data Store.

Personal informatics and medicine

Quantified Self has a growing (and open/editable) list of self-tracking and related resources. Notable among those is Personal Informatics, which itself tracks a number of resources – I like the term personal informatics and the site looks slick.

Nicholas Felton’s Annual Report 2009. “Each day in 2009, I asked every person with whom I had a meaningful encounter to submit a record of this meeting through an online survey. These reports form the heart of the 2009 Annual Report.” Amazing guy.

What can I do with my personal genome? A slide show by LaBlogga of Broader Perspectives.

David Ewing Duncan, “the experimental man“, has read Francis Collins’ new book about the future of personalized medicine (The Language of Life: DNA and the Revolution in Personalized Medicine) and written a rather lukewarm review of it.

Duncan himself is involved in a very cool experiment (again) – the company Cellular Dynamics International has promised to grow him some personalized heart cells. Say what? Well, basically, they are going to take blood cells from him, “re-program” them back to stem-cell-like cells (induced pluripotent stem cells), and make those differentiate into heart cells. These will of course be a perfect genetic match for him.

Duncan has also put information about his SNPs (single-nucleotide polymorphisms; basically, variable DNA positions that differ from person to person) online for anyone to view, and promises to make 2010 the year when he tries to make sense of all the data, including SNP information, that he obtained about his body when he was writing his book Experimental Man. As he puts it, “Producing huge piles of DNA for less money is exciting, but it’s time to move to the next step: to discover what all of this means.”

HolGenTech – a smartphone-based system for scanning barcodes of products and matching them to your genome (!) – that is, it can tell you to avoid some products if you have had a genome scan which found you have a genetic predisposition to react badly to certain substances. I don’t think the marketing video is done in a very responsible way (it says that the system “makes all the optimal choices for your health and well being every time you shop for your genome“, but this is simply not true – we know too little about genomic risk factors to be able to make any kind of “optimal” choices), but I had to mention it.
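For what it’s worth, here is my guess at the core logic of such an app, written out in R. Every barcode, SNP and genotype-substance association below is hypothetical, made up purely for illustration; this is emphatically not HolGenTech’s implementation.

```r
# Hypothetical sketch of a "shop for your genome" lookup; all data invented.
# Map product barcodes to ingredients of interest
ingredients <- list("0012345" = c("lactose"),
                    "0067890" = c("gluten", "lactose"))

# Made-up map from genetic variants to substances one might want to avoid
risk_variants <- data.frame(
  snp       = c("rs0000001", "rs0000002"),
  genotype  = c("CC",        "TT"),
  substance = c("lactose",   "gluten"),
  stringsAsFactors = FALSE
)

# One customer's genotypes at those positions (again, made up)
my_genome <- c(rs0000001 = "CC", rs0000002 = "CT")

check_barcode <- function(barcode) {
  flagged <- risk_variants$substance[
    my_genome[risk_variants$snp] == risk_variants$genotype]
  hits <- intersect(ingredients[[barcode]], flagged)
  if (length(hits)) paste("Consider avoiding:", paste(hits, collapse = ", "))
  else "No known issues (which is not the same thing as 'optimal'!)"
}
check_barcode("0012345")   # "Consider avoiding: lactose"
```

Even in this cartoon version, the output can only ever be as good as the variant-substance table, which is exactly where our current knowledge is thin.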

The genome used in the HolGenTech presentation belongs to the journalist Boonsri Dickinson. Here are some interviews she recently did with Esther Dyson and Leroy Hood, on personalized medicine and systems biology, respectively, at the Personalized Medicine World Conference in January.

Online calculators for cancer outcome and general lifestyle advice. These are very much in the spirit of The Decision Tree blog, through which I in fact found these calculators.

Data mining

Microsoft has patented a system for “Personal Data Mining”. It is pretty heavy reading, and I know too little about patents to be able to tell how much this would actually prevent anyone from building various types of recommendation systems and personal data mining tools in the future; probably not to any significant extent?

OKCupid has a fun analysis about various characteristics of profile pictures and how they correlate to online dating success. They mined over 7000 user profiles and associated images. Of course there are numerous caveats in the data interpretation and these are discussed in the comments; still good fun.

The Microgaming poker network has tried to curb data mining of its poker data. Among other things, bulk downloading of hand histories will be made impossible.

Link roundup

Gearing up for Christmas, so no proper write-ups for these (interesting) links.

Personalized medicine is about data, not (just) drugs. Written by Thomas Goetz of The Decision Tree for the Huffington Post. The Decision Tree also has a nice post about why self-tracking isn’t just for geeks.

A Billion Little Experiments (PDF link). An eloquent essay/report about “good” and “bad” patients and doctors, compliance, and access to your own health data.

Latent Semantic Indexing worked well for Netflix, but not for dating. MIT Technology Review writes about how the algorithms used to match people at Match.com (based on latent semantic indexing / SVD) are close to worthless. A bit lightweight, but a fun read.
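For those curious about the technique itself: latent semantic indexing boils down to a truncated singular value decomposition of a user-by-item (or document-by-term) matrix. A minimal sketch in R, on a toy ratings matrix that has nothing to do with Match.com’s actual data:

```r
# Latent semantic indexing in miniature: truncated SVD of a ratings matrix.
# Rows are users, columns are items; toy numbers only.
ratings <- matrix(c(5, 4, 0, 1,
                    4, 5, 1, 0,
                    1, 0, 5, 4,
                    0, 1, 4, 5), nrow = 4, byrow = TRUE)

s <- svd(ratings)
k <- 2                        # keep only the top 2 latent dimensions
approx <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])
round(approx, 1)              # low-rank reconstruction "fills in" unseen items

# Users close together in the latent space have similar tastes:
u <- s$u[, 1:k]
round(u %*% t(u) / outer(sqrt(rowSums(u^2)), sqrt(rowSums(u^2))), 2)
```

The math is the easy part; the Match.com result suggests that the latent dimensions of stated preferences just don’t predict real-world chemistry very well.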

A podcast about data mining in the mobile world. Featuring Deborah Estrin and Tom Mitchell. Mitchell recently wrote an article in Science about how data mining is changing: Mining Our Reality (subscription needed). The take-home message (or one of them) is that data mining is becoming much more real-time oriented. Data are increasingly being analyzed on the fly and used to make quick decisions.

How Zeo, the sleep optimizer, actually works. I mentioned Zeo in a blog post in August.

Individualized cancer research

I have been intrigued for some time by Jay Tenenbaum’s idea to forget about clinical cancer trials and focus on deep DNA and RNA (and perhaps protein) profiling of individual patients in order to optimize a treatment specifically for the given patient. (See e.g. this earlier blog post about his company, CollabRx.)

Tenenbaum and Leroy Hood of the Institute for Systems Biology recently wrote about their ideas in an editorial called A Smarter War on Cancer:

One alternative to this conventional approach would be to treat a small number of highly motivated cancer patients as individual experiments, in scientific parlance an “N of 1.” Vast amounts of data could be analyzed from each patient’s tumor to predict which proteins are the most effective targets to destroy the cancer. Each patient would then receive a drug regimen specifically tailored for their tumor. The lack of “control patients” would require that each patient serve as his or her own control, using single subject research designs to track the tumor’s molecular response to treatment through repeated biopsies, a requirement that may eventually be replaced by sampling blood.

This sounds cool, but my gut feeling has been that it’s probably not a realistic concept yet. However, I came across a blogged conference report which suggests there may be some value in this approach already. MassGenomics writes about researchers in Canada who decided to try to help an 80-year-old patient with a rare type of tumor (an adenocarcinoma of the tongue). The tumor was surgically removed but metastasized to the lungs and did not respond to the prescribed drug. The researchers then sequenced the genome (DNA) and transcriptome ([messenger] RNA) of the tumor and a non-tumor control sample. They found four mutations that had occurred in the tumor, and also identified a gene that had been amplified in the tumor and against which a drug happened to be available in the drug bank. Upon treatment with this drug, all metastases vanished – but unfortunately came back in a resistant form several months later. Still, it is encouraging to see that this type of genome study can be used to delay the spread of tumors, even if just for a couple of months.
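Stripped down to a cartoon, the computational step of such an analysis might look like the sketch below: compare tumor and normal expression, flag strongly over-expressed genes, and intersect them with a table of druggable targets. All counts, gene names and drug names are invented, and a real pipeline involves alignment, normalization and proper statistics (e.g. packages like edgeR or DESeq).

```r
# Cartoon of a tumor-vs-normal comparison; all values and names invented.
genes <- data.frame(
  gene   = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  normal = c(100, 120,  90, 300),        # expression in the control sample
  tumor  = c(110, 2400, 85, 290),        # expression in the tumor
  stringsAsFactors = FALSE
)

genes$log2fc <- log2((genes$tumor + 1) / (genes$normal + 1))
amplified <- subset(genes, log2fc > 2)   # crude over-expression cutoff

# Match candidates against a (hypothetical) table of druggable targets
drug_targets <- data.frame(gene = c("GENE_B", "GENE_X"),
                           drug = c("drug_1", "drug_2"),
                           stringsAsFactors = FALSE)
merge(amplified, drug_targets, by = "gene")   # GENE_B looks actionable
```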

A while back, MIT Technology Review wrote about a microfluidic chip which is being used in a clinical trial for prostate cancer. This chip from Fluidigm is meant to analyze gene expression patterns in rare tumor cells captured from blood samples. It is hoped that the expression signatures will be predictive of how different patients respond to different medications. Another microfluidic device from Nanosphere has been approved by the U.S. Food and Drug Administration to be used to “…detect genetic variations in blood that modulate the effectiveness of some drugs.” This would take pharmacogenomics – the use of genome information to predict how individuals will respond to drugs – into the doctor’s office.

“You could have a version of our system in a molecular diagnostics lab running genetic assays, like those for cystic fibrosis and warfarin, or in a microbiology lab running virus assays, or in a stat lab for ER running tests, like the cardiac troponin test, a biomarker to diagnose heart attack, and pharmacogenomic testing for [Plavix metabolism],” says [Nanosphere CEO] Moffitt.

Update 10 Dec:

(a) Rick Anderson commented on this post and pointed to Exicon, a company that offers, among other things, personalized cancer diagnostics based on micro-RNA biomarkers.

(b) Via H+ magazine, I learned about the Pink Army Cooperative, who do “open source personal drug development for breast cancer.” They want to use synthetic biology to make “N=1 medicines”, that is, drugs developed for one person only. They “…design our drugs computationally using public scientific knowledge and diagnostic data collected from the individual to be treated.”

The Journal of Participatory Medicine calling all data geeks

The first, “launch” issue of the new Journal of Participatory Medicine was published today. The articles in it are mainly about “defining the territory”, as one subheading puts it, and about explaining how participatory medicine is important from the viewpoints of different stakeholders.

Esther Dyson has contributed an article (free registration may be required; I first read the article without having registered but had to register on my second visit) where she stresses the importance of data management and analysis:

So, here you have it — a call to inspired scientists, and clinician researchers, data-collectors, and data analysts: Help us understand the evidence that is already out there, but hidden under the virtual mattresses of paper files, incompatible formats, incomplete records, impenetrable bureaucracies.

The journal will be peer-reviewed and freely available on the web. Articles will be published continuously as they are accepted after review. In the launch announcement, the editors stress the importance of broad and accurate peer review, quoting recent complaints that “It is simply no longer possible to believe much of the clinical research that is published, or to rely on the judgment of trusted physicians or authoritative medical guidelines” from a former editor of The New England Journal of Medicine and “most of what appears in peer reviewed journals is scientifically weak” from a former editor of the British Medical Journal.

I recently learned that an online journal about a similar (but different) field, personalized medicine, has been around for five years already. (There may be others too.) The journal is called Personalized Medicine.

Body computing, preventive, predictive and social medicine

There have been many interesting articles and blog posts about the future of medicine, and specifically about the need to automatically monitor various physiological parameters and, importantly, to start focusing more on health than on disease; on prevention rather than curing. The latter point has been stressed by Adam Bosworth, the former head of Google Health, in interviews like this one (audio) and this one (video, “The Body 2.0”). Bosworth is one of the founders of a company, Keas, that wants to help people understand their health data, set health goals and pursue them. He has a new blog post where he talks about machine learning in the context of health care. He (probably rightly) sees health care as lagging behind in the adoption of predictive analytics. But he thinks this will change:

All the systems emerging to help consumers get personalized advice and information about their health are going to be incredible treasure troves of data about what works. And this will be a virtuous cycle. As the systems learn, they will encourage consumers to increasingly flow data into them for better more personalized advice and encourage physicians to do the same and then this data will help these systems to learn even more rapidly. I predict now that within a decade, no practicing physician will consider treating their patients without the support/advice of the expertise embodied in the machine learning that will have taken place. And finally, we will truly move to an evidence based health care system.

Along similar lines, the Broader Perspective blog writes about the “three tiers of medicine” that may make up the future healthcare system. The first tier consists of automated health monitoring tools that collect information about your health. The second tier is about preventive medicine and involves “health coaches”, who “…incorporate genomic data, together with family history and current phenotype and biomarker data into an overall care plan“. Finally, the third tier is the traditional health care system of today (hospitals, doctors, nurses).

I learned a new term for the enabling technology for the first (data-collection) tier: body computing. The Third Body Computing Conference will be hosted by the University of Southern California on Friday (9 October). The conference’s definition of body computing is that

“Body Computing” refers to an implanted wireless device, which can transmit up-to-the-second physiologic data to physicians, patients, and patients’ loved ones.

A new article about the future of health care in Fast Company also talks about body computing and predictive/preventive health care:

Wireless monitoring and communication devices are becoming a part of our everyday lives. Integrated into our daily activities, these devices unobtrusively collect information for us. For example, instead of doing an annual health checkup (i.e. cardiac risk assessment), near real-time health data access can be used to provide rolling assessments and alert patients of changes to their health risk based on biometrics assessment and monitoring (blood pressure, weight, sleep etc). With predictive health analytics, health information intelligence, and data visualization, major risks or abnormalities can be detected and sent to the doctor, possibly preempting complications such as stroke, heart attack, or kidney disease.
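A rolling assessment of that kind is conceptually simple. Here is a bare-bones sketch with simulated blood-pressure readings and an arbitrary alert threshold; a real system would need validated thresholds and much more careful handling of noise and missing data.

```r
# Bare-bones rolling health assessment: flag when the 7-day rolling mean of
# systolic blood pressure drifts above a threshold. Simulated data throughout.
set.seed(42)
days     <- 1:90
systolic <- 118 + 0.25 * days + rnorm(90, sd = 3)   # slow upward drift + noise

roll_mean <- stats::filter(systolic, rep(1/7, 7), sides = 1)  # trailing 7-day mean

alerts <- which(roll_mean > 135)   # arbitrary illustrative threshold
if (length(alerts)) {
  cat("First alert on day", min(alerts), ": rolling mean",
      round(roll_mean[min(alerts)], 1), "mmHg\n")
}
```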

Although the article is named The Future of Health Care Is Social, it actually talks mostly about self-tracking and predictive analytics. It does go into social aspects of future healthcare, like online health/disease-related networks such as PatientsLikeMe or CureTogether. All in all, a nice article.

And finally (if anyone is still awake), it has been widely reported that IBM has joined the sequencing fray and is trying to develop a nanopore-based system, a “DNA transistor”, for cheap sequencing. There are now several players in this area (for example, Oxford Nanopore, Pacific Biosciences, NABSYS) and some of them are bound to lose out – time will tell who will emerge on top. Anyway, the reason I mention this is partly that IBM explicitly connected the announcement to healthcare reform and personalized healthcare (IBM CEO also wants to resequence the health-care system) and partly the surprising (to me) fact that “[…] IBM also manages the entire health system for Denmark.” Really?

By the way, a good way to get updates on body computing is to follow Dr Leslie Saxon on Twitter.

Personal transcriptomics?

MIT’s Technology Review has an interesting blog post about Hugh Rienhoff, a clinical geneticist and entrepreneur, who is trying to apply personal genomics (or rather, transcriptomics) to find the causes of his daughter Beatrice’s unusual, Marfan syndrome-like symptoms. The blog post describes how Illumina (a leading company in DNA sequencing) has sequenced parts of the genomes of Rienhoff, his wife and his daughter, and how he has now spent about a year searching through these genome sequences for mutations that only Beatrice has.

In fact, looking at another blog post, it seems like they are actually sequencing RNA (mRNA, to be specific) rather than genomic DNA. This makes a lot of sense, because RNA sequencing (RNA-seq) gives information about genes that are actually being expressed – transcribed into mRNA and then presumably translated into proteins. This sort of “transcriptome profiling” should potentially be able to give a lot of information about disease states beyond what can be gleaned from a genome scan (although those are, of course, informative as well).

From the sequencing data, Rienhoff has compiled a list of about 80 genes that are “less active” in Beatrice than in her parents. (I wonder what tissues or cell types they assayed?) According to the Nature blog post, Illumina will be doing similar transcriptome profiling on up to nine family trios (mum, dad, child) where the child has, for instance, autism or Loeys-Dietz syndrome.
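In spirit, the trio comparison might look something like the deliberately naive sketch below, which flags genes expressed markedly lower in the child than in both parents. The expression values are simulated and the cutoff is crude; the real analysis has to contend with normalization, noise and multiple testing.

```r
# Naive sketch of a trio comparison on simulated expression values.
set.seed(7)
n_genes <- 5000
expr <- data.frame(
  mother = rlnorm(n_genes, meanlog = 5),
  father = rlnorm(n_genes, meanlog = 5)
)
# Child roughly tracks the mid-parent level, with some noise
expr$child <- with(expr, (mother + father) / 2 * rlnorm(n_genes, sdlog = 0.5))

# "Less active": child below half of the lower-expressing parent (crude cutoff)
low_in_child <- with(expr, child < 0.5 * pmin(mother, father))
sum(low_in_child)            # number of candidate genes for closer inspection
head(expr[low_in_child, ])
```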

A quote from the Technology Review blog post:

One of the biggest challenges, Rienhoff says, is the software available to analyze the data. “To ask the questions I want to ask would take an army,” he says. “I’m trying to connect the dots between being a genomicist and a clinical geneticist. I don’t think anyone here realizes how difficult that is. I’m willing to take it on because it matters to me.”

Reading about this sort of literally personal genomics/medicine made me think of Jay Tenenbaum and his company CollabRx, which offers a “Personalized Oncology Research Program” where they “…use state-of-the-art molecular and computational methods to profile the tumor and to identify potential treatments among thousands of approved and investigational drugs.” So the approach here is presumably also to do some sort of individual-based transcriptional profiling, but this time on tumor material. After all, cancer is a heterogeneous disease (or a heterogeneous set of diseases) and tumors probably vary widely between patients. Echoing Rienhoff above, Tenenbaum said in an interesting interview a couple of months ago that biology is becoming an information science and that CollabRx is “heavily dependent on systems and computational biology” (= software, algorithms, data analysis, computing infrastructure).

I applaud the efforts of CollabRx, while remaining sceptical about what this approach can achieve today in the way of clinical outcomes. But someone has to be the visionary and pave the way.

Personal genome glitch uncovered

As recounted in this New Scientist article and commented upon in Bio-IT World, journalist Peter Aldhous managed to uncover a bug in the deCODEme browser (Decode Genetics’ online tool for viewing parts of your own genome). deCODEme is one of a handful of services, including 23andme and Navigenics, that genotype small genetic variations called SNPs (snips; single-nucleotide polymorphisms) in DNA samples submitted by customers. The results are then used to calculate disease risks and other things, which are displayed to the customer in a personalized view of his or her genome.

Aldhous was comparing the output he got from two of these services – deCODEme and 23andme – and discovered that they were sometimes very different. After patiently getting to the bottom of the matter, he discovered that the reason for the discrepancy was that the deCODEme browser sometimes (but not always) displayed jumbled output for mitochondrial sequences. According to Bio-IT World, the bug seems to have been due to an inconsistency between 32-bit and 64-bit computing environments and has now been fixed.

Isn’t this a nice example of computational journalism, where a journalist is skilled or persistent enough to actually analyze the data that is being served up and detect inconsistencies?
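The encouraging part is that anyone with the raw exports could run the same kind of sanity check. A minimal sketch, assuming each service lets you download a text file of rsID/genotype pairs; the file names and column layout below are made up, and the real export formats differ.

```r
# Minimal consistency check between two (hypothetical) genotype export files.
a <- read.table("decodeme_export.txt", header = FALSE,
                col.names = c("rsid", "genotype_a"), stringsAsFactors = FALSE)
b <- read.table("23andme_export.txt", header = FALSE,
                col.names = c("rsid", "genotype_b"), stringsAsFactors = FALSE)

both <- merge(a, b, by = "rsid")   # keep only SNPs reported by both services

# Normalize allele order (so "AG" and "GA" count as the same call)
norm_gt <- function(g) sapply(strsplit(as.character(g), ""),
                              function(x) paste(sort(x), collapse = ""))

mismatch <- both[norm_gt(both$genotype_a) != norm_gt(both$genotype_b), ]
nrow(mismatch)   # how many calls disagree between the two services?
head(mismatch)   # and which ones; mitochondrial positions, perhaps?
```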

I might as well sneak in another New Scientist article about personal genomes. This one urges you to make your genome public in the name of the public good. It mentions the Harvard Personal Genome Project, which aims to enroll 100,000 (!!) participants whose genomes will be sequenced. The first ten participants, some of whom are pretty famous, have agreed to share their DNA sequences freely.

I have no idea whether the Personal Genome Project is related to the Coriell Personalized Medicine Collaborative, which also wants to enroll 100,000 participants in a longitudinal study whose goal is to find out how much utility there is in using personal genome information in health management and clinical decision-making.
