The LifeGene project, which was recently started in Sweden, may in due time generate one of the most complex and interesting data sets ever assembled. The project will study health, lifestyle and genetics (and much more) over the long term in a cohort of 500,000 (this is not a typo!) individuals. Participants will donate blood samples and undergo physical measurements (waist and hip circumference, blood pressure, etc.), but for a smaller subset of participants the study will go much deeper, with global analysis of DNA, RNA, protein, metabolite and toxin levels, as well as epigenomics (simplifying a bit, this means genomic information that is not directly encoded in the DNA sequence). Two testing centres have opened during the fall – one in Stockholm and, more recently, one in Umeå.
Environmental factors will be examined too: “Exposures such as diet, physical activity, smoking, prenatal environment, infections, sleep-disorders, socioeconomic and psychosocial status, to name a few, will be assessed.” Data will be collected through channels such as mobile phones and the web, with sampling rates adjusted based on age and life events. The project consortium calls this approach e-epidemiology.
This might make each participant feel a bit like David Ewing Duncan, the man who decided to try as many genetic, medical and toxicological tests on himself as he could, and wrote a book about it. Will participants suffer from an overload of data about themselves? For the statisticians involved, information overload is a certainty. It will be a tough – but interesting – task to collect, store and mine these data. But exactly this kind of project, which relates hereditary factors to environment and lifestyle and correlates these with outcomes (like disease states), is much needed.