Sergey Brin’s new science and IBM’s Jeopardy machine
Two good articles from the mainstream press.
Sergey Brin’s Search for a Parkinson’s Cure deals with the Google co-founders quest to minimize his high hereditary risk of getting Parkinson’s disease (which he found out through a test from 23andme, the company his wife founded) while simultaneously paving the way for a more rapid way to do science.
Brin is proposing to bypass centuries of scientific epistemology in favor of a more Googley kind of science. He wants to collect data first, then hypothesize, and then find the patterns that lead to answers. And he has the money and the algorithms to do it.
This idea about a less hypothesis-driven kind of science, based more on observing correlations and patterns, surfaces once in a while. A couple of years ago, Chris Anderson received a lot of criticism for describing what is more or less the same idea in The End of Theory. You can’t escape the need for some sort of theory or hypothesis, and when it comes to something like Parkinson we just don’t know enough about its physiology and biology yet. However, I think Brin is right in emphasizing the need to get data and knowledge about diseases to circulate more quickly and to try to milk the existing data sets for what they are worth. If nothing else, his frontal attack on Parkinson’s may lead to improved techniques for dealing with über-sized data sets.
Smarter Than You Think is about IBM’s new question-answering system Watson, which is apparently now good enough to be put in an actual Jeopardy competition on US national TV (scheduled to happen this fall). It’s a bit hard to believe, but I guess time will tell.
Most question-answering systems rely on a handful of algorithms, but Ferrucci decided this was why those systems do not work very well: no single algorithm can simulate the human ability to parse language and facts. Instead, Watson uses more than a hundred algorithms at the same time to analyze a question in different ways, generating hundreds of possible solutions. Another set of algorithms ranks these answers according to plausibility; for example, if dozens of algorithms working in different directions all arrive at the same answer, it’s more likely to be the right one.
IBM plans to sell Watson-like systems top corporate customers for sifting through huge document collections.