Follow the Data

A data driven blog

Archive for the tag “ibm”

Sergey Brin’s new science and IBM’s Jeopardy machine

Two good articles from the mainstream press.

Sergey Brin’s Search for a Parkinson’s Cure deals with the Google co-founder’s quest to minimize his high hereditary risk of getting Parkinson’s disease (which he found out about through a test from 23andMe, the company his wife co-founded) while simultaneously opening up a faster way to do science.

Brin is proposing to bypass centuries of scientific epistemology in favor of a more Googley kind of science. He wants to collect data first, then hypothesize, and then find the patterns that lead to answers. And he has the money and the algorithms to do it.

This idea of a less hypothesis-driven kind of science, based more on observing correlations and patterns, surfaces once in a while. A couple of years ago, Chris Anderson received a lot of criticism for describing more or less the same idea in The End of Theory. You can’t escape the need for some sort of theory or hypothesis, and when it comes to something like Parkinson’s, we just don’t know enough about its physiology and biology yet. However, I think Brin is right in emphasizing the need to get data and knowledge about diseases to circulate more quickly and to try to milk the existing data sets for what they are worth. If nothing else, his frontal attack on Parkinson’s may lead to improved techniques for dealing with über-sized data sets.

Smarter Than You Think is about IBM’s new question-answering system Watson, which is apparently now good enough to be put in an actual Jeopardy competition on US national TV (scheduled to happen this fall). It’s a bit hard to believe, but I guess time will tell.

Most question-answering systems rely on a handful of algorithms, but Ferrucci decided this was why those systems do not work very well: no single algorithm can simulate the human ability to parse language and facts. Instead, Watson uses more than a hundred algorithms at the same time to analyze a question in different ways, generating hundreds of possible solutions. Another set of algorithms ranks these answers according to plausibility; for example, if dozens of algorithms working in different directions all arrive at the same answer, it’s more likely to be the right one.

IBM plans to sell Watson-like systems to corporate customers for sifting through huge document collections.
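
To make the quoted idea concrete, here is a minimal Python sketch of "many algorithms propose, another step ranks by agreement". It is only a toy under my own assumptions and not how Watson is actually implemented: the function rank_answers, the three stand-in generators, and their confidence numbers are all hypothetical.

```python
from collections import defaultdict


def rank_answers(question, generators):
    """Pool candidate answers from many generators and rank them by support.

    Each generator is a hypothetical callable that returns a list of
    (answer, confidence) pairs. Answers proposed by several generators
    accumulate more support and therefore rank higher.
    """
    support = defaultdict(float)
    for generate in generators:
        for answer, confidence in generate(question):
            # Normalize spelling so "Chicago" and "chicago " count as one answer.
            support[answer.strip().lower()] += confidence

    total = sum(support.values())
    if total == 0:
        return []
    # Highest share of support first; ties broken alphabetically.
    return sorted(
        ((answer, score / total) for answer, score in support.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


# Toy stand-ins for independent analysis strategies (made-up outputs).
def keyword_match(question):
    return [("Toronto", 0.4), ("Chicago", 0.6)]


def type_check(question):
    return [("Chicago", 0.7)]


def date_lookup(question):
    return [("chicago ", 0.5), ("Toronto", 0.2)]


ranked = rank_answers(
    "Which city hosted the 1893 World's Fair?",
    [keyword_match, type_check, date_lookup],
)
```

Here ranked puts "chicago" first because all three toy generators propose it, which mirrors the article's point that an answer reached from several directions is more plausible.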


Stream computing for babies

A Smarter Planet has a nice video about how IBM have used stream computing (basically meaning, I think, real-time analysis of massive streams of unstructured data) to improve the detection of life-threatening complications in prematurely born babies. Doctors at The Hospital for Sick Children in Toronto wanted to try to use real-time information to detect changes in the condition of critically ill “preemies”. They set up a system to measure streams of physiological data, such as respiration and heart rate, and analyze them on the fly. In a cute comparison, the narrator says that the IBM InfoSphere “…enables massive amounts of data to be correlated and analyzed for patterns and trend at more than 200 times a second, faster than a hummingbird flaps its wings.”

A very nice application of stream analytics – and as a bonus, the video uses Terry Riley’s A Rainbow in Curved Air as part of its soundtrack (I think).
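
For readers wondering what "analyzing a stream on the fly" might look like in code, here is a small Python sketch. It is not IBM InfoSphere's API and has no clinical validity; the function detect_anomalies, the window size, the threshold, and the heart-rate numbers are all assumptions of mine, chosen only to show the sliding-window idea.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(readings, window=5, z_threshold=3.0):
    """Yield (index, value) for readings far from the recent rolling mean.

    Readings are consumed one at a time, so this works on any iterable,
    including a live feed. Only the last `window` values are kept in memory.
    """
    recent = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            # Flag values more than z_threshold standard deviations away.
            if sigma > 0 and abs(value - mu) > z_threshold * sigma:
                yield i, value
        recent.append(value)


# Made-up heart-rate samples (beats per minute) with one sudden spike.
heart_rate = [140, 142, 141, 139, 140, 141, 180, 142, 140]
alerts = list(detect_anomalies(heart_rate))
```

One caveat with this simple design: a flagged value still enters the window, so it inflates the spread for the next few readings. A real monitoring system would presumably handle that more carefully.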

A Smarter Planet

A Smarter Planet (with the slogan Instrumented, Interconnected, Intelligent) is an interesting group blog from IBM. It discusses issues around designing smarter cities and also has a strong focus on healthcare technology and analytics. Here is an excerpt that I found interesting from one of the posts on healthcare analytics:

In China, a first-of-a-kind system built initially by IBM’s China Research Lab, enables sharing of electronic medical records across traditional Chinese medicine and modern western medicine environments, allowing healthcare practitioners to more deeply understand which treatment plans and techniques from each environment work best for specific diseases and medical conditions.

Speaking of healthcare data analytics, there is an interesting discussion going on in the comments section of this post on Marginal Revolution. The post itself calls for opening up data about how successful different hospitals are in treating different conditions. However, as the commenters point out, there are many facets to consider here.
