Follow the Data

A data-driven blog

Archive for the month “January, 2010”

Mathematical modeling and drug cartels

An interesting article in the New York Times about the mathematical sociology of drug cartels.

It may seem strange to examine this shadowy world with equations. But mathematics is transforming the social sciences. In the same way that physicists can predict the movements of atoms in space, we can use mathematics to model how individuals and groups will make decisions and interact in a society.

Crowdsourcing adverse drug event prediction algorithms

There’s an interesting competition, the Observational Medical Outcomes Partnership Cup (OMOP Cup), running until March this year (so unfortunately a bit late for laggards like me to participate). The background is that a lot of data on adverse drug events has recently become available, but much of it is in free text and unstandardized formats. As a result, the development of algorithms for identifying patterns in adverse drug events, and by extension for predicting new events, has lagged behind. A good way to find and predict adverse drug events could save a lot of lives worldwide. The OMOP has therefore constructed a “simulated” data set which resembles the kind of information you get from insurance claims and medical records. There are two algorithmic tasks. The first resembles classical data mining problems, where you get an entire data set and try to characterize it as well as possible; the second is more in a stream mining style, where your algorithm is continuously evaluated by running it against observations that become available sequentially over time. The prizes are USD 10,000 for the first task and USD 5,000 for the second.
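To make the second, stream-style task a bit more concrete, here is a minimal Python sketch of what is usually called prequential (test-then-train) evaluation: each new observation is first used to score the model's prediction, and only then used to update the model. The predictor (a smoothed running rate of adverse events per drug), the record format and the squared-error score are all my own hypothetical choices, not the OMOP Cup's actual data format or scoring rules.

```python
from dataclasses import dataclass, field


@dataclass
class RunningRatePredictor:
    """Hypothetical online model: predicts the event rate observed so far for each drug."""
    counts: dict = field(default_factory=dict)  # drug -> (events, exposures)

    def predict(self, drug: str) -> float:
        events, exposures = self.counts.get(drug, (0, 0))
        # Laplace smoothing: an unseen drug gets 0.5 instead of a division by zero
        return (events + 1) / (exposures + 2)

    def update(self, drug: str, had_event: bool) -> None:
        events, exposures = self.counts.get(drug, (0, 0))
        self.counts[drug] = (events + int(had_event), exposures + 1)


def prequential_error(stream, model) -> float:
    """Test-then-train: score each observation before learning from it (mean squared error)."""
    total_sq_error = 0.0
    n = 0
    for drug, had_event in stream:
        p = model.predict(drug)        # test on the new observation first...
        total_sq_error += (p - float(had_event)) ** 2
        model.update(drug, had_event)  # ...then train on it
        n += 1
    return total_sq_error / n if n else 0.0


# Hypothetical (drug, adverse_event_observed) records arriving in time order
stream = [("drugA", False), ("drugA", True), ("drugB", False),
          ("drugA", True), ("drugB", False), ("drugB", False)]
print(round(prequential_error(stream, RunningRatePredictor()), 4))
```

The point of the test-then-train order is that the model is never scored on an observation it has already learned from, which mimics how the algorithm would behave on data arriving in real time.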

The OMOP Cup appears to be hosted by Orwik, a company I was only vaguely aware of but hadn’t really looked at. Its product appears to be a data management solution for scientists, either individual researchers or groups of collaborators, who want to keep their data available (for example, after key people have left) and perhaps shareable (for multi-group collaborations), all (I assume) with the aim of supporting well-documented and reproducible research.

Stream computing for babies

A Smarter Planet has a nice video about how IBM have used stream computing (basically meaning, I think, real-time analysis of massive streams of unstructured data) to improve the detection of life-threatening complications in prematurely born babies. Doctors at The Hospital for Sick Children in Toronto wanted to use real-time information to detect changes in the condition of critically ill “preemies”. They set up a system that measures streams of physiological data, e.g. respiration and heart rate, and analyzes them on the fly. In a cute comparison, the narrator says that the IBM InfoSphere “…enables massive amounts of data to be correlated and analyzed for patterns and trend at more than 200 times a second, faster than a hummingbird flaps its wings.”

A very nice application of stream analytics – and as a bonus, the video uses Terry Riley’s A Rainbow in Curved Air as part of its soundtrack (I think).
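I don't know what IBM's system actually computes, but the basic idea of analyzing a physiological stream on the fly can be illustrated with a much simpler technique: keep a short rolling window of recent readings and raise an alert when a new reading lies far outside it. Here is a hypothetical rolling z-score detector in Python; the window size, threshold and heart-rate values are made up for illustration.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(readings, window=5, threshold=3.0):
    """Flag readings that deviate strongly from the recent rolling window.

    Hypothetical sketch using a rolling mean/standard deviation check; this is
    not the method used by IBM InfoSphere or the Toronto hospital.
    """
    recent = deque(maxlen=window)  # only the last `window` readings are kept
    alerts = []
    for t, value in enumerate(readings):
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            # Skip the check for a perfectly flat window (sigma == 0)
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                alerts.append((t, value))
        recent.append(value)
    return alerts


# Made-up heart-rate readings (beats per minute) with one sudden spike
heart_rate = [150, 152, 149, 151, 150, 148, 151, 190, 150, 149]
print(detect_anomalies(heart_rate))  # [(7, 190)]
```

One caveat with this design: once an outlier enters the window it inflates the standard deviation for the next few readings, so a real monitoring system would need something more robust.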

Drone data overload

Two weeks ago, the New York Times published a really interesting article, Military Is Awash With Data From Drones. Apparently, US Air Force drones collected 24 years’ worth of video data in 2009, and “A group of young analysts already watches every second of the footage live as it is streamed to Langley Air Force Base [...] and to other intelligence centers …” Sounds pretty boring … Now, the Air Force is “turning to the television industry to learn how to quickly share video clips and display a mix of data in ways that make analysis faster and easier.”

They also plan to mine the collected video data later in order to find, e.g., insurgency patterns, but the sheer amount of raw video is hard to deal with, especially when it has to be watched by human eyes. Of course these processes are becoming more and more automated, but for the time being, as the article states:

“You need somebody who’s trained and is accountable in recognizing that that is a woman, that is a child and that is someone who’s carrying a weapon,” [...] “And the best tools for that are still the eyeball and the human brain.”

Food and politics

This is a couple of months old and a bit silly, but worth a mention, I think. The collaborative decision-making site hunch.com, which wants to take the pain out of making decisions by letting you ask a stranger (actually an aggregate of a whole lot of them) for advice, has published a report on correlations between political persuasion and food preferences.

Some background on hunch.com: You “teach” hunch.com about your personality by answering questions, so the resulting advice will be influenced by the choices of people whose personality profiles are similar to yours. When you are actually about to make a decision, the system asks you more and more questions related to the specific choice you are facing, and weights its advice accordingly. You can also give feedback on the final recommendation and hopefully get even better advice in the future.
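A very simplified version of the “people with profiles like yours chose X” idea is user-based nearest-neighbor weighting: measure how often another user answered the profile questions the same way you did, and let that agreement weight their choice. This Python sketch only shows the general technique; I don't know how hunch.com actually weights its advice, and the question names and answers are invented.

```python
def similarity(a: dict, b: dict) -> float:
    """Fraction of shared profile questions on which two users gave the same answer."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    return sum(a[q] == b[q] for q in shared) / len(shared)


def recommend(me: dict, others: list) -> dict:
    """Score each option by other users' choices, weighted by profile similarity.

    `others` is a list of (profile, chosen_option) pairs; all names are hypothetical.
    """
    scores = {}
    for profile, choice in others:
        w = similarity(me, profile)
        scores[choice] = scores.get(choice, 0.0) + w
    total = sum(scores.values())
    # Normalize to shares of the total weight (empty dict if nobody is similar at all)
    return {opt: s / total for opt, s in scores.items()} if total else {}


me = {"likes_travel": "yes", "owns_cat": "no", "early_riser": "yes"}
others = [
    ({"likes_travel": "yes", "owns_cat": "no", "early_riser": "yes"}, "Lisbon"),
    ({"likes_travel": "yes", "owns_cat": "yes", "early_riser": "yes"}, "Lisbon"),
    ({"likes_travel": "no", "owns_cat": "yes", "early_riser": "no"}, "Las Vegas"),
    ({"likes_travel": "yes", "owns_cat": "no", "early_riser": "no"}, "Las Vegas"),
]
print(recommend(me, others))
```

Here the user who disagrees on every question contributes nothing, so “Lisbon” ends up with 5/7 of the weight. The follow-up questions hunch.com asks about a specific decision would, in this picture, simply add more keys to the profiles being compared.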

Of course, hunch.com collects a lot of information on different kinds of preferences as a “side effect” of all of this advice-giving. This info can be mined in order to discover surprising – or sometimes not so surprising – correlations. In the food/politics report mentioned above, self-declared liberals and conservatives were compared with respect to their favorite foods, cooking skills and so on. Some of the results:

- If you have both liberals and conservatives at your dinner table, it’s safest to serve hot dogs or double cheeseburgers, as both groups like these. If you serve margaritas, do so with salt on the glass.

- Liberals like international food such as Thai and Indian, while conservatives prefer things like pizza and Mac & Cheese.

- Conservatives have apple corers and know how to use them, but liberals don’t even know what they are.
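Mechanically, a comparison like this mostly comes down to counting: for each group, what share names a given food as a favorite, and which foods clear some bar in every group (the dinner-table question). Here is a minimal Python sketch with invented responses; the data layout and the 50% cutoff are my own assumptions, not how hunch.com computed its report.

```python
from collections import Counter


def preference_rates(responses):
    """Share of each group that names each food as a favorite.

    `responses` is a list of (group, set_of_favorite_foods); the data is invented.
    """
    group_sizes = Counter(group for group, _ in responses)
    likes = Counter((group, food) for group, foods in responses for food in foods)
    return {(g, f): n / group_sizes[g] for (g, f), n in likes.items()}


def safe_dishes(rates, groups, min_share=0.5):
    """Foods liked by at least `min_share` of every group."""
    foods = {f for (_, f) in rates}
    return sorted(f for f in foods
                  if all(rates.get((g, f), 0.0) >= min_share for g in groups))


responses = [
    ("liberal", {"thai", "hot dogs"}),
    ("liberal", {"indian", "hot dogs"}),
    ("conservative", {"pizza", "hot dogs"}),
    ("conservative", {"mac & cheese", "hot dogs", "pizza"}),
]
rates = preference_rates(responses)
print(safe_dishes(rates, ["liberal", "conservative"]))  # ['hot dogs']
print(rates[("liberal", "thai")] - rates.get(("conservative", "thai"), 0.0))  # 0.5
```

With real data you would of course want far more respondents, and some test of whether a difference in shares is larger than chance, before calling it a correlation.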

Another report reveals that people who self-identify as Mac people like Andy Warhol while PC people don’t; that Mac users prefer Vespas but PC users prefer Harleys; and that Mac people like The Office more.

I guess all of these results sort of confirm stereotypes we already have, but it’s still good fun to read through these reports, which are periodically announced on the hunch.com blog. Who knows, perhaps one day a *useful* correlation pops out …

Computational advertising course

I’ve written about one company that exemplifies how advertising is becoming more data-driven, and now I find there is a Stanford University course on computational advertising. One of the lecture note PDFs defines computational advertising as “A principled way to find the ‘best match’ between a user in a context and a suitable ad”. Although I agree with this O’Reilly Radar blog post that it’s a stretch to call computational advertising a “scientific discipline”, the lecture notes are nevertheless fun and interesting to read. The instructors are from Yahoo! Research, and probably a lot of the material they cover is actually used by Yahoo! in some way.
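One common way to make the “best match” definition concrete for pay-per-click ads is to rank candidate ads by expected revenue per impression, i.e. the advertiser's bid times the predicted probability that this user, in this context, clicks. This Python sketch uses a deliberately crude, hypothetical click model based on keyword overlap; real systems learn click probabilities from click logs, and I'm not claiming this is what the course teaches.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    bid_per_click: float  # what the advertiser pays per click (hypothetical units)
    keywords: frozenset


def predicted_ctr(ad: Ad, context: set) -> float:
    """Toy click model: a 1% base rate, boosted by keyword overlap with the context.

    Hypothetical: a real system would learn this from click logs.
    """
    overlap = len(ad.keywords & context)
    return min(0.01 * (1 + overlap), 1.0)


def rank_ads(ads, context):
    """Order ads by expected revenue per impression: bid x P(click | context)."""
    return sorted(ads,
                  key=lambda ad: ad.bid_per_click * predicted_ctr(ad, context),
                  reverse=True)


ads = [
    Ad("running shoes", 0.50, frozenset({"running", "marathon", "shoes"})),
    Ad("car insurance", 2.00, frozenset({"car", "insurance"})),
    Ad("energy gel", 0.80, frozenset({"running", "marathon"})),
]
context = {"marathon", "training", "running"}
print([ad.name for ad in rank_ads(ads, context)])
# ['energy gel', 'car insurance', 'running shoes']
```

With these made-up numbers the relevant energy gel ad wins, but the irrelevant, high-bidding car insurance ad still beats the running shoes, which shows the trade-off between bid and relevance that any such ranking has to make.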
