Follow the Data

A data driven blog

Archive for the tag “reality-mining”

Quick links

Data-driven venture capitalists and more

Via Bradford Cross’ excellent post on data-driven startups (he has one himself – FlightCaster, a flight-delay prediction service that I mentioned last year), I learned the interesting fact that there is now at least one venture capital company that specializes exclusively in data-driven or “big data” startups. This company is IA Ventures, and it “invests in companies that create tools to manage and extract value from massive, occasionally unstructured, often real-time data sets”. I particularly like this sentence from their web page: “Most data generated today is simply treated as exhaust – lost forever along with the valuable insights held in it.” This is very true, and there are sure to be enormous opportunities for those who are clever enough to turn this “exhaust” – in the form of structured or unstructured data – into a product. The above-mentioned post by Bradford Cross suggests some public data sets that might be leveraged by a savvy startup.

One nice example of a company that uses seemingly mundane information – cab pickup frequencies in New York City – to create a useful product is Sense Networks. They perform “some heavy-duty data crunching” on information from taxi companies and mobile phone records to predict the best places to get a cab in NYC. The predictor is implemented as an iPhone application called CabSense. In a recent podcast named Reality Mining for Companies, Alex “Sandy” Pentland, a professor who is also on Sense Networks’ management team, describes how even more trivial information like movement patterns of individuals inside a company can actually be analyzed to improve productivity and working conditions. Did you know that productivity goes up 10% if you have coffee with a cohesive group of co-workers?

Anyway, it will be interesting to see how the data-driven startups funded by IA Ventures turn out. One of them, Recorded Future, has also recently received funding from Google. This company is based in the US and Sweden, and one of the people behind it is Christopher Ahlberg, the founder of Spotfire (a successful analytics company which was built around a user-friendly visualization tool and sold to Tibco a couple of years ago). Recorded Future attempts to predict future events (!) by analyzing and indexing various sources (news, analysis pieces, prognoses etc.) on the web. I assume they use some sort of natural language processing to recognize entities (like names of people and companies, dates etc.) and infer relationships between them from indexed reports. The company’s blog has some interesting visualizations that summarize, for example, the lives of some terrorist suspects who have recently been in the news. My favorite entry in the blog (if only for its name) is “Has Hu Jintao’s behavior changed?” These blog case studies do not contain predictions of future events, but rather a kind of proof of concept that the system can reconstruct a reasonable timeline showing important events in a person’s (or maybe a company’s) life and display it in an effective way. I did register for a couple of “Futures”, an email-based service where you get alerts about possible future events connected to a set of keywords, but the only prediction I have received so far was apparently based on some faulty date recognition.

In case you read Swedish (or are able to tolerate Google translations), the best summary I have found of what is currently known about Recorded Future is at the Cornucopia blog.

Link roundup

Gearing up into Christmas mode, so no proper write-up for these (interesting) links.

Personalized medicine is about data, not (just) drugs. Written by Thomas Goetz of The Decision Tree for Huffington Post. The Decision Tree also has a nice post about why self-tracking isn’t just for geeks.

A Billion Little Experiments (PDF link). An eloquent essay/report about “good” and “bad” patients and doctors, compliance, and access to your own health data.

Latent Semantic Indexing worked well for Netflix, but not for dating. MIT Technology Review writes about how the algorithms used to match people at an online dating site (based on latent semantic indexing / SVD) are close to worthless. A bit lightweight, but a fun read.

A podcast about data mining in the mobile world. Featuring Deborah Estrin and Tom Mitchell. Mitchell just recently wrote an article in Science about how data mining is changing: Mining Our Reality (subscription needed). The take-home message (or one of them) is that data mining is becoming much more real-time oriented. Data are increasingly being analyzed on the fly and used to make quick decisions.

How Zeo, the sleep optimizer, actually works. I mentioned Zeo in a blog post in August.

Quick links

Ran across a couple of interesting links:

Space-Time Travel Data is Analytic Super-Food!, a very meaty blog post where Jeff Jonas starts by discussing largely the same themes that I blogged about a while back, but he has thought more – a lot more! – deeply about it and delivers a number of interesting insights and predictions. The comments section contains some good stuff too.

Data is Journalism – this post discusses the acquisition of the local data aggregator service Everyblock. The question of whether data “is” journalism reminds me of the world of science – is a big and hard-to-obtain data set worthy of being published in a prestigious journal, even if the accompanying paper lacks a clear advancement in scientific knowledge? These questions may not be correctly formulated, and when it comes to journalism, I’m certain that data analysis and presentation will play an important role in its future, along with the more traditional components.

Mobile phones, location and indexing the real world

Mobile phones are rapidly becoming powerful data acquisition devices, as described e.g. in recent (and good) articles in The Economist and Nature. Many phones have cameras, GPS systems and net connections, and some of them sport accelerometers, which can be used to measure the number of calories burnt by the user, or even to track earthquakes.

A number of enterprising researchers have started to mine the location data that can be obtained from mobile phones (through information from mobile towers routing the communication). Last year, the complex-networks guru Albert-László Barabási and co-workers published a paper, Understanding individual human mobility patterns, where they studied movement trajectories of 100,000 (anonymized) mobile phone users. The result reported by the authors – that human movement is not random but shows high spatial and temporal regularity – was perhaps not as impressive as the sheer size of the data set.

For those who would like to try their hand at analyzing mobile phone data, MIT’s Reality Mining project provides an interesting and freely accessible data set. In this project, students carried (Nokia) phones and their trajectories were tracked. The subjects also answered various questions about themselves and their habits. The data gathered for the Reality Mining project included location information (again, through mobile towers), communication data (call records) and proximity data (using Bluetooth).

The researchers behind the project developed algorithms for extracting routine everyday patterns from users’ lives and claim they can predict their subjects’ next actions to a fairly good approximation.
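To get a feel for what "predicting the next action" could look like, here is a minimal sketch in Python. It is not the Reality Mining group's method; it is just a first-order Markov model that guesses the next place from the current one, and the place labels, the `visits` list and both function names are made up for illustration.

```python
from collections import Counter, defaultdict


def fit_transitions(visits):
    """Count how often each place is directly followed by each other place."""
    transitions = defaultdict(Counter)
    for current, following in zip(visits, visits[1:]):
        transitions[current][following] += 1
    return transitions


def predict_next(transitions, current):
    """Return the most frequent successor of `current`, or None if unseen."""
    if current not in transitions:
        return None
    return transitions[current].most_common(1)[0][0]


# Hypothetical sequence of places visited by one person (invented data).
visits = ["home", "work", "cafe", "work", "home",
          "home", "work", "cafe", "work", "gym", "home"]

model = fit_transitions(visits)
print(predict_next(model, "cafe"))
```

With real data the "places" would have to be derived first, for instance from the mobile tower a phone is connected to, and time of day would probably matter at least as much as the previous location.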

The Economist article linked above quotes one of the MIT researchers, Alex Pentland, as saying that “… some handsets can capture information about individuals, such as their activity levels or even their gait, using built-in motion sensors.” This suggested to me that it might be possible to detect changes in gross motor patterns in an individual, such as those that have been shown to sometimes occur in depressed patients. Thus, a smart phone could be an “early warning system” for depression.

The Reality Mining group has spun off a company, Sense Networks, that aims to bring location-based data to the commercial sphere in a big way. Their slogan is “Indexing the real world using location data for predictive analytics.”

Indexing the real world! Now that would be something.

Currently, Sense Networks offers a service, CitySense, for finding out where the action is in a city. I quote from the web site:

Citysense passively “senses” the most popular places based on actual real-time activity and displays a live heat map. The application intelligently leverages the inherent wisdom of crowds without any change in existing user behavior, in order to navigate people to the hottest spots in a city. […]

The application learns about where each user likes to spend time – and it processes the movements of other users with similar patterns. In its next release, Citysense will not only answer “where is everyone right now” but “where is everyone like me right now.” Four friends at dinner discussing where to go next will see four different live maps of hotspots and unexpected activity. Even if they’re having dinner in a city they’ve never visited before.
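I have no idea how Citysense is actually implemented, but the basic idea of a live heat map can be sketched in a few lines of Python: chop the city into grid cells and count how many users are currently in each one. The coordinates, the cell size of 0.01 degrees and the function name `heat_map` are all assumptions made for the sake of the example.

```python
import math
from collections import Counter


def heat_map(positions, cell_size=0.01):
    """Count positions per grid cell; cells are `cell_size` degrees on a side."""
    counts = Counter()
    for lat, lon in positions:
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        counts[cell] += 1
    return counts


# Hypothetical current positions (latitude, longitude) of three users.
positions = [
    (40.7584, -73.9857),
    (40.7589, -73.9851),
    (40.7128, -74.0060),
]

counts = heat_map(positions)
hottest_cell, users = counts.most_common(1)[0]
print(hottest_cell, users)
```

The "people like me" version described in the quote would presumably weight each count by how similar the other user's movement history is to yours, which is a much harder problem than the counting shown here.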
