Via Bradford Cross’ excellent post on data-driven startups (he has one himself – FlightCaster, a flight-delay prediction service that I mentioned last year), I learned that there is now at least one venture capital firm that specializes exclusively in data-driven or “big data” startups. This firm is IA Ventures, and it “invests in companies that create tools to manage and extract value from massive, occasionally unstructured, often real-time data sets”. I particularly like this sentence from their web page: “Most data generated today is simply treated as exhaust – lost forever along with the valuable insights held in it.” This is very true, and there are sure to be enormous opportunities for those who are clever enough to turn this “exhaust” – in the form of structured or unstructured data – into a product. Bradford Cross’ post also suggests some public data sets that a savvy startup might leverage.
One nice example of a company that uses seemingly mundane information – cab pickup frequencies in New York City – to create a useful product is Sense Networks. They perform “some heavy-duty data crunching” on information from taxi companies and mobile phone records to predict the best places to get a cab in NYC. The predictor is implemented as an iPhone application called CabSense. In a recent podcast named Reality Mining for Companies, Alex “Sandy” Pentland, a professor who is also on Sense Networks’ management team, describes how even more trivial information like movement patterns of individuals inside a company can actually be analyzed to improve productivity and working conditions. Did you know that productivity goes up 10% if you have coffee with a cohesive group of co-workers?
Anyway, it will be interesting to see how the data-driven startups funded by IA Ventures turn out. One of them, Recorded Future, has also recently received funding from Google. This company is based in the US and Sweden, and one of the people behind it is Christopher Ahlberg, the founder of Spotfire (a successful analytics company which was built around a user-friendly visualization tool and sold to Tibco a couple of years ago). Recorded Future attempts to predict future events (!) by analyzing and indexing various sources (news, analysis pieces, prognoses etc.) on the web. I assume they use some sort of natural language processing to recognize entities (like names of people and companies, dates etc.) and infer relationships between them from indexed reports. The company’s blog has some interesting visualizations that summarize, for example, the lives of some terrorist suspects who have recently been in the news. My favorite entry in the blog (if only for its name) is “Has Hu Jintao’s behavior changed?” These blog case studies do not contain predictions of future events; rather, they serve as a kind of proof of concept that the system can reconstruct a reasonable timeline of important events in a person’s (or maybe a company’s) life and display it in an effective way. I did register for a couple of “Futures”, an email-based service where you get alerts about possible future events connected to a set of keywords, but the only prediction I have received so far was apparently based on some faulty date recognition.
In case you read Swedish (or are able to tolerate Google translations), the best summary I have found of what is currently known about Recorded Future is at the Cornucopia blog.