Follow the Data

A data driven blog

Taxi driving and data mining

Tim O’Reilly and John Battelle’s Web Squared essay from a couple of months back relates an interesting anecdote:

Radar blogger Nat Torkington tells the story of a taxi driver he met in Wellington, NZ, who kept logs of six weeks of pickups (GPS, weather, passenger, and three other variables), fed them into his computer, and did some analysis to figure out where he should be at any given point in the day to maximize his take. As a result, he’s making a very nice living with much less work than other taxi drivers. Instrumenting the world pays off.

I think this kind of thing could be applied in many different professions. It would be interesting to know how well-versed the taxi driver was in statistics. If he wasn’t statistically trained, he presumably used a simple common-sense model, whose success suggests that large gains can be had simply by quantifying what you do and picking up the major trends. Of course, he may have been a real data-analysis ninja. Either way, it’s probably fair to say, as O’Reilly and Battelle do in their article, that “Data analysis, visualization, and other techniques for seeing patterns in data are going to be an increasingly valuable skillset. Employers take notice.” Experienced taxi drivers have probably built up an equally effective implicit model of how to maximize their income, but the Wellington taxi driver may have been able to “skip ahead” a couple of years using his statistics.
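To make "a simple common-sense model" concrete, here is a minimal sketch of one possibility: average the fare for each (hour, zone) pair and go wherever the average is highest. This is only a guess at the kind of thing the driver might have done, not his actual method. The log entries, zone names and fares are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical pickup log: (hour of day, zone, fare). All values are invented.
pickups = [
    (8, "airport", 42.0),
    (8, "airport", 38.0),
    (8, "downtown", 15.0),
    (8, "downtown", 18.0),
    (18, "downtown", 30.0),
    (18, "downtown", 26.0),
    (18, "airport", 20.0),
]

def best_zone_by_hour(log):
    """Return {hour: (zone, mean_fare)} for the zone with the highest mean fare."""
    totals = defaultdict(lambda: [0.0, 0])  # (hour, zone) -> [sum of fares, count]
    for hour, zone, fare in log:
        totals[(hour, zone)][0] += fare
        totals[(hour, zone)][1] += 1

    best = {}
    for (hour, zone), (fare_sum, count) in totals.items():
        mean_fare = fare_sum / count
        if hour not in best or mean_fare > best[hour][1]:
            best[hour] = (zone, mean_fare)
    return best

plan = best_zone_by_hour(pickups)
for hour in sorted(plan):
    zone, mean_fare = plan[hour]
    print(f"{hour:02d}:00 -> {zone} (mean fare {mean_fare:.2f})")
```

Mean fare alone ignores how long one waits for a pickup, so a more realistic version would probably look at income per hour spent in each zone.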

Another thought that occurred to me is how one would go about building a generic web-based tool where people can track everyday data with a view towards prediction. It would likely be a combination of something like your.flowingdata for the tracking and predict.i2pi for the simple, no-fuss prediction part. Maybe such an application already exists?

The user would of course still have to put some work into defining the problem properly, like deciding what to record and how to encode it. For instance, the taxi driver mentioned above would have had to think about whether to record his location in terms of, for example, neighbourhoods, streets or exact GPS location (or all three) – each likely giving rise to its own advantages and drawbacks.
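One way to postpone the encoding decision is to record the location at every level at once and pick the level at analysis time. The sketch below assumes a hypothetical `Pickup` record and a hypothetical `grid_cell` helper that snaps GPS coordinates to a coarse grid. The street and neighbourhood labels are placeholders, and the coordinates are just two nearby points in the Wellington area.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Pickup:
    """One logged pickup, with location stored at several granularities."""
    lat: float
    lon: float
    street: str          # placeholder label, entered by the user
    neighbourhood: str   # placeholder label, entered by the user

def grid_cell(lat, lon, cell_deg=0.01):
    """Snap a coordinate to a grid cell (0.01 degrees of latitude is roughly 1.1 km)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def location_key(pickup, level):
    """Return the location encoded at the requested level of detail."""
    if level == "gps":
        return (pickup.lat, pickup.lon)
    if level == "grid":
        return grid_cell(pickup.lat, pickup.lon)
    if level == "street":
        return pickup.street
    if level == "neighbourhood":
        return pickup.neighbourhood
    raise ValueError(f"unknown level: {level}")

a = Pickup(-41.2865, 174.7762, "Example St", "Zone A")
b = Pickup(-41.2861, 174.7768, "Sample Rd", "Zone A")

for level in ("gps", "grid", "street", "neighbourhood"):
    same = location_key(a, level) == location_key(b, level)
    print(f"{level:14s} same location: {same}")
```

Exact GPS treats almost every pickup as a unique place, so there is little to aggregate, while neighbourhoods pool many pickups but blur differences between streets. That is the trade-off the user has to settle.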

A really useful general tracking/prediction tool would probably also need some sort of automatic model optimization and validation framework (e.g. built-in variable selection and cross-validation cycles), which would be mostly kept out of the user’s view (unless the user explicitly wants to see it).
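As a rough illustration of what such a hidden framework might do, the sketch below tries every subset of the recorded variables, scores each subset by k-fold cross-validated error, and keeps the best one. Everything here is an assumption made for the example: the toy log is invented (fares depend on hour and zone but not on weather), the "model" is just a per-group mean, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean

FEATURES = ("hour", "zone", "weather")

def make_toy_log():
    """Build an invented log where fare depends on hour and zone, not weather."""
    base = {(8, "A"): 40.0, (8, "B"): 16.0, (18, "A"): 20.0, (18, "B"): 28.0}
    log = []
    for offset in (-1.0, 0.0, 1.0):  # crude stand-in for noise
        for (hour, zone), fare in base.items():
            for weather in ("dry", "rain"):
                log.append({"hour": hour, "zone": zone,
                            "weather": weather, "fare": fare + offset})
    return log

def fit_group_means(rows, keys):
    """'Train' by averaging the fare within each combination of the given keys."""
    groups = {}
    for row in rows:
        groups.setdefault(tuple(row[k] for k in keys), []).append(row["fare"])
    overall = mean(row["fare"] for row in rows)
    return {group: mean(fares) for group, fares in groups.items()}, overall

def cv_error(rows, keys, k=3):
    """Mean absolute error over k folds; row i belongs to fold i % k."""
    errors = []
    for fold in range(k):
        train = [r for i, r in enumerate(rows) if i % k != fold]
        test = [r for i, r in enumerate(rows) if i % k == fold]
        group_means, overall = fit_group_means(train, keys)
        for r in test:
            # Fall back to the overall mean for groups unseen in training.
            pred = group_means.get(tuple(r[key] for key in keys), overall)
            errors.append(abs(r["fare"] - pred))
    return mean(errors)

def select_features(rows, features=FEATURES, k=3):
    """Score every non-empty feature subset and return the best one."""
    scores = {}
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            scores[subset] = cv_error(rows, subset, k)
    best = min(scores, key=scores.get)
    return best, scores

best, scores = select_features(make_toy_log())
for subset, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{', '.join(subset):22s} CV error {score:.2f}")
print("selected:", best)
```

On this toy log the hour and zone pair scores best, and adding weather makes the held-out error slightly worse, which is the kind of signal the tool could act on without bothering the user. Trying every subset is only feasible for a handful of variables, so a real tool would need a cheaper search.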


2 thoughts on “Taxi driving and data mining”

  1. this will be religious: The success of the cabbie, I think, came from the fact that he could combine his experience (very “uncommon sense”) with eyeballing regression (or some other simple statistics). I don’t think any general machine-learning ninjutsu could ever beat that. No free lunch, and so on. The same goes for combining your.flowingdata and predict.i2pi: that could work, but not by just data mining our whole-situationome. Prove me wrong, please :)

  2. Mikael Huss said:

    Well, I also suspect (as I hinted at in the post, I think) that what the cab driver did was something fairly simple. The point, which was maybe lost, was that even collecting simple information can be useful if you do it systematically. I also agree that the cab driver’s experience that you refer to would help; in my scenario it would, for instance, be used for understanding which variables need to be tracked and at what level.
