During the past few days, I’ve stumbled onto two different interpretations of the term “data engines.”
A recent post by Matthew Hurst described data engines as a “new category of online experience” which “represent the intermediary between the formal data being released by many networked organizations and researchers (and many other data sources besides) and a user base spanning data journalists, data geeks and an unwary public.” He is talking about companies and platforms such as Infochimps, The Guardian Data Store and Data Market. The last of these is an interesting-looking Icelandic company that I first learned about just a couple of days ago via the new Get the Data Q&A forum, where people share ways to access public data sources. In his post, Hurst compares the user interfaces of these different “data engines”: how should a time series, for instance, be displayed for maximum user convenience? Hurst also runs his own beta version of this type of service, d8taplex.
The other interpretation comes from a company called Data Engines Corporation. Their business idea appears to be a kind of quality control for data (and predictors) in settings where there is no known “ground truth.” In other words, they work with unsupervised inference, and in particular with assessing the quality of existing predictors (or recognizers, as they call them). They have a couple of interesting blog posts (if it is a blog – the posts are listed under the “The Technologies” heading) that give some indication of what their approach is about, such as The Marriage Lemma Again and Accuracy is Cheap, Precision Expensive.
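To make the idea of evaluating predictors without ground truth a bit more concrete, here is a minimal sketch of one classical trick in this space: estimating the accuracies of several binary classifiers purely from how often they agree with each other. To be clear, this is my own illustration of the general technique, not necessarily what Data Engines actually does, and it rests on strong assumptions: balanced classes, conditionally independent errors, and all accuracies above 0.5. All function names below are hypothetical.

```python
import random

def simulate(n, accuracies, seed=0):
    """Simulate n balanced binary labels and independent predictors,
    each correct with its given probability."""
    rng = random.Random(seed)
    preds = [[] for _ in accuracies]
    for _ in range(n):
        y = rng.randint(0, 1)
        for k, p in enumerate(accuracies):
            preds[k].append(y if rng.random() < p else 1 - y)
    return preds

def pairwise_agreement(a, b):
    """Fraction of items on which two predictors give the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def estimate_accuracies(preds):
    """Estimate each predictor's accuracy with no ground truth at all,
    using only the three pairwise agreement rates.

    Assumes balanced classes, conditionally independent errors,
    and accuracies above 0.5 (otherwise the sign is ambiguous)."""
    a01 = pairwise_agreement(preds[0], preds[1])
    a02 = pairwise_agreement(preds[0], preds[2])
    a12 = pairwise_agreement(preds[1], preds[2])
    # With q_i = 2*p_i - 1, independence gives agreement
    # a_ij = (1 + q_i*q_j)/2, so q_i*q_j = 2*a_ij - 1,
    # and each q_i can be triangulated from the three products.
    c01, c02, c12 = 2*a01 - 1, 2*a02 - 1, 2*a12 - 1
    q0 = (c01 * c02 / c12) ** 0.5
    q1 = (c01 * c12 / c02) ** 0.5
    q2 = (c02 * c12 / c01) ** 0.5
    return [(q + 1) / 2 for q in (q0, q1, q2)]

# Predictors with true accuracies 0.9, 0.8 and 0.7; the estimates
# recovered from agreement alone come out close to these values.
preds = simulate(100000, [0.9, 0.8, 0.7])
print([round(p, 3) for p in estimate_accuracies(preds)])
```

The appeal of this kind of scheme, and presumably of whatever Data Engines has built on top of it, is that the expensive part of evaluation – labeled ground truth – is never needed; the catch is that the independence assumption does real work and can fail badly when predictors share failure modes.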