To follow up on yesterday’s post about data sources on the web, I’d like to mention an interesting resource, predict.i2pi, which automatically builds predictive models based on data that you upload. Using it could hardly be simpler: you just have to prepare a comma-separated text file with attributes (predictor variables) and one or more target values (response variables), marking each target by putting a star (*) in front of its variable name in the header row. The system will then match your data file to a set of suitable prediction algorithms (for example, regression models rather than classification models for a continuous response variable), evaluate each algorithm on a hold-out set drawn from your data, and report the best-performing models. As the site itself puts it,
Our team of elves will work on your file, running it against a range of model types and keeping track of the best ones. Every now and then we will update your page indicating the best models to date.
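To make the file format described above concrete, here is a minimal Python sketch that writes a small data set as comma-separated text, with a star in front of the target column's name in the header row. The column names (sqft, bedrooms, price), the sample values, and the helper functions write_starred_csv and split_header are all made up for illustration; they are not part of predict.i2pi, and the sketch assumes nothing about the site beyond the star convention.

```python
import csv
import io


def write_starred_csv(rows, predictors, targets, out):
    """Write rows (a list of dicts) as CSV, prefixing each target column with '*'."""
    columns = list(predictors) + list(targets)
    header = list(predictors) + ["*" + name for name in targets]
    writer = csv.writer(out)
    writer.writerow(header)
    for row in rows:
        writer.writerow([row[name] for name in columns])


def split_header(header):
    """Split a starred header row into (predictor names, target names)."""
    predictors = [name for name in header if not name.startswith("*")]
    targets = [name[1:] for name in header if name.startswith("*")]
    return predictors, targets


# Hypothetical example data: two predictors and one continuous target.
rows = [
    {"sqft": 850, "bedrooms": 2, "price": 195000},
    {"sqft": 1200, "bedrooms": 3, "price": 260000},
    {"sqft": 1600, "bedrooms": 4, "price": 340000},
]

buffer = io.StringIO()
write_starred_csv(rows, ["sqft", "bedrooms"], ["price"], buffer)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the header back shows which columns are marked as targets.
header = next(csv.reader(io.StringIO(csv_text)))
predictors, targets = split_header(header)
print("predictors:", predictors)
print("targets:", targets)
```

To produce a file for upload, pass an open file object (for example one opened with newline="") instead of the in-memory buffer.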
There’s also an API for predict.i2pi, and developers of statistical learning methods are encouraged to integrate their own favourite algorithms into the system. Read this blog post for more details.
For in-depth background on the various statistical learning and machine learning algorithms, you could do worse than to check out the lectures at videolectures.net. The site hosts an astounding number of lectures across many different fields, computer science most of all, with a skew towards machine learning.