Follow the Data

A data driven blog

Archive for the tag “resources”

16 Jul 2012

Summer reading

Some nice reading for the summer (in case of a rainy day of course):

Prediction, Learning and Games (PDF link) – Nice textbook on prediction. Via @ML_hipster (worth following on Twitter if you like @bigdatahipster and/or authentic, hand-crafted decision trees)
Data Science 101, a very nice blog which points to a multitude of resources
School of Data and the accompanying Data Wrangling Handbook
Agile Data by Russell Jurney (who is well worth following on Twitter and especially Quora). This book isn’t finished yet but can be viewed in its current state of development at the given link, which is within the Open Feedback Publishing System at O’Reilly Media. So you can, on one hand, read the book (or parts of it) for free before publication, and on the other hand, provide feedback and thus shape the contents of the book.
(edit 17/7 2012) Might as well throw this one in: Data Jujitsu: The Art of Turning Data into Product by DJ Patil, a free O’Reilly Radar report (epub/PDF/mobile).

Posted by Mikael Huss in Uncategorized and tagged data-science, Links, prediction, resources

07 Oct 2009

Couch DB — mapreduce for the masses

CouchDB, which has been an Apache Software Foundation project for a while now, is a document-oriented database that can be queried with simple JavaScript functions in map/reduce fashion. CouchDB is built on Erlang, a very interesting functional language designed for extreme reliability in the telecom industry. One of Erlang’s advantages is its support for parallelism: add more cores and servers, and the map/reduce queries will go faster. Conventional relational databases like MySQL or PostgreSQL are much harder to scale across several servers, so if you have built your application around a single database, the end game is usually to buy one really big piece of iron. With technology like CouchDB, that problem largely goes away. This neat interactive demo shows what CouchDB is all about.
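
To make the map/reduce idea a bit more concrete, here is a minimal sketch in plain Python. It is not CouchDB's API (real CouchDB views are written as JavaScript functions); the documents and the names `map_doc`, `reduce_values` and `run_view` are made up for illustration. The map step emits key/value pairs from each document, and the reduce step combines the values for each key.

```python
from collections import defaultdict

# Hypothetical documents, standing in for JSON documents in a document store.
docs = [
    {"_id": "1", "type": "post", "tags": ["resources", "prediction"]},
    {"_id": "2", "type": "post", "tags": ["resources", "databases"]},
    {"_id": "3", "type": "post", "tags": ["prediction"]},
]


def map_doc(doc):
    """Map step: emit one (key, value) pair per tag in a document."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_values(values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)


def run_view(documents):
    """Group the emitted pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_values(values) for key, values in grouped.items()}


tag_counts = run_view(docs)
print(tag_counts)
```

Because the map step looks at each document on its own, the work can in principle be split across cores or servers and the partial results combined afterwards, which is the property the post is pointing at.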

Posted by Joel Westerberg in Articles, Uncategorized and tagged databases, resources

27 Aug 2009

predict.i2pi

To follow up on yesterday’s post about data sources on the web, I’d like to mention an interesting resource, predict.i2pi, which automatically builds predictive models based on data that you upload. Using it could hardly be simpler – you just have to prepare a comma-separated text file with attributes (predictor variables) and one or more target values (response variables), with the latter being identified as such by putting a star (*) in front of the variable name in the header row. The system will then match your particular data file to a set of suitable prediction algorithms (for example, regression models rather than classification models for a continuous response variable), evaluate the performance of these algorithms on a hold-out set from your data, and output the best results. As the site itself puts it,

Our team of elves will work on your file, running it against a range of model types and keeping track of the best ones. Every now and then we will update your page indicating the best models to date.
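
As a rough illustration of the file format described above, the following Python sketch writes a comma-separated file whose header row marks the response variable with a leading star. The column names, the sample values and the helper names `write_training_csv` and `split_header` are assumptions made up for this example; only the star convention comes from the description of the service.

```python
import csv
import io


def write_training_csv(rows, predictors, targets):
    """Return CSV text whose header marks target columns with a leading '*'."""
    columns = list(predictors) + list(targets)
    header = list(predictors) + ["*" + name for name in targets]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        writer.writerow([row[name] for name in columns])
    return buffer.getvalue()


def split_header(csv_text):
    """Read the header row back and separate predictors from starred targets."""
    header = next(csv.reader(io.StringIO(csv_text)))
    targets = [name[1:] for name in header if name.startswith("*")]
    predictors = [name for name in header if not name.startswith("*")]
    return predictors, targets


# Hypothetical data: two predictor variables and one response variable.
rows = [
    {"height_cm": 170, "age": 34, "weight_kg": 68.5},
    {"height_cm": 182, "age": 41, "weight_kg": 80.1},
]
csv_text = write_training_csv(rows, ["height_cm", "age"], ["weight_kg"])
print(csv_text)
```

Reading the header back with `split_header` is a cheap sanity check that the star ended up on the intended column before uploading anything.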

There’s also an API for predict.i2pi, and developers of statistical learning methods are encouraged to integrate their own favourite algorithms into the system. Read this blog post for more details.
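
The general idea of trying several algorithms and keeping whichever does best on a hold-out set can be sketched in a few lines of Python. This is only an illustration of the technique, not how predict.i2pi is implemented; the two candidate models, the synthetic data and all function names are assumptions for the example.

```python
import random


def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Ordinary least squares with a single predictor."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mean_squared_error(model, xs, ys):
    """Average squared difference between predictions and true values."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


def pick_best(xs, ys, candidates, holdout_fraction=0.25, seed=0):
    """Fit each candidate on the training part and score it on the hold-out part."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    n_holdout = max(1, int(len(indices) * holdout_fraction))
    holdout, train = indices[:n_holdout], indices[n_holdout:]
    train_x, train_y = [xs[i] for i in train], [ys[i] for i in train]
    hold_x, hold_y = [xs[i] for i in holdout], [ys[i] for i in holdout]
    scores = {}
    for name, fit in candidates.items():
        model = fit(train_x, train_y)
        scores[name] = mean_squared_error(model, hold_x, hold_y)
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic, exactly linear data, so the line model should win.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
best_name, scores = pick_best(xs, ys, {"mean": fit_mean, "line": fit_line})
print(best_name, scores)
```

Adding another algorithm in this sketch just means adding another entry to the `candidates` dictionary, which is loosely the spirit of letting developers plug in their own favourite methods.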

For in-depth background on the various statistical learning and machine learning algorithms, you could do worse than to check out the lectures at videolectures.net. There’s really an astounding amount of information there about lots of different fields, but in particular computer science, with a skew towards machine learning.

Posted by Mikael Huss in Web sites and tagged prediction, resources

About

By Mikael Huss (@mikaelhuss) and Joel Westerberg (@tuxtux).
