Follow the Data

A data-driven blog


CouchDB: MapReduce for the masses

CouchDB, for some time now an Apache Software Foundation project, is a document-oriented database that can be queried with simple JavaScript functions written in map/reduce fashion. CouchDB is written in Erlang, a very interesting functional language built for extreme reliability in the telecom industry. One of the advantages of Erlang is its support for parallelism: add more cores and servers, and the map/reduce queries run faster. Traditional relational databases like MySQL or PostgreSQL don't scale out across several servers easily, so if you have built your application around a single database, the endgame is to buy one really big piece of iron. With technology like CouchDB, that problem goes away. This neat interactive demo shows what CouchDB is all about.
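To give a flavour of what those JavaScript queries look like, here is a minimal sketch of a CouchDB view that counts documents per tag (the document shape and field names are my own assumptions, not from the demo):

```javascript
// Map: called once per document; emits one (tag, 1) row per tag.
// Assumes documents shaped like {"title": "...", "tags": ["erlang", "databases"]}.
function (doc) {
  if (doc.tags) {
    for (var i = 0; i < doc.tags.length; i++) {
      emit(doc.tags[i], 1);
    }
  }
}

// Reduce: sums the emitted 1s for each tag key. This also works on
// re-reduce, since partial sums are themselves numbers; CouchDB's
// built-in _sum reduce does the same job.
function (keys, values, rereduce) {
  return sum(values);
}
```

Querying the view with group=true then returns one row per tag with its count, and because the map step runs independently per document, it parallelizes naturally across cores and nodes.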

What is big data?

Did you know that the word data means “things given” in Latin? That’s just one of the things I learned from a very interesting (free) article, The Pathologies of Big Data, by former computational neuroscientist Adam Jacobs. He also makes the perceptive comment that the word data tends to get used as a mass noun in English, as if it denoted a substance. (After reading these interesting insights, it was no surprise to learn that Jacobs also has a degree in linguistics.)

The article discusses what “big data” really means in this day and age when we can actually keep, for instance, a dataset containing information about the entire world population in memory (not to mention on disk) on a pretty ordinary Dell server. Jacobs argues that getting stuff into databases is easy, but getting it out (in a useful form) is hard; the bottleneck lies in the analysis rather than the raw data manipulation.
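To put a rough number on that claim (my own back-of-envelope figures, not from the article): with around 6.8 billion people in the world and, say, 16 bytes of packed numeric fields per person, the whole dataset comes to about 6.8 × 10⁹ × 16 bytes ≈ 100 GB, which really is within reach of a big but otherwise ordinary rack server.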

He also argues that most data-processing tools, including standard relational database management systems, are not really built for the kinds of huge datasets we are starting to encounter now. Although we can in principle keep billions of rows of data in RAM, we can’t easily manipulate them using something like PostgreSQL. And other tools, like the statistical programming language R (one of my favourites), run into memory limits, often around 4 GB on 32-bit builds.
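The usual way around such limits is to stream over the data and keep only running aggregates in memory. A minimal Node.js sketch of the idea (the file name and its one-value-per-line layout are hypothetical):

```javascript
// Constant-memory aggregation over a file too big for RAM:
// read the rows as a stream and keep only a count and a running sum.
const fs = require('fs');
const readline = require('readline');

const rl = readline.createInterface({
  input: fs.createReadStream('incomes.csv'), // hypothetical: one income value per line
  crlfDelay: Infinity,
});

let count = 0;
let sum = 0;

rl.on('line', (line) => {
  const income = parseFloat(line);
  if (!Number.isNaN(income)) {
    count += 1;
    sum += income;
  }
});

rl.on('close', () => {
  console.log(`rows: ${count}, mean income: ${(sum / count).toFixed(2)}`);
});
```

Memory use stays flat no matter how many rows the file holds; the price is that you only get the statistics you thought to accumulate in advance, which is exactly why the analysis, not the storage, becomes the bottleneck.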

A recommended read for those interested in the nerdier side of data.

On a related note, O’Reilly released a report, Big Data: Technologies and Techniques for Large-Scale Data, in January. I haven’t read it (it costs quite a lot of money to buy the PDF), but there is a sample PDF which makes for pretty interesting reading in itself.
