Follow the Data

A data driven blog

Archive for the tag “statistics”

09 Aug 2009

Cool statisticians

I think this New York Times article is the third one I’ve read in the past few months claiming that statisticians will be the big winners on the job market over the next ten years. I wonder whether this trend will largely be confined to U.S. giants like Google and IBM or whether it will spread to other countries; I tend to think it will. As economist Erik Brynjolfsson of MIT’s Center for Digital Business says in the article:

“We’re rapidly entering a world where everything can be monitored and measured,” said Erik Brynjolfsson, an economist and director of the Massachusetts Institute of Technology’s Center for Digital Business. “But the big problem is going to be the ability of humans to use, analyze and make sense of the data.”

… and these problems are going to be felt almost everywhere. By the way, Brynjolfsson’s quote is a pretty good description of the themes I want to explore in this blog.

Posted by Mikael Huss in Articles and tagged statistics

08 Jul 2009

Reverse engineering social security numbers

The latest issue of PNAS (Proceedings of the National Academy of Sciences of the United States of America; a well-known scientific journal) contains two interesting pieces of statistical analysis. Luckily, they are both freely downloadable even if you don’t have access to a subscription.

Predicting Social Security numbers from public data claims that U.S. Social Security numbers (SSNs), which are supposed to be confidential, are actually predictable to a certain extent, at least for younger people, given information such as birth date and location. Basically, the authors (from Carnegie Mellon University) have tried to reverse-engineer the SSN assignment process using available information about it, including the so-called SSA Death Master File, which is publicly available and contains data about SSN assignments for people who have been reported as dead.

The authors detected various correlations between, for example, date of birth and all nine digits of the SSN, and eventually (after much visual inspection and several rounds of model refinement) constructed a regression model for predicting digits in an SSN based on birth date. They managed to correctly predict the SSN of 8.5% of deceased individuals in fewer than 1,000 tries.
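To make the idea of "predicting a number from a birth date within N tries" concrete, here is a minimal sketch in Python. It is not the authors' model: the data are purely synthetic (toy numbers handed out roughly in birth order plus noise, not real SSNs), and the helper names `fit_line`, `hit_rate` and `make_synthetic_records` are made up for illustration. It fits a straight line by ordinary least squares and then counts how often the true number falls inside a window of 1,000 candidates around the prediction.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def hit_rate(records, intercept, slope, tries):
    """Fraction of records whose true number lies inside a window of
    `tries` consecutive candidates centred on the predicted number."""
    half = tries // 2
    hits = 0
    for day, number in records:
        guess = round(intercept + slope * day)
        if guess - half <= number < guess - half + tries:
            hits += 1
    return hits / len(records)


def make_synthetic_records(n, seed=0):
    """Toy (birth_day, number) pairs. Assumption: numbers grow with
    birth day plus uniform noise. Synthetic data, not real SSNs."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        day = rng.randrange(0, 3650)  # day index within a ten-year span
        number = 1000 * day + rng.randint(-20000, 20000)
        records.append((day, number))
    return records


records = make_synthetic_records(2000)
train, test = records[:1000], records[1000:]

# Fit on one half of the data, evaluate on the other half.
a, b = fit_line([d for d, _ in train], [s for _, s in train])
rate = hit_rate(test, a, b, tries=1000)
print(f"slope={b:.1f}, hit rate within 1,000 tries={rate:.1%}")
```

The point of the sketch is only that a modest correlation between birth date and an assigned number shrinks the search space enormously compared with guessing among all possible nine-digit numbers; the noisier the assignment process, the lower the hit rate.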

Naturally, this opens up possibilities for identity theft, among other things, and raises the question of whether Social Security numbers should be replaced by something else.

Another study in the latest PNAS, NIH funding trajectories and their correlations with US health dynamics from 1950 to 2004, suggests that funding of research relating to certain diseases leads to a time-lagged decrease in deaths due to those diseases. In other words, the research appears to pay off, but only after a delay. To do their analysis, the authors compiled data on NIH (the US National Institutes of Health) funding starting in 1937 and compared them with mortality data for cardiovascular disease, stroke, cancer, and diabetes.
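As a rough illustration of what a "time-lagged" relationship between two yearly series looks like, here is a small Python sketch. It is an assumption-laden toy and not the method used in the paper: the funding and mortality series are synthetic, the five-year lag is built in by hand, and the function names are hypothetical. It correlates funding in year t with deaths in year t + lag and picks the lag with the most negative correlation.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def lagged_correlation(funding, deaths, lag):
    """Correlate funding in year t with deaths in year t + lag."""
    if lag == 0:
        return pearson(funding, deaths)
    return pearson(funding[:-lag], deaths[lag:])


def best_lag(funding, deaths, max_lag):
    """Lag (in years) at which the correlation is most negative."""
    return min(range(max_lag + 1),
               key=lambda lag: lagged_correlation(funding, deaths, lag))


# Synthetic series. Assumption: deaths fall in proportion to the
# funding level five years earlier, plus a little noise.
rng = random.Random(1)
years, true_lag = 60, 5
funding = [rng.uniform(0, 10) for _ in range(years)]
deaths = []
for t in range(years):
    effect = 0.5 * funding[t - true_lag] if t >= true_lag else 0.0
    deaths.append(100.0 - effect + rng.gauss(0, 0.3))

print("most negative correlation at lag:", best_lag(funding, deaths, 10))
```

Real funding and mortality series both trend strongly over time, so a raw lagged correlation like this one would be easy to over-interpret; the toy data avoid that problem only because the funding values are independent from year to year.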

Posted by Mikael Huss in Research and tagged data-mining, statistics

About

By Mikael Huss (@mikaelhuss) and Joel Westerberg (@tuxtux).
