Follow the Data

A data-driven blog

Archive for the tag “computational-journalism”

Precision journalism

This (PDF link) is an interesting slide show from a talk about data-driven journalism that Peter Aldhous gave at an R users’ meeting last week. There are some good examples of data visualization there (I especially liked the Obama-Clinton decision tree), but the most interesting thing for me was to learn about Philip Meyer, who “pioneered use of quantitative methods in journalism with Knight Newspapers in 1960s”, according to the presentation. In 1973, Meyer published a book called Precision Journalism (Amazon link), which I think is a really neat title.


Personal genome glitch uncovered

As recounted in this New Scientist article and commented upon in Bio-IT World, journalist Peter Aldhous managed to uncover a bug in the deCODEme browser (Decode Genetics’ online tool for viewing parts of your own genome). deCODEme is one of a handful of services, including 23andme and Navigenics, that genotype small genetic variations called SNPs (snips; single-nucleotide polymorphisms) in DNA samples submitted by customers. The results are then used to calculate disease risks and other things, which are displayed to the customer in a personalized view of his or her genome.

Aldhous was comparing the output he got from two of these services – deCODEme and 23andme – and discovered that they were sometimes very different. After patiently getting to the bottom of the matter, he found that the reason for the discrepancy was that the deCODEme browser sometimes (but not always) displayed jumbled output for mitochondrial sequences. According to Bio-IT World, the bug seems to have been due to an inconsistency between 32-bit and 64-bit computing environments and has now been fixed.
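Neither article gives the technical details, so this is purely an illustration (not the actual deCODEme code) of the general failure mode: a value that fits comfortably in 64 bits gets silently scrambled when it passes through code that assumes 32-bit integers.

```python
def to_int32(value):
    """Reinterpret an arbitrary integer as a signed 32-bit one,
    mimicking what happens when 64-bit data flows through code
    compiled with 32-bit integer assumptions."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    return value - 0x100000000 if value >= 0x80000000 else value

# A value that needs more than 31 bits...
big = 3_000_000_000
# ...wraps around to a nonsense negative number in a 32-bit world.
print(big, "->", to_int32(big))  # 3000000000 -> -1294967296
```

Small values survive the round trip unchanged, which would explain why such a bug shows up only sometimes and only for some data.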

Isn’t this a nice example of computational journalism, where a journalist is skilled or persistent enough to actually analyze the data that is being served up and detect inconsistencies?

I might as well sneak in another New Scientist article about personal genomes. This one urges you to make your genome public in the name of the public good. It mentions the Harvard Personal Genome Project, which aims to enroll 100,000 (!!) participants whose genomes will be sequenced. The first ten participants, some of whom are pretty famous, have agreed to share their DNA sequence freely.

I have no idea whether the Personal Genome Project is related to the Coriell Personalized Medicine Collaborative, which also wants to enroll 100,000 participants in a longitudinal study whose goal is to find out how much utility personal genome information has in health management and clinical decision-making.

Quick links

Ran across a couple of interesting links:

Space-Time Travel Data is Analytic Super-Food!, a very meaty blog post where Jeff Jonas starts by discussing largely the same themes that I blogged about a while back, but he has thought more – a lot more! – deeply about it and delivers a number of interesting insights and predictions. The comments section contains some good stuff too.

Data is Journalism – this post discusses the acquisition by MSNBC.com of the local data aggregator service Everyblock. The question of whether data “is” journalism reminds me of the world of science – is a big and hard-to-obtain data set worthy of being published in a prestigious journal, even if the accompanying paper lacks a clear advancement in scientific knowledge? These questions may not be correctly formulated, and when it comes to journalism, I’m certain that data analysis and presentation will play an important role in its future, along with the more traditional components.

Computational journalism, an interesting emerging field

Computational journalism? At first blush, the term sounds somewhat absurd, like the imaginary field of computational theology that we used to joke about in my grad school days. But it’s actually not absurd at all. Computing know-how is essential for today’s press in at least two ways:

  1. Because people rightly expect more from online media than just a digital version of a newspaper – things such as commenting capabilities, dynamic suggestions for related news, and so on – and
  2. Because computing and data-mining savvy can come in handy for investigative journalists who want to, for example, track the allocation of government funds.

Deep Throat meets data mining is an interesting article about the second point. Computational journalism could also involve things like aggregating similar news pieces (Google News), collaborative filtering (digg.com, plastic.com etc.), mashups of maps and contents (like dynamically updated maps of swine flu cases), etc.
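As a toy illustration of the “related news” idea (the similarity measure and headlines below are made up, not taken from any of the services mentioned), even simple word overlap – the Jaccard index – is enough to rank stories by relatedness:

```python
def jaccard(a, b):
    """Word overlap between two headlines, from 0.0 to 1.0."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b)

headlines = [
    "Swine flu cases rise in Mexico",
    "Mexico reports new swine flu cases",
    "Stock markets rally on bank earnings",
]

# Rank candidate stories by similarity to a query headline.
query = "Swine flu cases in Asia"
ranked = sorted(headlines, key=lambda h: jaccard(query, h), reverse=True)
print(ranked[0])  # Swine flu cases rise in Mexico
```

Real aggregators like Google News use far more sophisticated models, but the basic shape – score every candidate story against a reference, then sort – is the same.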

The domain name www.computational-journalism.com points to a site about what seems to have been the computational journalism field’s inaugural symposium, held at Georgia Tech in the US in February 2008. The symposium report highlights themes like the following:

  1. While readership numbers for traditional print media are rapidly falling, the potential audience for online and computer-mediated news is soaring, which poses new opportunities for computation-savvy journalists.
  2. Visualization tools can help journalists tell their stories more clearly and powerfully.
  3. Sophisticated algorithms can act as arbiters of what’s news, not only on a macro level but also in the delivery of online news pages tailored to an individual’s interests.
  4. Innovations in computation and the Internet are re-defining news itself, transforming it from a top-down, elitist model to more of a grassroots, user-driven model.
  5. Social networks, blogs and user-moderated Web sites are significant sources of article ideas as well as a means for receiving and posting news.
  6. Computational media is not simply an electronic and digital rendition of print. It draws a demographically different audience than print and requires collaboration between journalists and computational specialists to take advantage of the technology and meet readers’ expectations.

These points all make sense, although I am a little wary of potential overuse of tailored information delivery in the future, leading to everybody just receiving the news they are expected to like … One of the advantages of the old-school newspaper is that you come across unexpected things that you wouldn’t have thought of looking for. Maybe I’m just old-fashioned. The participants at the symposium also pinpointed four things that computational journalists need to do:

  1. Continue expanding the interactivity of news,
  2. Explore methods of verifying the accuracy and currency of online information,
  3. Leverage the growth of social networking to support better news information and
  4. Support open source software development.

Georgia Tech already offers classes on computational journalism, and has done so since 2007, it seems. Interestingly, the course contains material about topics such as networked sensors, mobile computing and data gathering, data mining for personalization, and citizen journalism. This seems like an interesting field that I will keep an eye on.

Some bleeding-edge (well, relatively speaking!) news media have already seen this coming. The Guardian, for example, has released an API for accessing some of their data and content through your own programs. In order to use the Guardian Content API, though, you have to get an API key by telling The Guardian what kind of application you intend to build using their content. Client libraries are available for Ruby, Python, PHP and Java, but since the API uses REST, basically any language can be used to develop applications for it, as long as it can make HTTP requests.
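Because it is plain REST, calling the Content API amounts to building a URL and fetching JSON. The endpoint and parameter names below follow the Guardian’s Open Platform documentation as I understand it, but check the current docs before relying on them:

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the Guardian Open Platform docs.
BASE_URL = "http://content.guardianapis.com/search"

def build_search_url(query, api_key, page_size=10):
    """Construct a Content API search URL; the response is JSON."""
    params = {
        "q": query,            # free-text search query
        "format": "json",      # ask for a JSON response
        "page-size": page_size,
        "api-key": api_key,    # the key you registered for
    }
    return BASE_URL + "?" + urlencode(params)

url = build_search_url("data journalism", "YOUR-API-KEY")
print(url)
# Fetching it needs a valid key and network access, e.g.:
# import json, urllib.request
# data = json.load(urllib.request.urlopen(url))
```

The same pattern – base URL, query parameters, API key, JSON back – applies to most of the REST APIs mentioned in this post.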

The New York Times has also released several APIs. The Article Search API was released in February. This API allows you to search articles printed in the New York Times from 1981 onwards. Again, you need to sign up for an API key.
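Once the JSON response comes back, extracting what you need is a few lines of code. The response shape below is a simplified assumption for illustration – the real Article Search API returns much richer metadata (see the NYT developer documentation):

```python
import json

# A made-up, simplified response body standing in for what the
# Article Search API might return.
sample_response = json.dumps({
    "results": [
        {"title": "Example headline", "date": "20090215"},
        {"title": "Another headline", "date": "20090301"},
    ]
})

def extract_headlines(body):
    """Pull the headlines out of a JSON search response."""
    return [article["title"] for article in json.loads(body)["results"]]

print(extract_headlines(sample_response))
# ['Example headline', 'Another headline']
```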

Another resource which will boost computational journalism is data.gov, a data repository recently introduced by the Obama administration. Its aim is to provide machine-readable access to data sets generated and held by the US Federal Government. The web page states that: “Data.gov strives to make government more transparent and is committed to creating an unprecedented level of openness in Government.”
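“Machine-readable” often simply means CSV, which Python can digest with its standard library alone. The dataset below is invented for illustration – real data.gov files come with their own column names and documentation:

```python
import csv
import io

# A made-up sample in the comma-separated format many data.gov
# datasets use; in practice you would download a real file.
sample = """agency,year,amount_usd
Department of Education,2008,59200000000
Department of Energy,2008,21500000000
"""

# DictReader maps each row to a dict keyed by the header line.
rows = list(csv.DictReader(io.StringIO(sample)))
total = sum(int(row["amount_usd"]) for row in rows)
print(f"{len(rows)} rows, total ${total:,}")  # 2 rows, total $80,700,000,000
```

From there it is a short step to the kind of funds-tracking analysis mentioned above: group by agency, compare across years, flag outliers.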
