Computational journalism? At first blush, the term sounds somewhat absurd, like the imaginary field of computational theology that we used to joke about in my grad school days. But it’s actually not absurd at all. Computing know-how is essential for today’s press in at least two ways:
- Because people rightly expect more from online media than just a digital version of a newspaper (commenting capabilities, dynamic suggestions for related news, and so on), and
- Because computing and data-mining savvy can come in handy for investigative journalists who want to, for example, track the allocation of government funds.
“Deep Throat meets data mining” is an interesting article about the second point. Computational journalism could also involve things like aggregating similar news pieces (Google News), collaborative filtering (digg.com, plastic.com, etc.), and mashups of maps and content (like dynamically updated maps of swine flu cases).
The domain name www.computational-journalism.com points to a site about what seems to have been the computational journalism field’s inaugural symposium, held at Georgia Tech in the US in February 2008. The symposium report highlights themes like the following:
- While readership numbers for traditional print media are rapidly falling, the potential audience for online and computer-mediated news is soaring, which poses new opportunities for computation-savvy journalists.
- Visualization tools can help journalists tell their stories more clearly and powerfully.
- Sophisticated algorithms can perform as arbiters of what’s news not only on a macro level, but in the delivery of online news pages tailored to an individual’s interests.
- Innovations in computation and the Internet are re-defining news itself, transforming it from a top-down, elitist model to more of a grassroots, user-driven model.
- Social networks, blogs and user-moderated Web sites are significant sources of article ideas as well as providing a means for receiving and posting news.
- Computational media is not simply an electronic and digital rendition of print. It draws a demographically different audience than print and requires collaboration among journalists and computational specialists to maximize technology’s advantages and the readers’ expectations.
These points all make sense, although I am a little wary of tailored information delivery being overused in the future, so that everybody ends up receiving only the news they are expected to like. One of the advantages of the old-school newspaper is that you come across unexpected things that you wouldn’t have thought of looking for. Maybe I’m just old-fashioned.
The participants at the symposium also pinpointed four things that computational journalists need to do:
- Continue expanding the interactivity of news,
- Explore methods of verifying the accuracy and currency of online information,
- Leverage the growth of social networking to support better news information and
- Support open source software development.
Georgia Tech already offers classes on computational journalism, and has apparently done so since 2007. Interestingly, the course material covers topics such as networked sensors, mobile computing and data gathering, data mining for personalization, and citizen journalism. This seems like an interesting field, and I will want to keep an eye on it.
Some bleeding-edge (well, relatively speaking!) news media have already seen this coming. The Guardian, for example, has released an API for accessing some of its data and content from your own programs. To use the Guardian Content API, though, you have to get an API key by telling The Guardian what kind of application you intend to build with its content. Client libraries are available for Ruby, Python, PHP and Java, but the API uses REST, so basically any language that can make HTTP requests can be used to develop applications for it.
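To give a feel for what "any language that can speak HTTP" means in practice, here is a minimal Python sketch of a keyed REST search request using only the standard library. The endpoint URL, the `q` and `api-key` parameter names, and the `response`/`results`/`webTitle` layout of the JSON reply are my assumptions about the Guardian's API and should be checked against its documentation; the function names are made up for this post.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against the Guardian's documentation before use.
BASE_URL = "https://content.guardianapis.com/search"


def build_search_url(query, api_key, base_url=BASE_URL):
    """Return a search URL carrying the query and API key as URL parameters."""
    params = {"q": query, "api-key": api_key}
    return base_url + "?" + urlencode(params)


def extract_titles(payload):
    """Pull headline strings out of an already-decoded JSON reply.

    The response -> results -> webTitle layout is an assumption.
    """
    results = payload.get("response", {}).get("results", [])
    return [item["webTitle"] for item in results if "webTitle" in item]


def search_headlines(query, api_key):
    """Do the actual HTTP GET (needs network access and a real key)."""
    with urlopen(build_search_url(query, api_key)) as resp:
        return extract_titles(json.load(resp))
```

With a real key, something like `search_headlines("swine flu", "YOUR-KEY")` should return a list of headlines. Keeping the URL building and the JSON parsing apart from the network call means both can be tried out without a key.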
The New York Times has also released several APIs. The Article Search API was released in February. This API allows you to search for any article printed in the New York Times from 1981 onwards. Again, you need to sign up for an API key.
Another resource which will boost computational journalism is data.gov, a data repository recently introduced by the Obama administration. Its aim is to provide machine-readable access to data sets generated and held by the US Federal Government. The web page states that: “Data.gov strives to make government more transparent and is committed to creating an unprecedented level of openness in Government.”
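As a rough illustration of why machine-readable data matters for the "track the allocation of government funds" kind of story, here is a small Python sketch that totals amounts per agency from CSV text. The column names (`agency`, `program`, `amount`) and the sample rows are invented for this example; a real data set will have its own layout.

```python
import csv
import io
from collections import defaultdict


def total_by_agency(csv_text):
    """Sum the 'amount' column for each distinct value of the 'agency' column."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["agency"]] += float(row["amount"])
    return dict(totals)


# Made-up rows for illustration only; not real government data.
SAMPLE = """agency,program,amount
Agency A,Roads,100.0
Agency B,Schools,250.5
Agency A,Bridges,50.0
"""

if __name__ == "__main__":
    for agency, total in sorted(total_by_agency(SAMPLE).items()):
        print(agency, total)
```

The same few lines would be much harder to write against a scanned PDF report, which is exactly the point of publishing the data in a structured form.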