Hacking open government data
I spent last weekend with my talented colleagues Robin Andéer and Johan Dahlberg at the Hack for Sweden hackathon in Stockholm, where the idea is to find the cleverest ways to make use of open data from government agencies. Several government entities actively supported and participated in this well-organized, though perhaps slightly unfortunately named, event (I got a few chuckles from acquaintances when I mentioned my participation).
Our idea was to use data from Kolada, a database containing more than 2,000 KPIs (key performance indicators) for different aspects of life in the 290 Swedish municipalities (think “towns” or “cities”, although the correspondence is not exactly 1-to-1), to get a “bird’s-eye view” of how similar or different the municipalities/towns are in general. Kolada has an API that allows piecemeal retrieval of these KPIs, so we started by essentially scraping the database (a bulk download option would have been nice!) to get a table of 2,303 KPIs by 290 municipalities, which we then wanted to be able to visualize and explore interactively.
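To illustrate the step from piecemeal retrieval to one big table, here is a minimal sketch in plain Python. It is not our actual scraping code and it does not call the Kolada API: the record layout (KPI id, municipality id, value) and the ids used below are assumptions made up for the example.

```python
def build_table(records):
    """Pivot long (kpi_id, municipality_id, value) records into
    {municipality_id: {kpi_id: value}}. Missing values (None) are skipped.
    The record layout is an assumption for this sketch."""
    table = {}
    for kpi_id, municipality_id, value in records:
        if value is None:
            continue
        table.setdefault(municipality_id, {})[kpi_id] = value
    return table


def to_matrix(table, fill=None):
    """Turn the nested dict into sorted row labels, sorted column labels,
    and a list-of-lists matrix; gaps are filled with `fill`."""
    municipalities = sorted(table)
    kpis = sorted({kpi for row in table.values() for kpi in row})
    matrix = [[table[m].get(k, fill) for k in kpis] for m in municipalities]
    return municipalities, kpis, matrix


# Hypothetical records, as they might arrive one KPI at a time.
records = [
    ("K1", "0001", 1.5),
    ("K2", "0001", 2.0),
    ("K1", "0002", 3.0),
    ("K2", "0002", None),  # value not reported for this municipality
]
rows, columns, matrix = to_matrix(build_table(records))
```

With real data the matrix would have 290 rows and 2,303 columns, and the gaps would need to be imputed or dropped before running something like PCA on it.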
One of the points behind this app is that it is quite hard to wrap your head around the large number of performance indicators, which might be a considerable mental barrier for someone trying to do statistical analysis on Swedish municipalities. We hoped to create a “jumping-off point” where you can quickly get a sense of what is distinctive about each municipality and which variables might be of interest, after which you can dig deeper in a particular direction of analysis.
We ended up using the Bokeh library for Python to make a visualization where the user can select municipalities and drill down a little bit to the underlying data, and Robin and Johan cobbled together a web interface (available at http://www.kommunvis.org). We plotted the municipalities using principal component analysis (PCA) projections after having tried and discarded alternatives like MDS and t-SNE. When the user selects a town in the PCA plot, the web interface displays its most distinctive (i.e. least typical) characteristics. It’s also possible to select two towns and get a list of the KPIs that differ the most between the two towns (based on ranks across all towns). Note that all of the KPIs are named and described in Swedish, which may make the whole thing rather pointless for non-Swedish users.
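The rank-based comparison described above can be sketched roughly as follows. This is an illustration in plain Python rather than the code behind the site: the function names and toy data are made up, and scoring "distinctive" as distance from the middle rank is an assumption about one reasonable way to do it.

```python
def ranks(by_town):
    """Map each town to its rank for one KPI (1 = lowest value);
    tied values share the average of the ranks they span."""
    ordered = sorted(by_town, key=by_town.get)
    result = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and by_town[ordered[j + 1]] == by_town[ordered[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for town in ordered[i:j + 1]:
            result[town] = shared
        i = j + 1
    return result


def most_different_kpis(table, town_a, town_b, top=3):
    """KPIs with the largest absolute rank gap between two towns."""
    gaps = []
    for kpi, by_town in table.items():
        if town_a in by_town and town_b in by_town:
            r = ranks(by_town)
            gaps.append((abs(r[town_a] - r[town_b]), kpi))
    gaps.sort(key=lambda g: (-g[0], g[1]))
    return [kpi for _, kpi in gaps[:top]]


def most_distinctive_kpis(table, town, top=3):
    """KPIs where the town's rank is furthest from the middle rank
    (an assumed definition of 'least typical')."""
    scores = []
    for kpi, by_town in table.items():
        if town in by_town:
            r = ranks(by_town)
            middle = (len(by_town) + 1) / 2
            scores.append((abs(r[town] - middle), kpi))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [kpi for _, kpi in scores[:top]]


# Toy data: {kpi: {town: value}}; names and numbers are invented.
TOY = {
    "kpi_a": {"Town1": 10.0, "Town2": 20.0, "Town3": 30.0, "Town4": 40.0},
    "kpi_b": {"Town1": 5.0, "Town2": 4.0, "Town3": 6.0, "Town4": 7.0},
    "kpi_c": {"Town1": 1.0, "Town2": 1.0, "Town3": 2.0, "Town4": 3.0},
}
print(most_different_kpis(TOY, "Town1", "Town4"))  # ['kpi_a', 'kpi_c', 'kpi_b']
print(most_distinctive_kpis(TOY, "Town2"))         # ['kpi_b', 'kpi_c', 'kpi_a']
```

Working on ranks rather than raw values means KPIs measured in very different units (percentages, kronor per inhabitant, counts) can be compared on the same footing.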
Perhaps unsurprisingly, there were lots of cool projects on display at Hack for Sweden. The overall winners were the Ge0Hack3rs team, who built a striking 3D visualization of different parameters for Stockholm (e.g. the density of companies, restaurants etc.) as an aid for urban planners and visitors. A straightforward but useful service that I liked was Cykelranking, built by the Sweco Position team: an index of how well each municipality provides opportunities for bicycling, including detailed info on bicycle paths and accident-prone locations.
This was the third time that the yearly Hack for Sweden event was held, and I think the organization was top-notch. It took place in large, spacious venues with a seemingly infinite supply of coffee, food, and snacks, as well as helpful government agency data specialists in green T-shirts whom you could consult with questions. We definitely hope to be back next year with fresh new ideas.
This was more or less a 24-hour hackathon (Saturday morning to Sunday morning), although our team certainly used less time than that (we all went home to sleep on Saturday evening). Still, a lot of the apps built were quite impressive, so I asked some other teams how much they had prepared in advance. All of them claimed not to have prepared anything, but I suspect most teams did what ours did (and for which I am grateful): prepare a little dummy/bare-bones application just to make sure they wouldn’t get stuck in configuration, registering accounts etc. on the competition day. I think it’s a good thing in general to require (as this hackathon did) that the competitors state clearly in advance what they intend to do, and to prod them a little bit to prepare in advance so that they can really focus on building functionality on the day(s) of the hackathon instead of fumbling around with installation.