Is it unusually cold today?

The frequently miserable Swedish weather often makes me think “Is it just me, or is it unusually cold today?” Occasionally, it’s the reverse scenario – “Hmm, seems weirdly warm for April 1st – I wonder what the typical temperature this time of year is?” So I made myself a little Shiny app which is now hosted here. I realize it’s not so interesting for people who don’t live in Stockholm, but then again I have many readers who do … and it would be dead simple to create the same app for another Swedish location, and probably many other locations as well.

The app uses three different data sources, all from the Swedish Meteorological and Hydrological Institute (SMHI). The estimate of the current temperature is taken from the “latest hour” data for Stockholm-Bromma (query). For the historical temperature data, I use two different sources with different granularity. There is a data set that goes back to 1756 which contains daily averages, and another one that goes back to 1961 but which has temperatures at 06:00 (6 am), 12:00 (noon) and 18:00 (6 pm). The latter one makes it easier to compare to the current temperature, at least if you happen to be close to one of those times.
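
To make the comparison concrete, here is a minimal Python sketch of one way to answer "is it unusually cold?". It is not the app's actual code: the function names, the 10% tail cutoff and the sample temperatures are all made up for illustration. It ranks the current reading against historical readings for the same calendar date and time of day.

```python
from bisect import bisect_left, bisect_right


def percentile_of(current, history):
    """Percentage of historical readings below `current` (ties count half)."""
    ordered = sorted(history)
    below = bisect_left(ordered, current)
    ties = bisect_right(ordered, current) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def verdict(current, history, tail=10.0):
    """Label a reading using an arbitrary (assumed) tail cutoff in percent."""
    pct = percentile_of(current, history)
    if pct <= tail:
        return "unusually cold"
    if pct >= 100.0 - tail:
        return "unusually warm"
    return "fairly typical"


# Hypothetical noon temperatures (degrees C) for one calendar date,
# one value per year. Real values would come from the SMHI data sets.
noon_history = [4.1, 6.3, 7.8, 5.5, 9.0, 3.2, 6.9, 8.4, 5.0, 7.1]

print(verdict(2.5, noon_history))
print(verdict(6.5, noon_history))
```

With only a handful of years per date the ranking is coarse, so in practice one would probably pool a few neighbouring days as well.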


Stockholm data happenings

The weather may be terrible at the moment in Stockholm (it was really a downer to come back from the US this morning) but there are a couple of interesting data-related events coming up. This past week, I missed two interesting events: the KTH Symposium on Big Data (past Mon, May 26) and the AWS Summit (past Tue, May 27).

In June, there will be meetups on deep learning (Machine Learning Stockholm group, June 9 at Spotify) and on Shiny and ggvis presented by Hadley Wickham himself (Stockholm useR group, June 16 at Pensionsmyndigheten). There are wait lists for both.

Danny Bickson is giving a tutorial on GraphLab at Stockholm iSocial Summer school June 2-4. He has indicated that he would be happy to chat with anyone who is interested in connection with this.

King are looking for a “data guru” – a novel job title!

Finally, Wilhelm Landerholm, a seasoned data scientist who was way ahead of the hype curve, has started (or revived?) his blog on big data, which unfortunately is in Swedish only: We Want Your Data.



Digital Health Days 2013

The Digital Health Days, which will be held in Stockholm on August 21-22, looks like an event that will touch upon many of the things mused upon in this blog, for instance analytics, gamification and self-tracking in relation to medicine and the life sciences. A quick glance at the program reveals session names like “The New Health Enablers: Mobile Health Solutions, Big Data Analytics, Gamification and Games For Health”, “Digital Health Science”, “Smarter Care and Watson for healthcare”, “Computational Health and Big Data Analytics as tools for life science”, etc.

The conference is a bit pricey for the casual visitor (2990 SEK ~ 345 EUR ~ 450 USD) but has a good discount for students, who’ll only need to pay 490 SEK (~56 EUR / ~73 USD).

Stockholm Big Data Meetup

The first meetup of the Stockholm Big Data group was organized yesterday (Sep 6 2012) by Mikael Hussain at the Klarna headquarters. The venue was packed, with close to 100 people attending and others unfortunately left out (due to fire regulations). Apparently a lot of people (including us) had been thirsting for this sort of event.

The format was 1.5h of rapid talks (supposed to be 10 min each but probably a bit longer in practice) on widely different topics – we will refer to Marina Santini’s excellent writeup for details on the talks – followed by socializing in the pub around the corner. Follow the Data was represented by me (Mikael) as I gave a short talk about the benefits of competing in (and organizing) online prediction contests.

During the course of the event, I learned about three companies that I didn’t know about and who are all actively looking for analytics and big data talent:

  • Campanja – online advertising, heavily into Erlang and AI. Looking to fill several positions of different kinds
  • Svensk Lånemarknad (~Swedish Loan Exchange?) – help customers find the best banks and loans for them – looking to fill a predictive analytics position
  • Tink – not quite sure what they are doing (the home page is a bit cryptic) – looking for developers

I’m sure there were other companies as well looking to recruit – I only had time to talk to a small fraction of the participants, obviously!

All in all, I think the meetup was a lot of fun and I am looking forward to more meetups in Stockholm soon.

Meetup groups for Big Data & Predictive Modeling and Quantified Self in Stockholm

Two interesting new meetup groups have formed in Stockholm (well, there are other interesting ones but for the purposes of this blog these two are the most exciting):


Health Hack Day ’12: Day 1 impressions

So as mentioned in the previous post, Health Hack Day ’12 in Stockholm is underway right now; it started with a number of lectures and a party yesterday and the actual hacking will start today, with the winning apps to be presented tomorrow. You can follow the #hhd12 hashtag on Twitter or go to the link above to see the recorded lectures.

I thought the arrangements and speaker line-up yesterday were surprisingly good, which bodes well for the survival of the Health Hack Day concept; in fact, I’m sure they will be back next year. The lectures (which were recorded and can be viewed online at the link above) were given in a smallish space (part of a fin de siècle apartment complex now used as an office hotel for creative types, located near Stureplan in central Stockholm) decorated with thousands of yellow strips of paper hanging down from the ceiling – a nice-looking installation which also provided some relief from the heat in the room when the wind occasionally blew in through the window and turned the paper strips into a giant ceiling fan. Meanwhile, visitors could sip some excellent free coffee (from Stockholm roast).

Hoa Ly is a young, enterprising fellow who works for Psykologifabriken (“The Psychology Factory”) and his own sister company Hoa’s Tool Shop (both of these companies were involved in arranging the event), as well as doing clinical psychology research at Linköping University and being a successful DJ. He talked about behavior change through digital tools, exemplifying with the Viary mobile & web app which has been used successfully for depression treatment but, as I understand it, is quite general in nature so you could track any kind of behavior & goals (incidentally, the statistics interface looks a lot like the WordPress interface where I look at access statistics for this blog!) Hoa also talked about correlating data from different sources like Viary, the Zeo sleep tracker and exercise data from Integrating data from different sources is of course very interesting but I didn’t feel we quite got any really solid concrete examples here, just a general sense that it should be useful. Anyway. The most intriguing part of Hoa’s talk was when he described the launch of a new project to “disrupt the whole dance music industry” (or words to that effect). The idea is to treat DJ performances as scientific experiments and “gather data from the audience”, for instance by measuring adrenaline levels in response to song selections. Hoa and his partners have created a new country called Yamarill (link in Swedish) to construct a narrative around which this project will be built. The inauguration of the new country will apparently be celebrated on June 1 at the Hoa’s Tool Shop office spaces. The Yamarill “delegation” has already played several DJ gigs “combining electronic dance music, technology and psychology” as they say in the linked interview (I might also add “quirky clothes”).

Pernilla Rydmark from .SE talked about different forms of crowdfunding and presented five Swedish platforms for it. .SE is also introducing an interesting form of funding called “guaranteed funding” where they pick projects that are already popular on crowdfunding platforms and promise to fund them up to their stated goal in case they don’t succeed in reaching it through the crowdfunding platform. Thus, the goal of the funding is rather paradoxically that no one should get it (because .SE is hoping that the projects will get fully funded by the crowd).

Bill Day from RunKeeper talked about the need for an open, global health platform and presented HealthGraph, a free platform with tens of millions of users initiated by the RunKeeper team but which is expanding far beyond that community.

Mathias Karlsson from Calmark presented his company’s approach to rapid blood biomarker testing, which is making consumable platforms for colorimetric assays (the measurement of interest is transformed into a color) which can be analyzed on the spot using, for example, a smartphone camera. He brought a developer team who will attempt to build a new test (for bilirubin) into the platform in 24 hours during the hackathon part of the event.

Linus Bengtsson from FlowMinder described intriguing reality mining (or in less spectacular terms, call log analysis) work where data from mobile phone providers was used to track the movements of people during and after the Haiti earthquake, and the subsequent cholera outbreak. Linus and his team tracked 1.9 million SIM cards from Port-au-Prince residents to obtain their estimates of migration patterns. FlowMinder is a non-profit and provides free analysis of the same kind during any kind of global disaster (in collaboration with mobile telephony providers, naturally).

Sara Eriksson and Johan Nilsson from United Minds talked about the “new health”, including a lot of topics that have been frequently mentioned on this blog, like 23andMe, PatientsLikeMe, and even the MinION sequencer from Oxford Nanopore. I had heard / thought about most of it before but what I took away from it was the concept of “biosociality” as coined by Paul Rabinow, and also that only 37% of surveyed Stockholm smartphone users did *not* want to collect data on themselves through the phone; a whopping 59% wanted not only to collect the data but to analyze it themselves.

Megan Miller from Bonnier (a Swedish media company which has an enormous influence in the media here; however Megan was working for its US branch) described Teemo, a platform for “digital wellness”, with components of collaborative adventuring and social exercise (you try to accomplish “quests” together with your friends by exercising). Teemo looks like it has a pretty nifty design, inspired by paper cuts and Nordic (=Helsinki?) design style. As Megan put it, Teemo wants to “put fun first and track behavior in the background.”

We will see whether Follow the Data has the energy to visit again tomorrow and see what apps have come out of the hackathon, which should be starting a few hours from now!

Follow the Data podcast, episode 1: Gavagai! Gavagai!

We have made available the first episode of the Follow the Data podcast! Hope you enjoy it.

Podcast link: Follow The Data | Episode 1 – Gavagai! Gavagai!

This first episode, as has been mentioned before on this blog, is about a Stockholm startup company, Gavagai, which provides a technology platform called Ethersource. We interviewed the company’s CDO (chief data officer), Fredrik Olsson, and the chief scientist, Magnus Sahlgren, and we think it resulted in a very interesting chat, although the sound quality is perhaps not ideal due to our inexperience with podcasting.

Some interesting tidbits from the conversation:

  • The name “Gavagai” comes from a thought experiment by Quine demonstrating the “indeterminacy of translation”. It’s also the reason for the presence of the little rabbit on the Gavagai web page.
  • Olsson describes Ethersource as a “semantic processing layer of the big data stack” and a “base technology for semantics.” An alternative, more everyday description would be the one in this nice interview from Scandinavian Startups: “Finding meaning before it is evident.”
  • Ethersource learns meaning from text, which is the core of the technology; use cases include “sentiment analysis on steroids”, textual profiling and market analysis.
  • The Ethersource system is based on intrinsically scalable technology (which toward the end of the podcast turned out to be based on mimicking computation in the brain and “sparse distributed representation”) which can ingest any type of linguistic data stream; Gavagai have not been able to “saturate the system” in terms of storage despite ingesting everything they can get their hands on. The underlying technology is based on “random indexing”, which according to Sahlgren is basically a kind of random projection approach: a dimensionality reduction method which allows incremental processing (rather than, e.g., running huge SVDs).
  • As a result of the underlying design, Ethersource builds up representations of concepts as it incorporates new data; Gavagai formulates this in the phrase “training equals learning.” The concept-based approach means that the system is extremely good at handling spelling errors and synonyms.
  • Ethersource is not based on concepts such as “documents” or “tweets”, which are completely artificial, according to chief scientist Sahlgren.
  • The system’s design also means that it does not have any problems handling different languages, even languages that use different text encodings.
  • Gavagai did not start out as a “big data” company but they are now relatively comfortable in their role as one.
  • Fredrik Olsson used to work for Recorded Future, which he feels is not a competitor to Gavagai, but would be a perfect customer.
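
For readers who want a feel for the “random indexing” idea mentioned above, here is a toy Python sketch of the general textbook technique. It is emphatically not Ethersource’s implementation: the dimension, the number of non-zero entries, the window size, the class and function names and the example sentences are all assumptions chosen for illustration. Each word gets a fixed sparse random “index vector”, and a word’s “context vector” is simply the running sum of the index vectors of its neighbours, so new text can be folded in incrementally without recomputing anything.

```python
import hashlib
import math
import random

DIM = 512      # assumed dimensionality of the reduced space
NONZERO = 8    # assumed number of +1/-1 entries per index vector


def index_vector(word):
    """Sparse ternary vector for `word`, as {position: +1 or -1}."""
    # Seed from a stable hash so the same word always gets the same vector.
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    positions = rng.sample(range(DIM), NONZERO)
    return {pos: (1 if i % 2 == 0 else -1) for i, pos in enumerate(positions)}


class RandomIndexer:
    def __init__(self, window=2):
        self.window = window
        self.context = {}  # word -> dense context vector of length DIM

    def ingest(self, tokens):
        """Fold one token sequence into the context vectors, in place."""
        for i, word in enumerate(tokens):
            vec = self.context.setdefault(word, [0.0] * DIM)
            lo = max(0, i - self.window)
            hi = min(len(tokens), i + self.window + 1)
            for j in range(lo, hi):
                if j != i:
                    for pos, sign in index_vector(tokens[j]).items():
                        vec[pos] += sign

    def similarity(self, a, b):
        """Cosine similarity of two context vectors (0.0 if either is unseen)."""
        va, vb = self.context.get(a), self.context.get(b)
        if va is None or vb is None:
            return 0.0
        dot = sum(x * y for x, y in zip(va, vb))
        norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
        return dot / norm if norm else 0.0


indexer = RandomIndexer()
for sentence in [
    "the cat chased the mouse",
    "the dog chased the mouse",
    "the cat ate food",
    "the dog ate food",
    "stock prices fell sharply",
    "stock markets fell sharply",
]:
    indexer.ingest(sentence.split())

print(indexer.similarity("cat", "dog"), indexer.similarity("cat", "stock"))
```

The vectors never grow beyond `DIM` entries however much text is ingested, which is the property that makes the approach attractive compared with factorizing a huge co-occurrence matrix after the fact.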

Joel and I were perhaps not very comfortable in our new roles as podcasters and struggled a bit with finding the right words in English. We also recorded a post-show chat in Swedish where we are more relaxed and coherent. Some tidbits from this part, which we also plan to put online at some point:

  • The Gavagai founders have a radical view of linguistics, where there is no hard line between syntax and semantics, but rather a kind of continuum.
  • They don’t believe in sampling, but try to ingest everything they can find into the system.
  • The Gavagai team tries to put aside some time every day to look at interesting concepts and connections between concepts discovered by the system.
  • They expected that a word like apple (Apple) would have a large number of different meanings, but when they looked at data from social media during a specific period in time, it had just three major meanings.
  • Language does its own disambiguation; for example, after Apple has become well-known as a software company, people have started to talk more about “apples” rather than “an apple” when they mean the fruit (if I interpreted Magnus correctly).
  • They view the stock market as a way to validate their semantic analysis. “Stock prices are the closest you can get to an objective validation.”
  • The founders came from a research background, and found that starting Gavagai gave a huge boost to their research activities due to the new pressure to build and release something that works in the “real world”.

In the evening of the day of the interview (March 9, 2012), Swedish daily Svenska Dagbladet released an article about Gavagai’s Ethersource-based real-time sentiment tracking of the buzz around the contestants who would appear in the Swedish Eurovision finals the following day. In the end, the Ethersource forecasts turned out to be very accurate.

Although it’s far from clear what the next episodes of the podcast will be about, in general we will restrict ourselves to interviewing interesting companies or scientists (rather than just talking amongst ourselves), with a bias towards Swedish interviewees since this is where we are located and it might be interesting for people from other locations to hear what is going on here.

EDIT 17/3 2012: Our podcast jingle was created by Karl Ekdahl, the man behind the awesome Ekdahl Moisturizer, among many other things.

