Follow the Data

A data driven blog

Archive for the category “ftd-podcast”

Follow the Data podcast, episode 5: Journalism and data

After a looong absence, here is the latest episode of the FTD podcast. This time, we talk to Peter Grensund and Jens Finnäs, who write data-driven news stories at the Stockholm arm of Journalism++ (J++). Among many other things, Jens runs a great resource site about data journalism (sorry, Swedish only), and Peter has won a prestigious award for a “mortgage map”, a visualization of mortgage rates in different parts of Sweden.

Some topics discussed:

  • The term “data journalism” (also known as precision journalism, database journalism, etc.). Peter and Jens emphatically stated that they are first and foremost journalists, perhaps just a bit more data literate than average. After all, all journalism is about collecting information, weighing it and telling an interesting story.
  • The career prospects of a data-driven journalist.
  • Which Swedish media companies are the most forward-looking in this area.
  • Excel as the “Swiss army knife of data journalism” vs. R
  • How the barrier to entry is actually quite low. It is not difficult for a journalist to learn how to use data resources.

Enjoy!

Listen to the podcast | Follow The Data #5: Journalism and data

Follow the Data podcast, episode 4: Self-tracking with Niklas Laninge

In this episode of our podcast, we shift our focus from the “big data” themes in episodes 1-3 to personal data and self-tracking. We talked to Niklas Laninge, founder of Psykologifabriken (“The Psychology Factory”) and COO of Hoa’s Tool Shop. Both are relatively new Stockholm-based startups that use applied psychology in innovative ways to facilitate lasting behavior change – in the case of the latter company, using digital tools such as smartphone apps. Niklas is also an avid collector of data on himself and describes some things he has found out by analyzing those data – and remarks that “When my [Nike] Fuelband broke, part of myself broke as well.”

At one point, I (Mikael) miserably failed to get the details right about The Human Face of Big Data project, which I erroneously called “Faces of Big Data” in the podcast. Also, I said that it was created by Greenplum, when in fact it was developed by Against All Odds Productions (Rick Smolan and Jennifer Erwitt) and sponsored by EMC (of which Greenplum is a division).

Some of the things we discussed:

– Viary, a tool that facilitates behavior change in organizations or individuals

– Clinical trials showing promising results from using Viary to treat depression

– “Dance-offs” as a fun way to interact with people on the dance floor and get an extreme exercise session

Listen to the podcast | Follow The Data #4 : Self Tracking with Niklas Laninge

Follow the Data podcast, episode 3: Grokking Big Data with Paco Nathan

In this third episode of the Follow the Data podcast, we talk to Paco Nathan, Data Scientist at Concurrent Inc.

Podcast link: http://s3.amazonaws.com/follow_the_data/FollowTheData_03_Podcast.mp3

Paco’s blog: http://ceteri.blogspot.se/

The running time is about one hour.

Paco’s internet connection died just as we were about to start the podcast, so he had to connect via Skype on his iPhone. We apologize on behalf of his internet provider in Silicon Valley for the reduced sound quality this caused.

Here are a few links to things we discussed:

http://www.cascading.org/
An application framework for Java developers to quickly and easily develop robust Data Analytics and Data Management applications on Apache Hadoop.

http://clojure.org/
A dialect of Lisp that runs on the JVM.

https://github.com/twitter/scalding
A Scala library that makes it easy to write MapReduce jobs in Hadoop.

http://www.cascading.org/multitool/
A simple command line interface for building large-scale data processing jobs based on Cascading.

http://en.wikipedia.org/wiki/CAP_theorem
The CAP theorem states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: consistency, availability and partition tolerance.

http://www.nature.com/news/nanopore-genome-sequencer-makes-its-debut-1.10051
An article on the USB-sized Oxford Nanopore MinION sequencer.

http://datakind.org/
Previously known as Data Without Borders, this organisation aims to do good with big data.

http://www.climate.com/
Prediction-based insurance for farmers.

http://en.wikipedia.org/wiki/All_Watched_Over_by_Machines_of_Loving_Grace_(TV_series)
An interesting take on how programming culture has affected life. Episode 2 (http://vimeo.com/29875053), “The use and abuse of vegetational concepts”, is about how the idea of ecosystems sprang from the notion of harmony in nature, how this idea influenced cybernetics, and the perils of taking this animistic concept too far.

http://scratch.mit.edu/
A great way to teach kids to code.

http://www.stencyl.com/
Another interesting tool for teaching kids to code and build games.

http://www.minecraft.net/
A free-form virtual world game.

http://www.yelloworb.com/orbblog/
Some info on an Arduino-based wireless wind measurement project by Karl-Petter Åkesson (in Swedish).

http://www.fringeware.com/
A pioneering internet retailer that Paco co-founded.

Follow the Data podcast, episode 2: King of BigData

In the second episode of the FTD podcast, we talked to big data consultant Johan Pettersson (his company is actually called Big Data AB; what a catch to be able to obtain that name despite the Swedish trademark regulations!) and Thomas Hartwig, CTO of King.com, a company that produces “skill games” where you can win money by being more skillful than your competitors.

We knew practically nothing about King.com coming into the interview and were surprised to learn that they are the second biggest game producer on Facebook! Some other things of note from the interview:

  • King.com currently captures about 1.5 billion game events per day from about 12 million daily users;
  • They don’t have a dedicated data analysis group but rather an “embedded analyst” in each development team (each game has its own team);
  • Johan Pettersson does not think the demand for big data specialists or data scientists in Sweden is that high at the moment (although everyone is talking about “big data”, almost no one is really working with it), but it will probably be in 1-2 years.
  • However, good data analysts are in high demand and therefore hard to find.
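To give a feel for the scale of the King.com figures above, here is a rough back-of-the-envelope calculation. It is our own arithmetic, not something stated in the interview, and it assumes the 1.5 billion events and 12 million users refer to the same day and that events are spread evenly over 24 hours:

$$
\frac{1.5\times10^{9}\ \text{events/day}}{1.2\times10^{7}\ \text{users/day}} = 125\ \text{events per user per day},
\qquad
\frac{1.5\times10^{9}\ \text{events}}{86\,400\ \text{s}} \approx 1.7\times10^{4}\ \text{events/s}.
$$

Real traffic is of course peakier than an even spread, so the per-second figure is a lower bound on peak load rather than a measured number.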

Podcast link: Follow the Data | Episode 2 – King of BigData Podcast

The “AfterDark” discussion afterwards (in Swedish)
Follow the Data | Episode 2 – King of BigData After Dark

Follow the Data podcast, episode 1: Gavagai! Gavagai!

We have made available the first episode of the Follow the Data podcast! Hope you enjoy it.

Podcast link: Follow The Data | Episode 1 – Gavagai! Gavagai!

This first episode, as has been mentioned before on this blog, is about a Stockholm startup company, Gavagai, which provides a technology platform called Ethersource. We interviewed the company’s CDO (chief data officer), Fredrik Olsson, and the chief scientist, Magnus Sahlgren, and we think it resulted in a very interesting chat, although the sound quality is perhaps not ideal due to our inexperience with podcasting.

Some interesting tidbits from the conversation:

  • The name “Gavagai” comes from a thought experiment by Quine demonstrating the “indeterminacy of translation”. It’s also the reason for the presence of the little rabbit on the Gavagai web page.
  • Olsson describes Ethersource as a “semantic processing layer of the big data stack” and a “base technology for semantics.” An alternative, more everyday description would be the one in this nice interview from Scandinavian Startups: “Finding meaning before it is evident.”
  • Ethersource learns meaning from text, which is the core of the technology; use cases include “sentiment analysis on steroids”, textual profiling and market analysis.
  • The Ethersource system is based on intrinsically scalable technology (which, toward the end of the podcast, turned out to be based on mimicking computation in the brain and “sparse distributed representation”) that can ingest any type of linguistic data stream; Gavagai have not been able to “saturate the system” in terms of storage despite ingesting everything they can get their hands on. The underlying technology is based on “random indexing”, which according to Sahlgren is basically a kind of random projection approach: a dimensionality reduction method that allows incremental processing (rather than, e.g., running huge SVDs).
  • As a result of the underlying design, Ethersource builds up representations of concepts as it incorporates new data; Gavagai formulates this in the phrase “training equals learning.” The concept-based approach means that the system is extremely good at handling spelling errors and synonyms.
  • Ethersource is not based on concepts such as “documents” or “tweets”, which are completely artificial, according to chief scientist Sahlgren.
  • The system’s design also means that it does not have any problems handling different languages, even languages that use different text encodings.
  • Gavagai did not start out as a “big data” company but they are now relatively comfortable in their role as one.
  • Fredrik Olsson used to work for Recorded Future, which he feels is not a competitor to Gavagai, but would be a perfect customer.
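For readers unfamiliar with random indexing, here is a minimal Python sketch of the general technique. It is not Gavagai’s Ethersource code; the class name, the vector dimensionality, the number of nonzero entries and the window size are our own illustrative assumptions. Each word gets a fixed, sparse random “index vector” with a few +1/-1 entries, and a word’s context vector is simply the running sum of the index vectors of its neighbours. Because new text only adds to existing vectors, the model can be updated incrementally, which is the contrast with running a huge SVD over a full co-occurrence matrix mentioned above.

```python
import math
import random
from collections import defaultdict

# Illustrative parameters (assumptions, not Ethersource's actual settings).
DIM = 1000     # dimensionality of index and context vectors
NONZERO = 10   # number of nonzero (+1/-1) entries per index vector
WINDOW = 2     # context window size on each side of a word


class RandomIndexer:
    """Hypothetical minimal random indexing model: sparse random index
    vectors per word, context vectors accumulated incrementally."""

    def __init__(self, dim=DIM, nonzero=NONZERO, window=WINDOW, seed=0):
        self.dim = dim
        self.nonzero = nonzero
        self.window = window
        self.rng = random.Random(seed)
        self.index = {}  # word -> sparse index vector as {position: +1 or -1}
        self.context = defaultdict(lambda: [0.0] * self.dim)  # word -> dense context vector

    def _index_vector(self, word):
        # Each word gets a fixed sparse ternary random vector, drawn once on first sight.
        if word not in self.index:
            positions = self.rng.sample(range(self.dim), self.nonzero)
            self.index[word] = {p: self.rng.choice((1, -1)) for p in positions}
        return self.index[word]

    def update(self, tokens):
        # Incremental step: add the index vectors of each token's neighbours
        # to that token's context vector. No global matrix, no SVD.
        for i, word in enumerate(tokens):
            ctx = self.context[word]
            lo, hi = max(0, i - self.window), min(len(tokens), i + self.window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                for pos, sign in self._index_vector(tokens[j]).items():
                    ctx[pos] += sign

    def similarity(self, a, b):
        # Cosine similarity between two accumulated context vectors;
        # 0.0 for unseen words or all-zero vectors.
        va, vb = self.context.get(a), self.context.get(b)
        if va is None or vb is None:
            return 0.0
        dot = sum(x * y for x, y in zip(va, vb))
        na = math.sqrt(sum(x * x for x in va))
        nb = math.sqrt(sum(x * x for x in vb))
        return dot / (na * nb) if na and nb else 0.0
```

Feeding it “the cat drinks milk” and “the dog drinks water” should make “cat” and “dog” clearly more similar to each other than to a word from an unrelated sentence, because their context vectors share the index vectors of “the” and “drinks”. New text can be passed to `update()` at any time without rebuilding anything, which is the property that makes the approach attractive for continuous data streams.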

Joel and I were perhaps not very comfortable in our new roles as podcasters and struggled a bit with finding the right words in English. We also recorded a post-show chat in Swedish where we are more relaxed and coherent. Some tidbits from this part, which we also plan to put online at some point:

  • The Gavagai founders have a radical view of linguistics, where there is no hard line between syntax and semantics, but rather a kind of continuum.
  • They don’t believe in sampling, but try to ingest everything they can find into the system.
  • The Gavagai team tries to put aside some time every day to look at interesting concepts and connections between concepts discovered by the system.
  • They expected that a word like apple (Apple) would have a large number of different meanings, but when they looked at data from social media during a specific period in time, it had just three major meanings.
  • Language does its own disambiguation; for example, after Apple became well known as a software company, people started to talk more about “apples” rather than “an apple” when they mean the fruit (if I interpreted Magnus correctly).
  • They view the stock market as a way to validate their semantic analysis. “Stock prices are the closest you can get to an objective validation.”
  • The founders came from a research background, and found that starting Gavagai gave a huge boost to their research activities due to the new pressure to build and release something that works in the “real world”.

On the evening of the interview (March 9, 2012), the Swedish daily Svenska Dagbladet published an article about Gavagai’s Ethersource-based real-time sentiment tracking of the buzz around the contestants who would appear in the Swedish Eurovision finals the following day. In the end, the Ethersource forecasts turned out to be very accurate.

Although it’s far from clear what the next episodes of the podcast will be about, in general we will restrict ourselves to interviewing interesting companies or scientists (rather than just talking amongst ourselves), with a bias towards Swedish interviewees since this is where we are located and it might be interesting for people from other locations to hear what is going on here.

EDIT 17/3 2012: Our podcast jingle was created by Karl Ekdahl, the man behind the awesome Ekdahl Moisturizer, among many other things.
