- I’ve found Data Skeptic to be a nice podcast about data science and related subjects. For example, the “data myths” episode and the one with Matthew Russell (who wrote Mining the Social Web) are fun.
- When I was in China last month, the seat pocket in front of me in the cab we took from the Beijing airport had a glossy magazine in it. The first feature article was about big data (大数据) analysis applied to Chinese TV series and movies, Netflix-style. Gotta beat those Korean dramas! One of the hotels we stayed at in Beijing had hosted an international conference on big data analytics the day before we arrived. The signs and posters were still there. Anecdotes, not data, but still.
- November was a good meetup month in Stockholm. The Machine Learning group had another good event at Spotify HQ, with interesting presentations from Watty. One was about how to “data bootstrap” a startup when you discover that the existing data you’ve acquired is garbage and you need to start generating your own in a hurry. Another covered the actual nitty-gritty details of their algorithms, which model and predict the energy consumption of different household devices by deconvoluting a composite signal. There was also a talk about embodied cognition and robotics by Jorge Davila-Chacon (slides here). Also, in an effort to revive the Stockholm Big Data group, I co-organized (together with Stefan Avestad from Ericsson) a meetup with Paco Nathan on Spark. The slides for the talk, which was excellent and much appreciated by the audience, can be found here. Paco also gave a great workshop the next day on how to actually use Spark. Finally, I’ve joined the organizing committee of SRUG, the Stockholm R useR group, and have started to plan some future meetups there. The next one will be on December 9 and will deal with how Swedish governmental organizations use R.
- Erik Bernhardsson of Spotify has written a fascinating blog post combining two of my favorite subjects: chess and deep learning. He has trained a network that is 3 layers deep and 2048 units wide on 100 million games from FICS (the Free Internet Chess Server, where I, incidentally, play quite often). I’ve often wondered why it seems to be so hard to build a chess engine that really learns the game from scratch, using actual machine learning, rather than the rule- and heuristic-based programs that have ruled the roost, and which come pre-loaded with massive opening libraries and endgame tablebases (giving the optimal move in any position with at most N pieces; I think N is currently 7 or so). It would be much cooler to have a system that just learns implicitly how to play and does not rely on built-in knowledge. Well, Erik seems to have achieved that, kind of. The cool thing is that this program does not need to be told explicitly how the pieces move; it can infer that from data. Since the system is trained on amateur games, it sensibly enough does not care about the outcome of each game (that would be a weak label for learning). I do think that Erik is a bit optimistic when he writes that “Still, even an amateur player probably makes near-optimal moves for most time.” Most people who have analyzed their own games, or online games, with a strong engine know that amateur games are just riddled with blunders. (I remember the old Max Euwe book “Chess master vs chess amateur”, which also demonstrated this convincingly … but I digress). Still, a very impressive demonstration! I once supervised a master’s thesis where the aim was to teach a neural network to play some specific endgames, and even that was a challenge. As Erik notes in his blog post, his system needs to be tried against a “real” chess engine. It is reported to score around 33% against Sunfish, but that is a fairly weak engine, as I found out by playing it half an hour ago.
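
To make the “3 layers deep and 2048 units wide” description a bit more concrete, here is a minimal sketch of what a position-scoring network of that shape could look like. It is my own illustration, not Erik’s code. The one-hot board encoding (12 piece types on 64 squares), the ReLU activations, the random weights and all function names are assumptions made for the example, and training is left out entirely.

```python
import random

# Assumed encoding: 6 white + 6 black piece types, one-hot per square.
PIECES = "PNBRQKpnbrqk"


def encode_fen_board(board_field):
    """One-hot encode the piece-placement field of a FEN string (64 * 12 floats)."""
    vec = [0.0] * (64 * 12)
    square = 0
    for ch in board_field:
        if ch == "/":
            continue  # rank separator, carries no information here
        if ch.isdigit():
            square += int(ch)  # a digit means that many empty squares
        else:
            vec[square * 12 + PIECES.index(ch)] = 1.0
            square += 1
    if square != 64:
        raise ValueError("board field does not describe 64 squares")
    return vec


def init_mlp(n_in, width, depth, rng):
    """Random (untrained) weights: `depth` hidden layers of `width` units, one output."""
    sizes = [n_in] + [width] * depth + [1]
    layers = []
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        scale = (2.0 / fan_in) ** 0.5
        weights = [[rng.gauss(0.0, scale) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
    return layers


def score_position(layers, x):
    """Forward pass: ReLU on the hidden layers, linear scalar output."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x[0]


START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
x = encode_fen_board(START)
# The post describes 2048 units per layer; 32 keeps this pure-Python sketch fast.
net = init_mlp(len(x), width=32, depth=3, rng=random.Random(0))
print(len(x), score_position(net, x))
```

Nothing in this sketch knows how a knight moves or what checkmate is. Whatever chess knowledge such a network ends up with has to come from fitting the weights to the recorded games, which is exactly what makes the approach interesting.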