Spotify music graph meetup
Continuing on the theme of graphs from my last post, I went to Spotify’s Music Graph Tech Talk in Stockholm the other day. It turns out that Spotify has recently started a dedicated group for graph engineering, called the “graph squad”, which is busy evaluating different options for storing and manipulating the world’s most comprehensive music graph. (They are currently hiring for senior graph specialist and graph software engineer roles.)
It was rather fascinating to hear Jon Åslund describe the various distinctions between “tracks”, “songs” and “recordings”, “albums”, “releases” and “release groups”, and the almost-but-not-quite-perfect ISRC codes for uniquely identifying tracks.
There were two interesting presentations by people from Neo4j. The slides for one of them, “Graphs for bunnies”, are available online; it gives an introduction to graph databases.
Anders Arpteg, the squad leader of the graph group, mentioned that the squad has existed for a couple of months and is still trying out things like Neo4j and Giraph for handling the music graph. He gave some (“slightly outdated”) numbers that I failed to write down, but I think he said there are something like 20 million active users and that 5 TB of event data is recorded each day. I read from another source that Spotify has the largest commercially used Hadoop cluster in Europe (700 nodes), although I don’t know if that is used for the graph processing.
All in all, it was a good event, made even better by free hamburgers and beer.