Follow the Data

A data driven blog

Archive for the tag “infomap”

Dynamics in Swedish Twitter communities


I made a community decomposition of Swedish Twitter accounts in 2015 and 2016 and you can explore it in an online app.


As reported on this blog a couple of months ago, (and also here). I have (together with Mattias Östmar) been investigating the community structure of Swedish Twitter users. The analysis we posted then addressed data from 2015 and we basically just wanted to get a handle on what kind of information you can get from this type of analysis.

With the processing pipeline already set up, it was straightforward to repeat the analysis for the fresh data from 2016 as soon as Mattias had finished collecting it. The nice thing about having data from two different years in that we can start to look at the dynamics – namely, how stable communities are, which communities are born or disappear, and how people move between them.

The app

First of all, I made an app for exploring these data. If you are interested in this topic, please help me understand the communities that we have detected by using the “Suggest topic” textbox under the “Community info” tab. That is an attempt to crowdsource the “annotation” of these communities. The suggestions that are submitted are saved in a text file which I will review from time to time and update the community descriptions accordingly.

The fastest climbers

By looking at the data in the app, we can find out some pretty interesting things. For instance, the account that easily increased to most in influence (measured in PageRank) was @BjorklundVictor, who climbed from a rank of 3673 in 2015 in community #4 (which we choose to annotate as an “immigration” community) to a rank of 3 (!) in community #4 in 2016 (this community has also been classified as an immigration-discussion community, and it is the most similar one of all 2016 communities to the 2015 immigration community.) I am not personally familiar with this account, but he must have done something to radically increase his reach in 2016.

Some other people/accounts that increased a lot in influence were professor Agnes Wold (@AgnesWold) who climbed from rank 59 to rank 3 in the biggest community, which we call the “pundit cluster” (it has ID 1 both in 2015 and 2016), @staffanlandin, who went from #189 to #16 in the same community, and @PssiP, who climbed from rank 135 to rank 8 in the defense/prepping community (ID 16 in 2015, ID 9 in 2016).

Some people have jumped to a different community and improved their rank in that way, like @hanifbali, who went from #20 in community 1 (general punditry) in 2015 to the top spot, #1 in the immigration cluster (ID 4) in 2016, and @fleijerstam, who went from #200 in the pundit community in 2015 to #10 in the politics community (#3) in 2016.

Examples of users who lost a lot of ground in their own community are @asaromson (Åsa Romson, the ex-leader of the Green Party; #7 -> #241 in the green community) and @rogsahl (#10 -> #905 in the immigration community).

The most stable communities

It turned out that the most stable communities (i.e. the communities that had the most members in common relative to their total sizes in 2015 and 2016 respectively) were the ones containing accounts using a different language from Swedish, namely the Norwegian, Danish and Finnish communities.

The least stable community

Among the larger communities in 2015, we identified the one that was furthest from having a close equivalent in 2016. This was 2015 community 9, where the most influential account was @thefooomusic. This is a boy band whose popularity arguably hit a peak in 2015. The community closest to it in 2016 is community 24, but when we looked closer at that (which you can also do in the app!), we found that many YouTube stars had “migrated” into 2016 cluster 24 from 2015 cluster 84, which upon inspection turned out to be a very clear Swedish YouTuber cluster with stars such as Clara Henry, William Spetz and Therese Lindgren.

So in other words, the The Fooo fan cluster and the YouTuber cluster from 2015 merged into a mixed cluster in 2016.

New communities

We were hoping to see some completely new communities appear in 2016, but that did not really happen, at least not for the top 100 communities. Granted, there was one that had an extremely low similarity to any 2015 community, but that turned out to be a “community” topped by @SJ_AB, a railway company that replies to a large number of customer queries and complaints on Twitter (which, by the way, makes it the top account of them all in terms of centrality.) Because this company is responding to queries from new people all the time, it’s not really part of a “community” as such, and the composition of the cluster will naturally change a lot from year to year.

Community 24, which was discussed above, was also dissimilar from all the 2015 communitites, but as described, we notice it has absorbed users from 2015 clusters 9 (The Fooo) and 84 (YouTubers).

Movement between the largest communities

The similarity score for the “pundit clusters” (community 1 in 2015 and community 1 in 2016, respectively) somewhat surprisingly showed that these were not very similar overall, although many of the top-ranked users are the same. A quick inspection also showed that the entire top list of community 3 in 2015 moved to community 1 in 2016, which makes the 2015 community 3 the closest equivalent to the 2016 community 1. Both of these communities can be characterized as general political discussion/punditry clusters.

Comparison: The defense/prepper community in 2015 vs 2016

In our previous blog post on this topic, we presented a top-10 list of defense Twitterers and compared that to a manually curated list from Swedish daily Svenska Dagbladet. Here we will present our top-10 list for 2016.

Username Rank in 2016 Rank in 2015 Community ID in 2016 Community ID in 2015
patrikoksanen 1 3 9 16
hallonsa 2 5 9 16
Cornubot 3 1 9 16
waterconflict 4 6 9 16
wisemanswisdoms 5 2 9 16
JohanneH 6 9 9 16
mikaelgrev 7 7 9 16
PssiP 8 135 9 16
oplatsen 9 11 9 16
stakmaskin 10 31 9 16

Comparison: The green community in 2015 vs 2016

One community we did not touch on in the last blog post is the green, environmental community. Here’s a comparison of the main influencers in that category in 2016 vs 2015.

Username Rank in 2016 Rank in 2015 Community ID in 2016 Community ID in 2015
rickardnordin 1 4 13 29
Ekobonden 2 1 13 109
ParHolmgren 3 19 13 29
BjornFerry 4 12 13 133
PWallenberg 5 12 13 109
mattiasgoldmann 6 3 13 29
JKuylenstierna 7 10 13 29
Axdorff 8 3 13 153
fores_sverige 9 11 13 29
GnestaEmma 10 17 13 29


Of course, many parts of this analysis could be improved and there are some important caveats. For example, the Infomap algorithm is not deterministic, which means that you are likely to get somewhat different results each time you run it. For these data, we have run it a number of times and seen that you get results that are similar in a general sense each time (in terms of community sizes, top influencers and so on), but it should be understood that some accounts (even top influencers) can in some cases move around between communities just because of this non-deterministic aspect of the algorithm.

Also, it is possible that the way we use to measure community similarity (the Jaccard index, which is the ratio between the number of members in common between two communities and the number of members that are in any or both of the communities – or to put it in another way, the intersection divided by the union) is too coarse, because it does not consider the influence of individual users.

Finding communities in the Swedish Twitterverse with a mention graph approach

Mattias Östmar and me have published an analysis of the “big picture” of discourse in the Swedish Twitterverse that we have been working on for a while, on and off. Mattias hatched the idea to take a different perspective from looking at keywords or numbers of followers or tweets, and instead try to focus on engagement and interaction by looking at reciprocal mention graphs – graphs where two users get a link between them if both have mentioned each other at least once (as happens by default when you reply to a tweet, for example.) He then applied an eigenvector centrality measure to that network and was able to measure the influence of each user in that way (described in Swedish here).

In the present analysis we went further and tried to identify communities in the mention network by clustering the graph. After trying some different methods we eventually went with Infomap, a very general information-theory based method (it handles both directed and undirected, weighted and unweighted networks, and can do multi-level decompositions) that seems to work well for this purpose. Infomap not only detects clusters but also ranks each user by a PageRank measure so that the centrality score comes for free.

We immediately recognized from scanning the top accounts in each cluster that there seemed to be definite themes to the clusters. The easiest to pick out were Norwegian and Finnish clusters where most of the tweets were in those languages (but some were in Swedish, which had caused those accounts to be flagged as “Swedish”.) But it was also possible to see (at this point still by recognizing names of famous accounts) that there were communities that seemed to be about national defence or the state of Swedish schools, for instance. This was quite satisfying as we hadn’t used the actual contents of the tweets – no keywords or key phrases – just the connectivity of the network!

Still, knowing about famous accounts can only take us so far, so we did a relatively simple language analysis of the top 20 communities by size. We took all the tweets from all users in those communities, built a corpus of words of those, and calculated the TF-IDFs for each word in each community. In this way, we were able to identify words that were over-represented in a community with respect to the other communities.

The words that feel out of this analysis were in many cases very descriptive of the communities, and apart from the school and defence clusters we quickly identified an immigration-critical cluster, a cluster about stock trading, a sports cluster, a cluster about the boy band The Fooo Conspiracy, and many others. (In fact, we have since discovered that there are a lot of interesting and thematically very specific clusters beyond the top 20 which we are eager to explore!)

As detailed in the analysis blog post, the list of top ranked accounts in our defence community was very close to a curated list of important defence Twitter accounts recently published by a major Swedish daily. This probably means that we can identify the most important Swedish tweeps for many different topics without manual curation.

This work was done on tweets from 2015, but in mid-January we will repeat the analysis on 2016 data.

There is some code describing what we did on GitHub.


Post Navigation