Follow the Data

A data driven blog

Archive for the tag “social-networks”

Not contagious after all?

(via Decision Science News) Ouch! A new paper titled “The Spread of Evidence-Poor Medicine via Flawed Social-Network Analysis” (published here and available in manuscript format on arXiv) has come out arguing very strongly against the conclusions drawn by Christakis and Fowler in a series of papers where they put forward the idea that things like obesity and smoking can be transmitted through social networks; a kind of “social contagion.” I blogged about these ideas a while back after both Wired and the New York Times had published articles on them. The title (harsh!) and the abstract speak for themselves:

The chronic widespread misuse of statistics is usually inadvertent, not intentional. We find cautionary examples in a series of recent papers by Christakis and Fowler that advance statistical arguments for the transmission via social networks of various personal characteristics, including obesity, smoking cessation, happiness, and loneliness. Those papers also assert that such influence extends to three degrees of separation in social networks. We shall show that these conclusions do not follow from Christakis and Fowler’s statistical analyses. In fact, their studies even provide some evidence against the existence of such transmission. The errors that we expose arose, in part, because the assumptions behind the statistical procedures used were insufficiently examined, not only by the authors, but also by the reviewers. Our examples are instructive because the practitioners are highly reputed, their results have received enormous popular attention, and the journals that published their studies are among the most respected in the world. An educational bonus emerges from the difficulty we report in getting our critique published. We discuss the relevance of this episode to understanding statistical literacy and the role of scientific review, as well as to reforming statistics education.

Cosma Shalizi has co-authored another paper (available here) which makes a similar point in a much more, let’s say, polite way. My impression is that Shalizi is both sharp and trustworthy (I’ve learned a lot about statistics from his blog) so I’m inclined to think he is on to something.

Network medicine startups

There are two (well, I’m sure there are really more) interesting new startups that combine medicine with networks, albeit in different ways. NuMedii (which appears to be shorthand for New Indications of Medicines) uses a data-driven approach to discover new indications for previously existing drugs. This is potentially very useful because existing drugs have already gone through rigorous tests for toxicity etc. and are therefore easier to bring to market than a drug developed from scratch. NuMedii’s technology is based on academic work from Stanford and they have a killer team that includes the likes of Atul Butte and Eric Schadt. The company is currently looking for what is essentially a bioinformatics-slanted big data scientist; one of the responsibilities related to this position is to “Architect, develop, maintain, and document a computational infrastructure that efficiently executes complex queries across many terabytes (potentially petabytes!) of disparate data and knowledge on genomics, genetics, pharmaceuticals, and chemicals.” Petabytes!

MedNetworks is also interesting, though a bit different. Its technology is based on the well-publicized work of Nicholas Christakis and colleagues at Harvard about how things like smoking and obesity appear to spread in social networks in an almost contagious way. (As an aside, I saw a random hipster at a Stockholm café sporting a copy of Christakis’ and Fowler’s book Connected: The Surprising Power of Our Social Networks – maybe network science is belatedly going mainstream here too!) MedNetworks studies things like how prescriptions of drugs are affected by the structures of social networks of physicians and patients. They attempt to identify “high influencers” in social networks, who are not necessarily the same as highly connected people. These high influencers have a strong influence on how drug prescribing behavior “diffuses” in a social network. Quoting the company website: “Optimized targeting for promotion based on social network influence provides a more efficient and effective approach to both personal and non-personal promotion.”


I was discussing the importance of data visualization with a co-worker a couple of weeks ago. We agreed that some sort of dynamic, intuitive interface for looking at and interacting with huge data sets in general, and sequencing-based data sets in particular, would be extremely useful. As the Dataspora blog puts it in a recent post, “The ultimate end-point for most data analysis is a human decision-maker, whose highest bandwidth channel is his or her eyeballs.” (The post is worth reading in its entirety.)

Apparently Illumina (one of the biggest vendors of high-throughput sequencers) agrees; they’ve announced a competition where the aim is to provide useful visualizations of a number of genomic datasets derived from a breast cancer cell line. The competition closes on March 15, 2011.

Here’s a nice paper, A Tour through the Visualization Zoo, which provides a whirlwind tour of different kinds of graphs. The figures are actually interactive, so you can mess around with them if you are reading the article online.

The Infosthetics blog highlights Patients Like Me as the most successful marriage of online social media and data visualization.

Everything is contagious

I’ve been putting off writing a lengthy blog post on this topic for a while, but today I found that both the New York Times and Wired have new articles out on the same subject (see below), so I might as well point to them while at the same time offering some of my half-baked thoughts.

A couple of weeks back, I was listening to an episode of the SmartData Collective podcast series where a guy named Korhan Yunak talked about predicting if and when a customer would cancel their mobile phone subscription and switch to another provider. All kinds of demographic information, behavioural data and other things have been used to try to extract features that predict such switches. Yunak explained that recent research had found that, essentially, subscription switches propagate through social networks. What does that mean exactly?

Phone companies can construct a customer network by collecting “connections” between customers (for example, by linking everyone that has called or texted each other). By simply looking at a customer’s network neighborhood – their direct connections (often friends) and perhaps the friends of friends – the companies can get a huge boost in their predictive accuracy (I’ve forgotten the exact number and metric, but it was a major improvement).
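To make the idea a bit more concrete, here is a rough Python sketch of what such a neighborhood feature might look like. This is my own toy illustration, not the method described in the podcast: the function names, the call records and the set of churned customers are all made up, and a real system would feed a feature like this into a proper predictive model together with everything else.

```python
from collections import defaultdict


def build_network(call_records):
    """Link two customers if they have called or texted each other (undirected)."""
    neighbors = defaultdict(set)
    for a, b in call_records:
        if a != b:  # ignore self-calls
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def churned_fraction(customer, neighbors, churned, depth=1):
    """Fraction of customers within `depth` hops of `customer` who have already switched.

    depth=1 looks at direct connections only, depth=2 adds friends of friends.
    Returns 0.0 for a customer with no connections.
    """
    seen = {customer}
    frontier = {customer}
    for _ in range(depth):
        # Expand one hop outwards, skipping anyone already visited.
        frontier = {n for c in frontier for n in neighbors.get(c, ())} - seen
        seen |= frontier
    neighborhood = seen - {customer}
    if not neighborhood:
        return 0.0
    return len(neighborhood & churned) / len(neighborhood)


# Hypothetical toy data: (caller, callee) pairs and customers who already left.
calls = [("ann", "bob"), ("ann", "cat"), ("bob", "dan"),
         ("cat", "dan"), ("dan", "eve")]
churned = {"bob", "eve"}

network = build_network(calls)
direct = churned_fraction("ann", network, churned, depth=1)
two_hops = churned_fraction("ann", network, churned, depth=2)
print(direct, two_hops)
```

In this toy example one of Ann's two direct contacts has switched provider, and one of the three customers within two hops. The thinking would be that a customer whose neighborhood is full of switchers is more likely to switch too, though how much weight such a feature deserves is exactly what would have to be checked against real data.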

Now, it is not surprising in itself that people talk to each other and influence each other in different ways, but it was surprising to me that the effect was so strong. It made me think of earlier published work which showed that obesity, happiness and smoking are all “socially contagious” in the sense that they seem to spread through social networks.

As I mentioned above, there is a new Wired article by Jonah Lehrer which talks about these things and has nice visualizations of them as well. There’s also a New York Times article on the same theme by Clive Thompson, but I haven’t read it because of the paywall.

These findings, of course, suggest a new kind of “network marketing” (the “old kind” also goes by the name of multi-level marketing). The idea is that you can use information about a customer’s friends’ preferences and shopping behavior to construct more precise targeted ads and other marketing strategies. Companies based around such ideas include Media6°, which “…connects a brand’s existing customers with user segments composed entirely of consumers who are interwoven via the social graph.” Another company, 33Across, “…uses previously untapped social data sources, in combination with advanced social network algorithms, to create unique and scalable audience segments.” Both companies do this by capturing data from social network sites on the web, according to this article.
