A couple of years ago, I participated in a workshop on academic data science at SICS in Stockholm. We discussed various trends in data science and machine learning, and at the end I joined a discussion group, led by Professor Niklas Lavesson from Blekinge Institute of Technology, where we talked about model interpretability and explanation. At the time, it felt like a fringe but interesting topic. Today, it seems to be everywhere. Here are some of the places I’ve seen it recently.
Blog posts and presentations
Ideas on interpreting machine learning. This is a very thorough blog post from O’Reilly with a lot of good ideas. It also covers related techniques such as dimensionality reduction, which I would not call model explanation per se, but which are still good to know.
Fast Forward Labs have announced a new report on interpretable machine learning. (I have not read the actual report.)
Papers with software
Understanding Black-box Predictions via Influence Functions. The paper of this name (associated code here) won a best-paper award at ICML 2017 (again showing how hot this topic is!). The authors use something called an influence function to quantify, roughly speaking, how much a perturbation of a single example in the training data set affects the resulting model. In this way, they can identify the training data points most responsible for a given prediction. One might say that they have figured out a way to differentiate a predictive model with respect to data points in the training set.
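To make the influence idea a bit more concrete, here is a minimal sketch on a toy problem of my own choosing: a one-parameter least-squares model y ≈ theta * x, where the Hessian is a single number and can be inverted exactly. It uses the influence expressions as I understand them from the paper (the influence of upweighting a training point z on the test loss is minus the test gradient times the inverse Hessian times the gradient at z, and removing z corresponds to upweighting it by -1/n). The function names and the data are hypothetical, and this is not the authors' code, which targets much larger models and approximates the inverse-Hessian products.

```python
def fit_slope(xs, ys):
    """Least-squares slope for the one-parameter model y ~ theta * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def grad_loss(theta, x, y):
    """Gradient of the per-example loss 0.5 * (theta * x - y)**2 w.r.t. theta."""
    return (theta * x - y) * x


def influence_on_test_loss(xs, ys, x_test, y_test):
    """Influence of upweighting each training point on the loss at a test point.

    Computes -grad(test) * H^-1 * grad(train_i), where H is the Hessian of the
    mean training loss (a scalar in this toy model).
    """
    theta = fit_slope(xs, ys)
    hessian = sum(x * x for x in xs) / len(xs)
    g_test = grad_loss(theta, x_test, y_test)
    return [-g_test * grad_loss(theta, x, y) / hessian for x, y in zip(xs, ys)]


def predicted_removal_shift(xs, ys, i):
    """First-order estimate of how theta changes if training point i is removed."""
    n = len(xs)
    theta = fit_slope(xs, ys)
    hessian = sum(x * x for x in xs) / n
    # Removing a point is the same as upweighting it by -1/n.
    return grad_loss(theta, xs[i], ys[i]) / (n * hessian)


# Toy data (made up): y = 2x everywhere except one corrupted label at index 2.
xs = [float(x) for x in range(1, 11)]
ys = [2.0 * x for x in xs]
ys[2] = 12.0

scores = influence_on_test_loss(xs, ys, x_test=5.0, y_test=10.0)
most_influential = max(range(len(xs)), key=lambda i: abs(scores[i]))
print("most influential training index:", most_influential)

# Compare the influence-based estimate with actually refitting without the point.
actual_shift = fit_slope(xs[:2] + xs[3:], ys[:2] + ys[3:]) - fit_slope(xs, ys)
print("predicted shift:", predicted_removal_shift(xs, ys, 2))
print("actual shift:   ", actual_shift)
```

On this toy data the corrupted point should come out as the most influential one for the test prediction, and the predicted parameter shift should be close to what you get by refitting without it. The estimate is only first-order, so it gets worse when a single point carries a large share of the total weight.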
LIME, Local Interpretable Model-agnostic Explanations. (arXiv link, code on Github) This has been around for more than a year and can thus be called “established” in the rapidly changing world of machine learning. I have tried it myself for a consulting gig and found it useful for understanding why a certain prediction was made. The main implementation is in Python but there is also a good R port (which is what I used when I tried it). LIME essentially builds a simplified local model around the data point you are interested in. It does this by generating perturbed samples around that point, obtaining the model's predictions for the perturbed samples, weighting them by their proximity to the original point, and fitting a sparse linear model to those samples and predictions. (As far as I have understood, that is!)
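The local-surrogate idea behind LIME can be sketched in a few lines of plain Python. This is my own simplified illustration and does not use the lime package or its API: it perturbs a point with Gaussian noise, weights the samples with an exponential kernel on their distance to the point, and fits an ordinary weighted least-squares line instead of a sparse model. The function names, the noise scale, and the kernel width are all assumptions picked for the example.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a small dense system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def explain_locally(predict, x, n_samples=500, scale=0.1, kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear surrogate to `predict` around the point x.

    Returns (intercept, weights), where weights[i] is the local effect of
    feature i on the prediction.
    """
    rng = random.Random(seed)
    k = len(x) + 1
    xtwx = [[0.0] * k for _ in range(k)]
    xtwy = [0.0] * k
    for _ in range(n_samples):
        delta = [rng.gauss(0.0, scale) for _ in x]        # perturbation
        z = [xi + di for xi, di in zip(x, delta)]          # perturbed sample
        y = predict(z)                                     # black-box output
        dist2 = sum(d * d for d in delta)
        w = math.exp(-dist2 / kernel_width ** 2)           # proximity weight
        row = [1.0] + delta                                # intercept + offsets
        # Accumulate the weighted normal equations X^T W X and X^T W y.
        for i in range(k):
            xtwy[i] += w * row[i] * y
            for j in range(k):
                xtwx[i][j] += w * row[i] * row[j]
    coef = solve_linear_system(xtwx, xtwy)
    return coef[0], coef[1:]


def black_box(z):
    """Stand-in for an opaque model: nonlinear in z[0], linear in z[1]."""
    return z[0] ** 2 + 3.0 * z[1]


intercept, weights = explain_locally(black_box, [1.0, 2.0])
print("local feature weights:", weights)
```

Around the point (1, 2), the surrogate's weights should land roughly at the local slopes of the black box, about 2 for the first feature and about 3 for the second. The real LIME adds things this sketch leaves out, such as feature selection to keep the explanation sparse and special handling for text, images, and categorical features.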
I’m sure I have missed a lot of interesting work.
If anyone is interested, I might write another blog post illustrating how LIME can be used to understand why a certain prediction was made on a public dataset. I might even try to explain the influence function paper if I get the time to try it and digest the math.