## Compressive sensing and bioinformatics

Compressive (or compressed) sensing is (as far as I can tell; I have learned everything I know about it from the excellent Nuit Blanche blog) a relatively newly named research subject somewhere around the intersection of signal processing, linear algebra and statistical learning. Roughly speaking, it deals with problems like how to reconstruct a signal from the smallest possible number of measurements and how to complete a matrix (to fill in missing matrix elements – if you think that sounds kind of abstract, thisĀ presentation on collaborative filtering by Erik Bernhardssons nicely explains how Spotify uses matrix completion for song recommendations; NetFlix and others also do similar things). The famous mathematician Terence Tao, who has done a lot of work in developing compressed sensing, has much better explanations than I can give, for instance this 2007 blog post that exemplifies the theory with the “single-pixel camera” and this PDF presentation where he explains how CS relates to standard linear (*Ax = b*) equation systems that are under-determined (more variables than measurements). Igor Carron of the Nuit Blanche blog also has a “living document” with *a lot* of explanations about what CS is about (probably more than you will want to refer to in a single sitting).

Essentially, compressed sensing works well when the signal is sparse in some basis (this statement will sound much clearer if you have read one of the explanations I linked above). The focus on under-determined equation systems made me think about whether this could be useful for bioinformatics, where we frequently encounter the case where we have few measurements of a lot of variables (think of, e.g., the difficulty of obtaining patient samples, and the large number of gene expression measurements that are taken from them). The question is, though, whether gene expression vectors (for example) can be thought of as *sparse* in some sense. Another train of thought I had is that it would be good to develop even more approximate and, yes, compressive methods for things like metagenomic sequencing, where the sheer amount of information pretty quickly starts to break the available software. (C Titus Brown is one researcher who is developing software tools to this end.)

Of course, I was far from being the first one to make the connection between compressive sensing and bioinformatics. Olgica Milenkovic had thoughtfully provided a presentation on sparse problems in bioinformatics (problems that thus could be addressable with CS techniques).

Apart from the applications outlined in the above presentation, I was excited to see a new paper about a CS approach to metagenomics:

Quikr – A method for rapid reconstruction of bacterial communities using compressed sensing

Also there are a couple of interesting earlier publications:

A computational model for compressed sensing RNAi cellular screening

Compressive sensing DNA microarrays