(Short version: explore region-specific gene expression in two human brains at https://mikaelhuss.shinyapps.io/ExploreAllenBrainRNASeq/)
The Allen Institute for Brain Science has done a tremendous amount of work to digitalize and make available information on gene expression at a fine-grained level both in the mouse brain and the human brain. The Allen Brain Atlas contains a lot of useful information on the developing brain in mouse and human, the aging brain, etc. – both via GUIs and an API.
Among other things, the Allen institute has published gene expression data for healthy human brains divided by brain structure, assessed using both microarrays and RNA sequencing. In the RNA-seq case (which I have been looking at for reasons outlined below), two brains have been sectioned into 121 different parts, each representing one of many anatomical structures. This gives “region-specific” expression data which are quite useful for other researchers who want to compare their brain gene expression experiments to publicly available reference data. Note that each of the defined regions will still be a mix of cell types (various kinds of neuron, astrocytes, oligodendrocytes etc.), so we are still looking at a mix of cell types here, although resolved into brain regions. (Update 2016-07-22: The recently released R package ABAEnrichment seems very useful for a more programmatic approach than the one described here to accessing information about brain structure and cell type specific genes in Allen Brain Atlas data!)
As I have been working on a few projects concerning gene expression in the brain in some specific disease states, there has been a need to compare our own data to “control brains” which are not (to our knowledge) affected by any disease. In one of the projects, it has also been of interest to compare gene expression profiles to expression patterns in specific brain regions. As these projects both used RNA sequencing as their method of quantifying gene (or transcript) expression, I decided to take a closer look at the Allen Institute brain RNA-seq data and eventually ended up writing a small interactive app which is currently hosted at https://mikaelhuss.shinyapps.io/ExploreAllenBrainRNASeq/ (as well as a back-up location available on request if that one doesn’t work.)
The primary functions of the app are the following:
(1) To show lists of the most significantly up-regulated genes in each brain structure (genes that are significantly more expressed in that structure than in others, on average). These lists are shown in the upper left corner, and a drop-down menu below the list marked “Main structure” is used to select the structure of interest. As there are data from two brains, the expression level is shown separately for these in units of TPM (transcripts per million). Apart from the columns showing the TPM for each sampled brain (A and B, respectively), there is a column showing the mean expression of the gene across all brain structures, and across both brains.
(2) To show box plots comparing the distribution of the TPM expression levels in the structure of interest (the one selected in the “Main structure” drop-down menu) with the TP distribution in other structures. This can be done on the level of one of the brains or both. You might wonder why there is a “distribution” of expression values in a structure. The reason is simply that there are many samples (biopsies) from the same structure.
So one simple usage scenario would be to select a structure in the drop-down menu, say “Striatum”, and press the “Show top genes” button. This would render a list of genes topped by PCP4, which has a mean TPM of >4,300 in brain A and >2,000 in brain B, but just ~500 on average in all regions. Now you could select PCP4, copy and paste it into the “gene” textbox and click “Show gene expression across regions.” This should render a (ggplot2) box plot partitioned by brain donor.
There is another slightly less useful functionality:
(3) The lower part of the screen is occupied by a principal component plot of all of the samples colored by brain structure (whereas the donor’s identity is indicated by the shape of the plotting character.) The reason I say it’s not so useful is that it’s currently hard-coded to show principal components 1 and 2, while I ponder where I should put drop-down menus or similar allowing selection of arbitrary components.
The PCA plot clearly shows that most of the brain structures are similar in their expression profiles, apart from the structures: cerebral cortex, globus pallidus and striatum, which form their own clusters that consist of samples from both donors. In other words, the gene expression profiles for these structures are distinct enough not to get overshadowed by batch or donor effects and other confounders.
I hope that someone will find this little app useful!