The BigML blog has been on a roll lately with many interesting posts. I particularly liked this one, Bedtime for Boosting, which goes pretty deep into benchmarking various versions of the boosting algorithms we all know and love (?).
Mark Gerstein of Yale University has a nice slide deck about the big data blizzard in genomics (<– pdf link). There are lots of ideas here about how to build predictive models based on, for example, ENCODE data. I won’t get into the ongoing controversy around ENCODE here, suffice to say that I think the ENCODE data sets are a good resource for starting to build statistical models of genomic regulation on a larger scale.