Monday, June 20, 2016

ISBA 2016

I got back from ISBA 2016 at the weekend, having spent a week at the picturesque Forte Village resort in Sardinia.  Last weekend also happened to be when UK astronaut Tim Peake returned from spending 6 months on the International Space Station.  Although I am sure returning from space requires more adjustment than returning from an international conference, I do feel a bit like I have returned from another planet!

I cannot pretend to give an expert's view of the conference, since there were many people there with decades more experience than me.  The age distribution of the conference was heavily weighted towards young researchers (perhaps partly as a result of the generous travel support targeted at this group).  Nevertheless, the age range was very wide, with a fair number of people there in their seventies.  One of these was Adrian Smith, who attended the first of the Valencia meetings, and who gave an interesting perspective on how Bayesians have gone from being outsiders, regarded with a high degree of scepticism, to being a dominant force in the world of statistical analysis.  A simple illustration of this is the attendance at the conference, which has grown from around 70 to around 700 over the course of around 40 years.

One feature of the conference that has remained the same (and perhaps a key ingredient of its continuing success!?) is the cabaret, which features Bayes-inspired entertainment.  The proceedings of the first Valencia meeting (which can be found here) printed the song "There's no Theorem like Bayes Theorem", written by the distinguished statistician G.E.P. Box to the tune of "There's no Business like Show Business".

I would strongly advise against searching for a YouTube rendition of Box's song.  I do not know whether Box was as good a musician as he was a lyricist (and statistician), but his followers certainly seem to have a rather deficient sense of pitch and harmony.

Here are a few reflections on the academic program of ISBA 2016.

A lot of the talks fell into one of two broad categories.  On the one hand, some talks focused on general inference problems, and the development of methodology that should be applicable to a wide range of problems in various application areas.  On the other hand, some talks focused more on a specific application area, and looked at the challenge of adapting quite general statistical ideas to specific research questions.

The presenters who I found most stimulating were Adrian Raftery on demography and Sylvia Richardson on single-cell gene expression.  These were both from the second category of talks (i.e. more oriented to a specific application), but both researchers have also done important work in the first category (i.e. development of generally applicable statistical methodology).  For me, their work demonstrates the value of working in both areas.  They both have an impressive ability to identify problems that benefit from Bayesian analysis.  In Adrian Raftery's demography work, the novel application was the quantification of uncertainty in country-specific fertility rates by pooling information across countries through a hierarchical model.  Sylvia Richardson's work on gene expression also used a hierarchical model, but in this case to quantify uncertainty in cell-specific gene expression levels, again by pooling information across cells.  The main reason the Bayesian approach is so effective in these problems is the small amount of data that is available per country (in demography) or per cell (in gene expression).
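To make the idea of pooling concrete, here is a minimal sketch of partial pooling in the simplest hierarchical model I know: each group (a country, or a cell) has its own mean theta_i drawn from a common normal distribution, and the observations within a group are normal around theta_i.  This is not the model from either talk.  The function name, the group labels and the numbers are all made up for illustration, and I assume the hyperparameters mu, tau2 and sigma2 are known, whereas a full Bayesian analysis would put priors on them and infer them from the data.

```python
from statistics import mean


def partial_pool(groups, mu, tau2, sigma2):
    """Posterior mean and variance of each group mean theta_i under
    y_ij ~ N(theta_i, sigma2) and theta_i ~ N(mu, tau2).

    Assumption: mu, tau2 and sigma2 are treated as known constants.
    """
    posterior = {}
    for name, ys in groups.items():
        n = len(ys)
        # Precisions add: n observations plus the shared prior.
        precision = n / sigma2 + 1.0 / tau2
        # Weight on the group's own sample mean; small n gives a small weight.
        weight = (n / sigma2) / precision
        post_mean = weight * mean(ys) + (1.0 - weight) * mu
        posterior[name] = (post_mean, 1.0 / precision)
    return posterior


# Made-up data: group "A" has eight observations, group "B" only one.
groups = {
    "A": [2.0, 2.4, 1.8, 2.2, 2.1, 1.9, 2.3, 2.1],
    "B": [4.0],
}
result = partial_pool(groups, mu=2.5, tau2=0.25, sigma2=1.0)

for name, (m, v) in result.items():
    print(f"{name}: posterior mean {m:.3f}, posterior variance {v:.3f}")
```

The group with a single observation is pulled strongly towards the shared mean, while the group with eight observations mostly keeps its own average.  That is the sense in which a hierarchical model borrows strength when there is little data per country or per cell.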

Although I found some of the presentations on general methodology quite stimulating (such as Peter Green's keynote lecture on structural learning), there were quite a few presentations which I felt were not well motivated, at least not in a way that I could understand.  One area where there were quite a few presentations this year was Bayesian variable selection for linear regression models.  In that setting you assume that the data are i.i.d., and the variable selection can be thought of as a kind of model uncertainty, often encoded through the choice of prior.  The reason I am somewhat sceptical about this kind of research is that (i) the linear regression model may not be sufficiently complex to model the true relationships that exist between the variables in the dataset, and (ii) if the linear regression model is appropriate, then the most important predictors for a given response can usually be picked up through a non-Bayesian analysis such as a forward selection or a backwards elimination algorithm.  This is based on my experience of fitting and using regression models as a practitioner in operational research.
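For readers who have not come across forward selection, the sketch below shows the greedy idea in plain Python: start from an intercept-only model, repeatedly add the predictor that reduces the residual sum of squares the most, and stop when the gain in R-squared becomes negligible.  The stopping rule (a fixed minimum gain in R-squared), the helper names and the data are my own assumptions for illustration; in practice one would more often use an F-test or an information criterion, and a proper linear algebra library rather than hand-rolled elimination.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination
    with partial pivoting (assumes a is non-singular)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares of a least-squares fit of y on an
    intercept plus the given predictor columns (normal equations)."""
    design = [[1.0] * len(y)] + columns
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in design] for ci in design]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in design]
    beta = solve(xtx, xty)
    fitted = [sum(b * col[i] for b, col in zip(beta, design)) for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def forward_select(predictors, y, min_r2_gain=0.01):
    """Greedy forward selection; stops when the best remaining predictor
    raises R-squared by less than min_r2_gain (an assumed, ad hoc rule)."""
    total = rss([], y)  # intercept-only model, i.e. total sum of squares
    current = total
    selected = []
    while len(selected) < len(predictors):
        chosen_cols = [predictors[s] for s in selected]
        candidates = {
            name: rss(chosen_cols + [col], y)
            for name, col in predictors.items()
            if name not in selected
        }
        best = min(candidates, key=candidates.get)
        if (current - candidates[best]) / total < min_r2_gain:
            break
        selected.append(best)
        current = candidates[best]
    return selected


# Made-up data: y depends on x1 and x3 (plus small noise), not on x2.
predictors = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "x2": [2.0, 1.0, 4.0, 3.0, 1.0, 5.0, 2.0, 4.0],
    "x3": [0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0],
}
y = [3.1, 1.9, 7.05, 5.95, 8.1, 12.9, 15.05, 13.95]

chosen = forward_select(predictors, y)
print(chosen)
```

On this toy dataset the procedure picks up the two predictors that actually drive the response and leaves out the irrelevant one, which is the kind of behaviour I have in mind when I say that simple non-Bayesian methods often suffice when the linear model is appropriate.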

To wrap up, I am deeply grateful to all the people who make the Bayesian analysis community what it is, through their research findings, through the hard administrative labour that must go into organising large scientific meetings, and through personal warmth and encouragement.  I hope that it continues to be a vibrant community with vigorous exchanges between scientific applications and mathematical theory.