Sunday, June 14, 2015

My Perspective on New Perspectives in MCMC

I recently got back from a summer school in Valladolid on New Perspectives in MCMC.

Here are a few thoughts on the lectures that were given.

MCMC-based integrators for SDEs (Nawaf Bou-Rabee)

Nawaf's interests are mainly in numerical solvers for Stochastic Differential Equations (SDEs) that describe diffusion and drift processes.  The first part of the course was about how standard MCMC algorithms can be seen as discretisations of SDEs.  The aim of MCMC algorithms is to sample from a probability distribution.  A valid MCMC algorithm can be thought of as an SDE discretisation that has the probability distribution of interest as its stationary distribution.

There are also many ways of discretising an SDE that do not correspond to existing MCMC algorithms, and that is what the second part of the course was about.  It made me wonder whether there is scope for designing new MCMC methods from these (or other) SDE numerical solvers?

Exact approximations of MCMC algorithms (Christophe Andrieu)

These lectures were about the pseudo-marginal approach, which I have discussed quite a bit in previous blog posts.  The key idea is to find ways of approximating the likelihood or the acceptance probability in such a way that resulting algorithm is still a valid MCMC algorithm, but is computationally much cheaper than the simple ideal algorithm (which might, for example, contain an intractable marginal likelihood term in the acceptance ratio).

The thing that I understood better than before was an idea called Rejuvenation.  (From what I understand this is the idea used in the particle Gibbs variant of particle MCMC).  Without Rejuvenation, the denominator in the acceptance ratio can be very large if the approximation of the likelihood is by chance very poor.  This means that the noisy acceptance probability can be much smaller than the acceptance probability of the ideal algorithm, and therefore the noisy algorithm can have a tendency to get stuck.  Rejuvenation is a way of revising the noisy estimates in the algorithm from one iteration to the next in a way that preserves the validity of the algorithm and makes it less likely to get stuck.

MCMC in High Dimensions (Andrew Stuart)

These lectures had a very clear message that was repeated through-out, which was that thinking in infinite dimensions results in good methodology for high dimensional problems, particularly high dimensional inverse problems and inference for diffusions.

From these lectures I got a much better idea of what it means for measures to be singular with respect to each other.  Understanding these concepts is helpful for designing valid proposals for MCMC algorithms in infinite-dimensional spaces, and leads naturally to good proposals in high-dimensional problems.  Coincidentally the concepts of singularity and equivalence are also very important in Reversible Jump methodology.  For me this illustrates how potent some mathematical ideas are for solving a wide range of problems and for drawing connections between seemingly disparate ideas.

Course website -

Monday, June 1, 2015

Making intractable problems tractable

I recently attended an i-like / SuSTaIn workshop in Bristol, organised by Christophe Andrieu.  I-like is short for Intractable Likelihoods, although the scope of the conference was a bit more general than that.  Most of the work that was presented was about making intractable problems tractable.  Often the most conceptually simple way of solving a problem is computationally intractable, i.e., it would require months / years / aeons, for current computers to solve.  Mathematics can be used to re-formulate or approximate problems in such a way that they become tractable, for example by identifying parts of the computation that do not make much difference to the final answer.  Or by identifying unnecessary computations that duplicate (or perhaps approximately duplicate) other computations.

Here is a selection of application areas that were talked about.

Foot & Mouth epidemics - intractable because of the need to model the interactions between thousands of different farms and model the stochastic dynamics of disease transmission.

High risk behaviour in HIV positive youth - challenging because of the discrepancies between reported and actual behaviour.  Intractable because there are so many different permutations of actual behaviour that underly / could explain reported behaviour.

Genetic recombination - given DNA sequences from a (current-day) population, scientists are interested in inferring how the DNA sequences have evolved / been modified over time.  Recombination is one mechanism for DNA sequence modification.  Inference in this setting is challenging because we cannot observe the past (i.e. ancestral DNA sequences).  There is a connection with the HIV example above: the problem is intractable because there are so many different permutations of past DNA sequences that could have generated the DNA sequences of the current population.

Testing the predictions of general relativity using massive detectors - this reminded me of the little that I know about the Large Hadron Collider experiments  However, in this case the detectors pick up natural signals coming from space, and the physics is completely different.  The general relativity detector problem is intractable because of the huge volume of data generated by the detector.  And also because generating predictions using the general relativity equations is computationally expensive.  (As an aside, it is quite awe-insipring to think about the forces at work on such a large scale in our universe.)

Predicting driving times - the simplest way of predicting driving times is just to use speed limits and assume that there is little of no traffic.  Anyone who has ever tried to get anywhere near a population center in the UK on a Friday evening will know that this does not produce very accurate driving time predictions.  Companies like Microsoft, Google, and Apple collect GPS data that is essentially a set of positions associated with time-stamps.  From this it is straight-forward to work out how long people's past journeys have taken.  However producing predictions of an arbitrary future journey is challenging because many future journeys will not follow an identical route to past journeys.

There were also a couple of presentations that focused more on methodology that could be applied in a wide range of different settings.

Exact-approximate MCMC - this is something I have talked about in previous blog posts and also goes under the name of the pseudo-marginal approach.  The basic question that motivates this work is, 'When is it ok to approximate the Metropolis-Hastings acceptance probability?'  There are lots of settings where it is possible to obtain a tractable approximation to the acceptance probability (State State Models, Markov Random Fields, Spatial Processes, Trans-dimensional problems).  Being able to use the tractable approximation makes it possible to do a lot more statistical inference for problems of this type.

Deep neural networks - this is something that is currently very popular in machine-learning / computer science / artificial intelligence.  There the interest is in training computers to produce predictions from data.  This is sometimes referred to as black-box prediction, because the way in which the computer produces the prediction does not necessarily give you any insight into the underlying process thats connects the x variables (determinants) with the y variable(s) (response).  From my perspective it does seem a lot like regression with more flexibility and automation in the transformations that are applied to the data.  Increasing flexibility means (at least to me) adding in more parameters.  This makes it more challenging to ensure that the algorithms are computationally tractable.

For researcher names and presentation / poster abstracts, see