Last week I went to a conference on ABC (Approximate Bayesian Computation) that was held on the Silja Symphony cruise ship between Helsinki and Stockholm. Thank you to Michael Gutmann and the other organisers for the opportunity to go on an exotic Nordic odyssey with a stimulating scientific programme!
I came to the conference as something of an outsider, as I have not used ABC in my research so far. The main reasons I am interested in the area are that my supervisor Richard has done some work on ABC, with applications mainly focused on population genetics, and that a lot of people who work with differential equation models are attracted to it.
I was interested in answering the following questions. When is ABC preferable to other statistical methods? And in what situations should the use of ABC be avoided?
The short answer to the first question is that ABC is used when the likelihood of the model is not available. This may be because the likelihood is an intractable integral. For example, in population genetics the likelihood can only be obtained by integrating over the space of phylogenetic trees. As an aside, there is an alternative method for the population genetics example, which is the pseudo-marginal method. It is not clear to me whether ABC or pseudo-marginal methods are preferable for this example.
As well as situations where the likelihood is intractable (another example of this is the transition density in nonlinear SDEs), there may be situations where it is not possible to write down an expression for the likelihood at all. I have not encountered any situations like this myself, although I am told that a lot of agent-based or rule-based models fall into this category. In Reading there are people using these kinds of models to describe the population dynamics of earthworms and their interaction with pesticides.
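To make the basic idea concrete, here is a minimal sketch of rejection ABC, the simplest variant. Everything in it is an assumption chosen for illustration rather than anything presented at the conference: the toy simulator (normal draws with unknown mean `theta`), the uniform prior on [-5, 5], the sample mean as the summary statistic, and the tolerance of 0.1. The likelihood is of course available for this toy model; the point is only that the algorithm never evaluates it.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Hypothetical stochastic simulator: n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def abc_rejection(observed, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary lies within `tolerance`
    of the observed summary. The likelihood is never evaluated."""
    s_obs = statistics.mean(observed)  # summary statistic of the data
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # draw from the assumed uniform prior
        s_sim = statistics.mean(simulate(theta, len(observed), rng))
        if abs(s_sim - s_obs) <= tolerance:
            accepted.append(theta)
    return accepted


rng = random.Random(1)
observed = simulate(2.0, 20, rng)  # pretend data, generated with theta = 2
posterior_sample = abc_rejection(observed, 20000, 0.1, rng)
print(len(posterior_sample), statistics.mean(posterior_sample))
```

Only a small fraction of the prior draws are accepted, which is why the cost of each simulator run matters so much in what follows.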
Several of the conference talks (Richard Everitt, Richard Wilkinson and, in an indirect way, Michael Gutmann) focused primarily on the use of ABC in expensive simulators, such as climate simulators or epidemiology models. These talks mainly confirmed my view prior to the conference that expensive simulators should be avoided wherever possible because there are severe limitations on what you can do for parameter inference.
On a more positive note, there were examples in the talks where ABC was made tractable in models with 1 or 2 unknown parameters by combining ABC with other methods such as bootstrapping (RE) and Gaussian Processes (RW and MG). The basic idea is that if we knew the probability distribution of the model's summary statistics as a function of the unknown parameters, we would be done. Tools like bootstrapping and Gaussian Processes can be used to estimate the summary statistics' probability distribution. They can also quantify uncertainty in estimators. This is not ideal, since the relatively small sample size for these estimators means that the error in the estimators may be quite large and difficult to quantify accurately. However, if you are only interested in classifying parameter sets as plausible vs non-plausible, or you only need estimates of mean parameter values, you may not need that many samples.
It is unclear to me what you would do when you have an expensive simulator with more than 2 unknown parameters. I am not sure that the methods presented at the conference would work that well in that setting without (i) a lot of computing power, and/or (ii) strong assumptions on the probability distribution of the summary statistics as a function of the parameters.
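As a rough illustration of "estimate the distribution of the summary statistic, then use it", here is a sketch that fits a Gaussian to a small number of simulator runs at each parameter value and evaluates the observed summary under it. This is not the method from any of the talks: the Gaussian assumption, the stand-in `expensive_simulator`, the parameter grid and the 30 runs per grid point are all hypothetical choices of mine.

```python
import math
import random
import statistics


def expensive_simulator(theta, rng):
    """Hypothetical stand-in for an expensive simulator.
    Returns a single summary statistic."""
    return theta ** 2 + rng.gauss(0.0, 0.5)


def gaussian_log_density(x, mean, sd):
    """Log density of Normal(mean, sd^2) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * sd ** 2) - 0.5 * ((x - mean) / sd) ** 2


def approx_log_likelihood(theta, s_obs, n_runs, rng):
    """Fit a Gaussian to n_runs simulated summaries at theta
    and evaluate its log density at the observed summary."""
    runs = [expensive_simulator(theta, rng) for _ in range(n_runs)]
    return gaussian_log_density(s_obs, statistics.mean(runs), statistics.stdev(runs))


rng = random.Random(0)
s_obs = 4.0                                # assumed observed summary
grid = [0.5 * i for i in range(1, 9)]      # theta = 0.5, 1.0, ..., 4.0
scores = {theta: approx_log_likelihood(theta, s_obs, 30, rng) for theta in grid}
best_theta = max(scores, key=scores.get)
print(best_theta)
```

The mean and standard deviation at each grid point are themselves noisy estimates from 30 runs, which is exactly the source of error mentioned above. The grid also grows exponentially with the number of parameters.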
One area that was not covered at the conference but that I would like to know more about is what to do when you have a deterministic simulator, like an ODE model. I have come across this situation in the literature, for example in the work of Michael Stumpf's group, where ABC has been used.
Suppose that a deterministic simulator is a perfect model of the process we are interested in. Then repeated samples from the true process should yield exactly the same data. Furthermore, we should be able to identify a parameter set that perfectly reproduces the experimental results. (Or multiple parameter sets if the model is non-identifiable.) In practice, a far more common situation is that repeated samples from the true process give varied results even though the process model is deterministic. With a deterministic simulator there is no way of accounting for this variability. Two options in this situation are (i) introduce some stochasticity into the model, for example observation noise, or (ii) define ranges for the summary statistics so that parameter values are accepted if the deterministic simulation results in summary statistics that fall within the pre-specified ranges.
If we choose option (i), I would then suggest using a likelihood-based approach, and if that didn't work, then trying ABC. If we choose option (ii), this fits within the ABC framework. However, in the standard ABC framework, the effect of reducing the tolerance on the posterior should diminish as the tolerance goes to 0. That is, for small tolerances, dividing the tolerance by 2 should have a very small effect on the credible regions for the parameters. If you have a deterministic simulator, the effect of reducing the tolerance will not diminish. I haven't got all this worked out rigorously, but intuitively it seems like dividing the tolerance by 2 will always decrease uncertainty by a factor of 2. Furthermore, the limit (tolerance = 0) is singular with respect to all non-zero tolerance values.
A quick Google Scholar search shows that Richard Wilkinson has thought about these issues. The gist of what he is saying seems to be that rather than treating the tolerance as a parameter that controls the accuracy of the approximation, it can be thought of as a parameter in a probabilistic model for the model error. If anyone knows of any other work that discusses these issues, please let me know! As you can see, I am still rather a long way from being able to provide practical guidance on the use of ABC.
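The intuition about halving the tolerance can be checked numerically on a toy problem. The sketch below is not rigorous: it assumes a one-parameter linear model `model(theta) = 2 * theta`, a uniform prior on [0, 10], an observed summary of 6 and, for the stochastic comparison, additive Gaussian noise with standard deviation 1. All of these are hypothetical choices. It computes the ABC posterior on a grid, weighting each `theta` by the probability that the simulated summary lands within the tolerance of the observed one.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def model(theta):
    """Hypothetical deterministic simulator output (a summary statistic)."""
    return 2.0 * theta


def acceptance_prob(theta, s_obs, tol, noise_sd):
    """P(|s_sim - s_obs| <= tol | theta), with s_sim = model(theta) + Normal(0, noise_sd^2).
    noise_sd == 0 is the purely deterministic simulator (an indicator function)."""
    mu = model(theta)
    if noise_sd == 0.0:
        return 1.0 if abs(mu - s_obs) <= tol else 0.0
    return normal_cdf((s_obs + tol - mu) / noise_sd) - normal_cdf((s_obs - tol - mu) / noise_sd)


def abc_posterior_sd(s_obs, tol, noise_sd, lo=0.0, hi=10.0, n_grid=50001):
    """Posterior standard deviation of theta under a uniform prior on [lo, hi],
    computed on an evenly spaced grid."""
    step = (hi - lo) / (n_grid - 1)
    thetas = [lo + i * step for i in range(n_grid)]
    weights = [acceptance_prob(t, s_obs, tol, noise_sd) for t in thetas]
    total = sum(weights)
    mean = sum(w * t for w, t in zip(weights, thetas)) / total
    var = sum(w * (t - mean) ** 2 for w, t in zip(weights, thetas)) / total
    return math.sqrt(var)


tolerances = [0.4, 0.2, 0.1, 0.05]
sd_deterministic = [abc_posterior_sd(6.0, tol, 0.0) for tol in tolerances]
sd_stochastic = [abc_posterior_sd(6.0, tol, 1.0) for tol in tolerances]
for tol, det, sto in zip(tolerances, sd_deterministic, sd_stochastic):
    print(tol, round(det, 4), round(sto, 4))
```

In this toy setting the deterministic posterior standard deviation roughly halves each time the tolerance is halved, while the stochastic one levels off near 0.5, the value set by the noise. Whether the factor-of-2 behaviour carries over to nonlinear or multi-parameter deterministic models is not something this sketch shows.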