Using posterior predictive distributions to analyse epidemic models: COVID-19 in Mexico City

Scientists and public health officials have relied on a variety of epidemiological models to forecast the trajectory of the coronavirus pandemic, and to derive guidance for policies that aim to avoid overloading health facilities. All such models contain parameters, and forecasting tools tend to choose these to provide a best fit to the available observations. Unfortunately, most of these models are examples of so-called “sloppy models,” for which a broad range of parameter values produces similarly acceptable fits to limited or noisy data. This is a problem, because parameter estimates that fit the data equally well can lead to widely different predictions from the same model.
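To make the idea of sloppiness concrete, here is a minimal sketch. It assumes a basic SIR model, which is not necessarily the model used in the paper, and the function name `sir_infected`, the population size, the initial case count and both parameter sets are hypothetical values chosen for illustration. The two parameter sets share the same early growth rate (beta minus gamma), so they are nearly indistinguishable over the first two weeks, yet they imply very different epidemic peaks.

```python
import numpy as np

def sir_infected(beta, gamma, days, n=1_000_000, i0=10.0, steps_per_day=10):
    """Euler-integrate a basic SIR model; return infected counts on days 0..days."""
    dt = 1.0 / steps_per_day
    s, i = n - i0, i0
    out = [i]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_inf = beta * s * i / n * dt   # new infections in this time step
            new_rec = gamma * i * dt          # recoveries in this time step
            s, i = s - new_inf, i + new_inf - new_rec
        out.append(i)
    return np.array(out)

# Two hypothetical parameter sets with the same early growth rate beta - gamma = 0.2
curve_a = sir_infected(beta=0.30, gamma=0.10, days=200)   # beta / gamma = 3
curve_b = sir_infected(beta=0.50, gamma=0.30, days=200)   # beta / gamma = 5/3

# Largest relative difference between the curves over the first 15 days
early_gap = np.max(np.abs(curve_a[:15] - curve_b[:15]) / curve_a[:15])
peak_a, peak_b = curve_a.max(), curve_b.max()

print(f"largest relative gap, days 0-14: {early_gap:.4%}")
print(f"peak infected: {peak_a:.0f} versus {peak_b:.0f}")
```

A best-fit procedure applied to the early data alone has little basis for preferring one of these parameter sets over the other, which is why a single fitted curve can be misleading.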
As LML External Fellow Isaac Pérez Castillo and colleagues explore in a recent paper, one way to avoid this issue is to use Bayesian statistics to derive uncertainties for the model predictions, taking into account the full range of parameter estimates. More specifically, given the epidemiological model and a probability distribution for the observations, they use the posterior distribution of the model’s parameters to generate the full range of epidemiological curves consistent with the data, thereby obtaining posterior predictive distributions. From these, one can extract the worst-case scenario and study the impact of implementing contingency plans according to this assessment. The authors apply this approach to the potential evolution of COVID-19 in Mexico City and assess whether contingency plans are succeeding and whether the epidemiological curve has plausibly flattened.
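The general recipe can be sketched in a few lines of code. This is an illustration under stated assumptions, not the authors' implementation: it assumes a basic SIR model, Poisson-distributed observations, a flat prior on a box, synthetic data generated from hypothetical "true" parameters, and a simple random-walk Metropolis sampler. The names `sir_infected` and `log_posterior`, and the choice of the 97.5% quantile of the peak as the "worst case", are likewise assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sir_infected(beta, gamma, days, n=1_000_000, i0=10.0, steps_per_day=4):
    """Euler-integrate a basic SIR model; return infected counts on days 0..days."""
    dt = 1.0 / steps_per_day
    s, i = n - i0, i0
    out = [i]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_inf = beta * s * i / n * dt
            new_rec = gamma * i * dt
            s, i = s - new_inf, i + new_inf - new_rec
        out.append(i)
    return np.array(out)

# Synthetic "observations": 15 days of Poisson-noised counts (hypothetical true values).
observed = rng.poisson(sir_infected(beta=0.30, gamma=0.10, days=14))

def log_posterior(theta, observed):
    """Log posterior (up to a constant) for theta = (r, gamma), with beta = r + gamma."""
    r, gamma = theta
    beta = r + gamma
    # Flat prior on a box in (beta, gamma); the bounds are assumptions.
    if not (0.0 < beta < 1.0 and 0.05 < gamma < 0.5):
        return -np.inf
    mean = sir_infected(beta, gamma, len(observed) - 1)
    # Poisson log-likelihood, dropping the term that does not depend on theta.
    return float(np.sum(observed * np.log(mean) - mean))

# Random-walk Metropolis. Proposals are made in (r, gamma) coordinates with a
# smaller step in r, because early data constrain r = beta - gamma far more
# tightly than gamma itself.
theta = np.array([0.15, 0.25])
logp = log_posterior(theta, observed)
step = np.array([0.005, 0.05])
chain = []
for _ in range(4000):
    proposal = theta + step * rng.normal(size=2)
    logp_new = log_posterior(proposal, observed)
    if np.log(rng.uniform()) < logp_new - logp:
        theta, logp = proposal, logp_new
    chain.append(theta.copy())
samples = np.array(chain[1000::20])   # drop burn-in, then thin

# One epidemic curve per posterior sample, over a longer horizon.
horizon = 150
curves = np.array([sir_infected(r + g, g, horizon) for r, g in samples])
# Posterior predictive draws: add observation noise to each curve.
predictive = rng.poisson(curves)

# Pointwise 95% predictive band and median, plus the spread of epidemic peaks.
lower, median, upper = np.quantile(predictive, [0.025, 0.5, 0.975], axis=0)
peaks = curves.max(axis=1)
worst_case = np.quantile(peaks, 0.975)   # a high quantile taken as the "worst case"
print(f"peak infected: median {np.median(peaks):.0f}, worst case {worst_case:.0f}")
```

The band between `lower` and `upper` summarises the range of outcomes compatible with the early data, and `worst_case` gives one possible planning figure. A real analysis would use a more careful sampler and convergence checks than this sketch.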
The authors employ a simple model, but the main conclusion of this work is general: extrapolating results without accounting for sensitivity to changes in parameters can lead to highly inaccurate predictions. Much the same conclusion should hold for more detailed models, e.g., those which include specific details of the population, as these models are also sloppy. One implication is that, even with fairly accurate observational data, the stochasticity inherent at the start of an epidemic means that parameter estimates based on data from the beginning of an outbreak will be quite uncertain. Models parametrised with such data will then also carry great uncertainty in longer-term forecasts. However, this uncertainty can be usefully quantified using techniques from Bayesian statistics, which provide improved estimates of worst-case scenarios.
The paper is available at
