Since the outbreak of the coronavirus, modellers from around the world have scrambled to predict the course of its spread, projecting forward from data on past infections and deaths in any region. Such estimates help public health officials and governments to judge the possible risks they face, and, on their basis, many nations have imposed restrictive social confinement measures. Yet models and projections of all kinds are prone to errors, especially when they rest on highly uncertain data. And the data on the epidemic's spread are extremely noisy, as reporting standards vary between the many regions around the world.
Just how uncertain are the resulting projections? This issue is the focus of a new paper published in the journal Chaos by a team of LML External Fellows – Davide Faranda, Isaac Pérez Castillo, Oliver Hulme, Jeroen Lamb, Yuzuru Sato and Erica Thompson – working with Aglaé Jezequel of the École Polytechnique in Paris. Their analysis eschews complex many-parameter models, which would likely over-fit the data, in favour of simpler conceptual models of epidemiology which more clearly reveal the general influence of parameter or data uncertainty on the accuracy of long-term projections. Their aim isn’t actually to predict the future of the epidemic – which obviously depends strongly on the social response to it – but to study the accuracy of projections made on the assumption of no social response, aiming to clarify the worst-case outcome.
They find that forecasts of long-term outcomes are almost certainly highly sensitive to variations in the reported data for the epidemic. An error of just 20% in the numbers for a single day can change long-term projections by several orders of magnitude. Hence, they counsel, projections of epidemic progression should be treated with caution. Best practice should aim to produce an ensemble of projections based on analysis of the natural sensitivity of the projection method to data inaccuracies.
The simplest way to model the progression of an epidemic is through statistical curve-fitting. Classical epidemiology observes that the function C(t) for the cumulative number of infections up to time t follows a standard curve, well approximated by a logistic function. The number of infections at first follows a phase of exponential growth, as the infectious agent spreads through the fully susceptible population. Later, when a significant fraction is already infected – reducing the population open to new infections – the infection rate tapers off and the number of infections approaches a final maximum value. Predictions of an epidemic trajectory can be made by fitting this curve to early data, and then using its later values to make forecasts.
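This curve-fitting procedure can be sketched in a few lines of Python. The data below are synthetic, generated from a known logistic curve with mild reporting noise; all parameter values are illustrative assumptions, not figures from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    """Logistic curve: K = final epidemic size, r = growth rate,
    t0 = day of peak growth (the inflection point)."""
    return K / (1.0 + np.exp(-r * (t - t0)))

# Synthetic cumulative case counts (illustrative only, not real data)
rng = np.random.default_rng(0)
t = np.arange(31)                                   # days 0..30
true_counts = logistic(t, K=50_000, r=0.4, t0=15)
counts = true_counts * rng.normal(1.0, 0.02, size=t.size)  # 2% reporting noise

# Fit the curve to the "observed" counts; K then serves as the projection
params, _ = curve_fit(logistic, t, counts, p0=(60_000, 0.3, 12))
K_hat, r_hat, t0_hat = params
print(f"projected final epidemic size: {K_hat:,.0f}")
```

With data extending well past the inflection point, as here, the fit recovers the final size accurately; the paper's point is that fits made during the exponential phase do not.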
Using this statistical approach, Faranda and colleagues first studied the sensitivity of such extrapolations using data from the epidemic in France, where the exponential growth phase started at the beginning of March 2020. From the data for the period 4-20 March, they made projections for the eventual total number of infections and studied the sensitivity of these projections to small modelling changes of two different kinds. They first looked at how the projections changed with variation of the beginning date of the fitted interval – 4, 5, 7 or 10 March – finding that these changes made almost no difference at all to the final projections. In striking contrast, projections changed markedly under variation of the ending date of the fitted interval – 17, 18, 19 or 20 March. In the latter case, changing the end date of the fit altered the projected final number by a factor of roughly 100. Making one projection on Tuesday 17 March, and another on Wednesday 18 March, changed the outlook from a gloomy 2 million eventual infections to a more palatable 30 thousand.
Such a change is clearly not meaningful, and says more about the limits of the projection task than anything else. The implication is that statistical projections are extremely sensitive to tiny errors in the final data point used in the fitting. To bring this out more systematically, the researchers used data for the epidemics in the UK, France and Italy up to 20 March to study how projections changed under a random variation of 20% in the reported number of infections on the final day, 20 March. Across an ensemble of 100 such projections, the final number of infections varied widely, especially in the UK, which was at a somewhat earlier phase of the epidemic than either France or Italy. For the UK, a 20% variation in the final data point led to a spread in the final projections of fully six orders of magnitude (see figure below).

In a parallel analysis, the authors also studied the sensitivity of dynamical models of an epidemic to variations in key underlying parameters. One popular class of models – so-called Susceptible-Exposed-Infected-Recovered (SEIR) models – divides a population into distinct groups by infection status, and models how people move from one group to another as the epidemic develops. Susceptible individuals can become infected, and infected individuals eventually either recover or die; exposed individuals have had contact with an infected person, but are not yet themselves infectious; the recovered (or deceased) are permanently immune. The model also supposes that all individuals are equally susceptible, and that new exposures at time t take place at a rate proportional to I(t)·S(t), which assumes full mixing of the population.
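The mechanics of such a model can be illustrated with a minimal SEIR simulation. The update below is a simple daily Euler step, and the parameter values (infection rate, incubation period, recovery rate) are assumed for illustration only – they are not taken from the paper.

```python
import numpy as np

def seir_step(S, E, I, R, beta, sigma, gamma, dt=1.0):
    """One Euler step of the SEIR equations (compartments as population
    fractions). beta: infection rate, sigma: 1/incubation period,
    gamma: recovery rate."""
    N = S + E + I + R
    new_exposed    = beta * I * S / N * dt   # full-mixing assumption: rate ~ I*S
    new_infectious = sigma * E * dt
    new_recovered  = gamma * I * dt
    return (S - new_exposed,
            E + new_exposed - new_infectious,
            I + new_infectious - new_recovered,
            R + new_recovered)

# Illustrative parameter values (assumptions, not fitted to any real data)
S, E, I, R = 0.999, 0.0, 0.001, 0.0
beta, sigma, gamma = 0.5, 1 / 5.2, 1 / 7
peak_I = 0.0
for day in range(365):
    S, E, I, R = seir_step(S, E, I, R, beta, sigma, gamma)
    peak_I = max(peak_I, I)

print(f"final attack rate: {R:.2f}, peak infectious fraction: {peak_I:.3f}")
```

Note that the four compartments always sum to the total population – each flow leaves one group and enters another.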
Most studies of these models keep key parameters – the incubation period of the infection, the infection rate and the recovery rate – fixed. But this is unrealistic, as the parameters vary due to many effects, including the influence of confinement measures or changes in the nature of the infectious agent. Moreover, any estimate of such parameters from epidemic data will carry errors due to the way the data are reported or collected. To see how these parameter variations would affect the dynamical progression of an epidemic in the model, the authors replaced the fixed parameters with stochastic variables, fluctuating with Gaussian noise of about 20% around their average values. In a series of simulations, they observed several interesting features.
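An experiment in this spirit can be sketched by redrawing each SEIR parameter every day with Gaussian fluctuations of 20% around an assumed mean. Again, all numerical values here are illustrative assumptions rather than the authors' fitted values.

```python
import numpy as np

rng = np.random.default_rng(2)

def run_seir(noise=0.2, days=365):
    """SEIR run with parameters redrawn daily: Gaussian fluctuations of
    relative size `noise` around assumed mean values (illustrative only)."""
    S, E, I, R = 0.999, 0.0, 0.001, 0.0
    beta0, sigma0, gamma0 = 0.5, 1 / 5.2, 1 / 7   # assumed means
    for _ in range(days):
        beta  = max(beta0  * rng.normal(1.0, noise), 0.0)
        sigma = max(sigma0 * rng.normal(1.0, noise), 0.0)
        gamma = max(gamma0 * rng.normal(1.0, noise), 0.0)
        new_E = beta * I * S      # daily Euler step, population fractions
        new_I = sigma * E
        new_R = gamma * I
        S, E, I, R = S - new_E, E + new_E - new_I, I + new_I - new_R, R + new_R
    return R   # final attack rate of this run

finals = [run_seir() for _ in range(50)]
print(f"final attack rate across runs: {min(finals):.2f} .. {max(finals):.2f}")
```

Each run follows a slightly different trajectory, and the daily counts fluctuate erratically – qualitatively like the noisy case records seen in real data.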
First, fluctuations in some parameters – such as the infection rate – have a much more pronounced effect on the model outcome than do others, such as the incubation period of the infection. Second, variations in the parameters lead to erratic fluctuations in variables such as the daily number of new infections, much like the actual record of infections seen in many nations during the COVID-19 epidemic. Of particular interest, the researchers note, is that the eventual trajectory of the epidemic is especially sensitive to parameter fluctuations when the current number of infections I(t) is high. This implies, in general, that mitigation strategies such as self-isolation or social distancing will be far more effective if imposed early on in an epidemic, when I(t) is naturally smaller.
The researchers emphasize that the trouble with these projections stems from more than just poor data. Compartmental models such as SEIR models are so-called “sloppy models”: when fitting data early in an epidemic, a very large number of combinations of the model’s parameters will fit the data equally well, yet yield very different projections. This is an inherent property of these models and has nothing to do with noisy data, although noisy data make the problem worse.
Overall, Faranda and colleagues conclude that both dynamical and statistical models of any epidemic are strongly sensitive to the noise inherent in reported data. Perhaps most significantly, the simple statistical procedure of extrapolating past numbers into the long-term future is extremely sensitive to the value of the last data point in the fitted period. Even though standard measures of the accuracy of the fit may seem excellent, this is illusory – even a 20% change in the last data point can lead to an error in the estimate of the final number of infections of several orders of magnitude. To avoid unpleasant surprises, they suggest that legitimate projections should employ methods to explicitly estimate the size of the expected variations due to data inaccuracies. One might, for example, see how the projections change if one simply excludes the last data point. Or, researchers could add noise to the last data point and so obtain an ensemble of estimates, giving a more realistic view of the uncertainty inherent in the projections.
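The latter suggestion can be sketched as an ensemble of logistic fits to synthetic data, with the final data point redrawn within ±20% before each refit. The data and parameter values are illustrative assumptions, not the paper's.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, t0):
    return K / (1.0 + np.exp(-r * (t - t0)))

rng = np.random.default_rng(1)
t = np.arange(21)                             # days 0..20, growth not yet over
data = logistic(t, K=100_000, r=0.3, t0=25)   # synthetic cumulative counts

# Ensemble: perturb the final data point by up to +/-20%, refit each time
projections = []
for _ in range(100):
    noisy = data.copy()
    noisy[-1] *= rng.uniform(0.8, 1.2)
    try:
        p, _ = curve_fit(logistic, t, noisy, p0=(1e5, 0.3, 25),
                         bounds=([0, 0, 0], [1e9, 5, 500]), maxfev=20_000)
        projections.append(p[0])              # projected final epidemic size K
    except RuntimeError:
        pass                                  # skip fits that fail to converge

projections = np.array(projections)
print(f"projected final size ranges over "
      f"{projections.min():,.0f} .. {projections.max():,.0f}")
```

The spread of the resulting ensemble, rather than any single fitted value, is the honest summary of what the data support.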
The paper is available at https://aip.scitation.org/doi/10.1063/5.0008834