LML Discussions: lessons to learn about modelling from the COVID experience

Scientists regularly use models, which help make their thinking definite, specific, and easier to challenge, test and refine. Scientists know that models always have shortcomings. There’s no such thing as “modelling the real thing,” only various approximations thereof, which may be more or less useful.

As a result, the COVID era was a shocking and frustrating experience for many scientists, seeing models used and abused, often employed to back specious arguments and make political points. Epidemiological models were no doubt useful, and helped authorities take more informed decisions. They were also misused to spread doubt and disinformation.

In the aftermath of this crisis, following a suggestion from Ole Peters, I thought it might be useful to ask LML colleagues, most of whom build and use mathematical models of one sort or another, what lessons about modelling they took away from the COVID experience. Below is a loose discussion, based on the views of several LML Fellows, which may be of interest to the public and to other scientists involved in modelling. Any comments, thoughts or reactions would be much appreciated!

Mark Buchanan


Mark Buchanan: One of the most interesting and disturbing things about the COVID era was the dual response of the public to scientists’ models. Some people were looking to scientists for guidance, while others were attacking them, acting almost as if the scientists had caused the epidemic. I’m still shocked and puzzled by how the public sees scientists – and modelling.

Erica Thompson: I think we learned a lot about the public appetite for scientific information, both the good and bad. There were loads of armchair epidemiologists genuinely keen to use and understand the available data and constructively criticize the way things were done. And there were loads of armchair epidemiologists who were misunderstanding or misinterpreting the data and then spreading that misinformation.

Hopefully everyone has learned that modelling is inherently political. Models inevitably contain value judgements by dint of what is included and what is not, as well as by the choice of representation. Models also shape the way we think about interventions and consider pros and cons.

Mark Buchanan: This is certainly true. Personally, I was surprised by the urgency of some to draw sweeping conclusions from models that were in no way capable of supporting them. One example: early on people in the U.S. opposed to public health measures often invoked the predictions from the Institute for Health Metrics and Evaluation (IHME; https://go.nature.com/3bPm35o), which forecast for all US states when it would be safe to reopen. These were based on predictions for when the death rate would drop below 1 per million. Surprisingly — and suspiciously — these dates all fell only a month or so in the future, for every single state, leading to reassuring downward sloping curves showing the epidemic ending quite soon. No need for any aggressive actions. Too good to be true? Yes.

As it turns out, these rosy projections only reflected a choice in the modelling approach. The IHME model (back then at least, May 2020) didn’t actually simulate the dynamics of the epidemic spreading, but fit a curve to the recent disease data, and then demanded that the best fit be approximately Gaussian, with the up and down of the death rate being (roughly) a bell curve. This assumption alone guaranteed the near-future disappearance of the epidemic. It would do so even if the recent data showed nothing but exponential growth.

Combined with comforting curves, this kind of modelling risks creating misperceptions. To be clear, the modellers themselves were open about this – it was others who misinterpreted the results.

Rainer Klages: On a slightly different point, it seems to me that to some extent the general public, perhaps also politicians, had false expectations about how science works. Apparently it was expected that research can immediately deliver unambiguous results that just need to be implemented somehow. The public was surprised to learn that there are controversies between different scientists concerning specific findings.

To scientists this is natural, as this is how science works: there are results, but they are critically investigated by the scientific community, sometimes proven wrong, sometimes confirmed. My impression was that the public seemed to some extent unaware of how the process of scientific research works, that it involves debates, and back and forth thinking. Based on this some people seemed to question the integrity of science to deliver credible results altogether.

Mark Buchanan: This touches on another issue which is the unavoidable uncertainty that is always part of science. As you say, some people apparently see science as something that is certain, when it is actually anything but.

Erica Thompson: As scientists, I think we definitely learned some major lessons about uncertainty and humility, and the degree to which models can be (simultaneously) extremely useful to answer some questions and extremely poor at answering others.

Colm Connaughton: I think part of this is inherent to epidemiological modelling, which tends to be data-poor and facing a very non-stationary environment. The pandemic made it very clear that even in countries that were able to build strong monitoring and surveillance infrastructure, the data required to constrain model predictions and meaningfully reduce medium-term uncertainty was rarely available.

There are two fundamental problems with data. Firstly, we observe only a single trajectory of what is an intrinsically non-stationary process. This really limits what you can do with purely statistical methods, such as machine learning, since you can’t meaningfully average over realisations or over time to understand fluctuations. There is then no real alternative to introducing models. Secondly, once you introduce models, you need data to parameterise them and most of the data needed relates either to properties of the disease such as age-profile of transmission or mortality rates, which were unknown, or to socio-economic factors such as how people respond to interventions (over- or under- reporting). Again, incredibly difficult to measure.

Rainer Klages: I was also partially surprised how little was known about even very basic facts as far as pandemic issues were concerned. For example, in Germany scientists first supported the conclusion that masks would not be helpful to avoid the spreading of the virus. It took quite some time before the use of FFP2 masks was generally accepted, and then recommended.

Mark Buchanan: Another issue I think is interesting is the value of model complexity. Early on in the UK the modelling group at Imperial College London (ICL) made some projections for the likely number of deaths in the absence of public health measures. The estimates were sobering – hundreds of thousands – and these were then subject to lots of criticism, especially from parties who were against any serious steps to combat viral spread. Whatever the problems of the ICL model, however, there was actually no need then for a complex epidemiological model to make a crude projection of the likely number of deaths. That could be done on the back of an envelope, just taking the UK population, making a rough estimate of how many people might be infected (20%, 30%, 50%), and using the available data on the case fatality rate. This immediately gives numbers in the hundreds of thousands, and would suggest serious risk of hospitals being overburdened.
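The back-of-the-envelope arithmetic described here can be written out in a few lines. The sketch below is only illustrative: the population figure (about 67 million) and the 1% fatality rate are assumptions chosen for the example, not values given in the discussion; only the three infection fractions come from the text.

```python
# Back-of-the-envelope estimate:
# deaths = population x fraction infected x fatality rate.
UK_POPULATION = 67_000_000         # approximate UK population (assumption)
FATALITY_RATE = 0.01               # illustrative 1% fatality rate (assumption)
ATTACK_RATES = [0.20, 0.30, 0.50]  # fractions infected, as quoted in the text


def estimated_deaths(population, attack_rate, fatality_rate):
    """Crude projection of deaths in the absence of public health measures."""
    return population * attack_rate * fatality_rate


estimates = {
    rate: estimated_deaths(UK_POPULATION, rate, FATALITY_RATE)
    for rate in ATTACK_RATES
}

for rate, deaths in estimates.items():
    print(f"{rate:.0%} infected -> roughly {deaths:,.0f} deaths")
```

With these assumed inputs every scenario lands between one hundred thousand and one million deaths, which is the order of magnitude the paragraph refers to. Changing the assumed fatality rate rescales all three figures proportionally.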

Ole Peters: I’m not sure if this is something we have actually learned but I’m hopeful. Quite some time before the pandemic I started worrying about the trend away from mechanistic thinking in science and towards purely data-driven science.

In economics, psychology, and behavioral science this really hasn’t been good, and it’s behind the replication crisis. No matter how we judge statistical significance, it seems, we will always find spurious patterns in data that don’t reproduce, and the less intuitive these patterns are, the more sensational they become. That leads to the publication of lots of “big results” which actually are just bad experimental design, bad data analysis, or bad luck.

In the pandemic I noticed that this approach to science and modeling was also present in public health. Of course, we have the very simple SIR-type models, but we also have extremely complex models, implemented in thousands of lines of code, with little subroutines for anything an academic supervisor could think of for a graduate student to add to the model. These models are essentially black boxes: you feed data in, and you get a forecast out, but it’s impossible to trace what the forecast actually means.

At least in the early days of the pandemic, such models produced results that were incompatible with back-of-the-envelope analyses of what was going on. The back-of-the-envelope stuff did better in the end.

Specifically, I remember that Davide Faranda started fitting an SIR model to early data of deaths and numbers of infections, just to see what it said. I asked him to try an experiment: forecast the total size of the pandemic with data from day 0 up to day 10, then run the same forecast with data from day 0 up to day 11, then up to day 12 and so on, and check how the forecast total size changes. As expected, fluctuations in the initial data were amplified exponentially as the forecast extended out, and numbers fluctuated hugely. I don’t remember the exact numbers, but both 500,000 deaths and 5,000 deaths would have been a possible conclusion depending on how much data was included.

Because of the exponential nature of the infectious process, it wasn’t immediately clear that more data led to more stable forecasts. This little experiment, in the end, led to a paper by LML on the pandemic. But for me the bigger message was: yes, it’s really dangerous to make models so complex that we don’t know any more what is going on mechanistically.
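The expanding-window experiment can be sketched in simplified form. The code below is not the SIR fit described above: as a stand-in it fits a plain exponential to synthetic noisy counts and extrapolates to a fixed day. The growth rate, noise level, horizon and the data themselves are all hypothetical, chosen only to show how the forecast moves as one more day of data is added.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

TRUE_RATE = 0.25  # hypothetical daily growth rate of the synthetic data
NOISE = 0.3       # hypothetical size of the multiplicative reporting noise
HORIZON = 60      # day to which each fit is extrapolated

# Synthetic daily counts: exponential growth with log-normal noise.
days = list(range(21))
counts = [10 * math.exp(TRUE_RATE * t + random.gauss(0, NOISE)) for t in days]


def fit_log_linear(ts, ys):
    """Ordinary least-squares fit of log(y) = a + b * t; returns (a, b)."""
    logs = [math.log(y) for y in ys]
    n = len(ts)
    t_mean = sum(ts) / n
    l_mean = sum(logs) / n
    slope_num = sum((t - t_mean) * (l - l_mean) for t, l in zip(ts, logs))
    slope_den = sum((t - t_mean) ** 2 for t in ts)
    b = slope_num / slope_den
    a = l_mean - b * t_mean
    return a, b


# Refit with data up to day 10, then day 11, ... and extrapolate each time.
forecasts = {}
for last_day in range(10, 21):
    a, b = fit_log_linear(days[: last_day + 1], counts[: last_day + 1])
    forecasts[last_day] = math.exp(a + b * HORIZON)
    print(f"data to day {last_day}: growth {b:.3f}/day, "
          f"day-{HORIZON} forecast {forecasts[last_day]:,.0f}")

spread = max(forecasts.values()) / min(forecasts.values())
print(f"largest forecast / smallest forecast = {spread:.1f}")
```

The fitted growth rate sits in an exponent multiplied by the forecast horizon, so a shift of only 0.02 per day in the fitted rate changes a 60-day extrapolation by a factor of about exp(1.2), roughly 3.3. That is the amplification mechanism at work in the experiment, here in its simplest possible form.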

Colm Connaughton: There were other related problems. Sometimes complex modelling ended up obscuring rather than clarifying the important questions. Modellers were continually asked by policy-makers to answer questions which probably could not be answered by the models. For example, it is not possible to investigate the effect of school closures using a model that doesn’t include schools. As a result, models were made more complex to include various features that policy-makers were interested in. However, we all know that increased model complexity in the absence of a corresponding increase in data to constrain this complexity doesn’t usually improve predictions and sometimes leads to worse predictions if over-fitting is not managed properly. I don’t know what the solution to this is – policy-makers will always want to know the effect of specific policies – but I feel that the question of managing model complexity is mostly “swept under the rug.”

Mark Buchanan: I suppose one important question coming out of all of this is: what can we do to be better prepared next time, not only to respond to a new pandemic, but to make the modelling more useful? One thing that emerged clearly from the entire experience was a rather profound gap between the interests of politicians and the interests of scientists.

Ole Peters: Perhaps one lesson for the future is: stay close to the data. Early on, in March 2020, the published UK government advice stated a doubling time for Covid infections that was clearly off by a factor of two. How did this happen? I dug into it a bit back then, and here is what I found.

From what I could tell, the government advisors had tried to find the doubling time by estimating two quantities: the basic reproduction number (R0) and the serial interval (SI). The doubling time can be found from the ratio R0/SI and some trivial arithmetic. The problem is that both R0 and SI are very difficult to estimate in practice, especially based on the kind of data that was available in March 2020.

Alternatively, one can read off the doubling time directly from the time series of the number of infections. Just check when that number is, say 1000, and then look how long it takes for it to reach 2000. The interval is the doubling time.
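Reading the doubling time off the series takes only a few lines. The function below is a minimal sketch of that procedure; the function name and the example series are invented for illustration and are not real case data.

```python
def doubling_time(counts, level):
    """Days between first reaching `level` and first reaching 2 * `level`.

    `counts` holds one cumulative count per day. Returns None if either
    threshold is never reached within the series.
    """
    start = next((day for day, c in enumerate(counts) if c >= level), None)
    if start is None:
        return None
    end = next(
        (day for day, c in enumerate(counts) if day >= start and c >= 2 * level),
        None,
    )
    if end is None:
        return None
    return end - start


# Hypothetical cumulative counts, invented for illustration.
example_counts = [300, 400, 520, 700, 1000, 1300, 1700, 2100, 2900, 4000]
print(doubling_time(example_counts, 1000))  # prints 3
```

In this made-up series the count first reaches 1000 on day 4 and first reaches 2000 on day 7, giving a doubling time of 3 days at daily resolution. No estimate of R0 or the serial interval enters the calculation.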

Of course there are questions where we want to estimate R0 and SI separately, but this was not such a question.

People, including myself, who took the second route, close to the data, all came to the conclusion that the doubling time early on was around 2.5 days. The UK government papers showed around 5 days. Given the multiplicative exponential dynamics of the pandemic, this was a crucial difference, and it was a case of “stay close to the data”: don’t estimate parameters and combine them to find what you’re looking for if you can just estimate directly what you’re looking for.
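To see why a factor of two in the doubling time matters so much, it helps to compound both values over the same period. The sketch below uses the two doubling times quoted above; the 30-day horizon is an arbitrary assumption made for the example.

```python
def growth_factor(days, doubling_time_days):
    """Multiplicative growth over `days` for a fixed doubling time."""
    return 2 ** (days / doubling_time_days)


HORIZON_DAYS = 30  # hypothetical planning horizon (assumption)

fast = growth_factor(HORIZON_DAYS, 2.5)  # doubling time read off the data
slow = growth_factor(HORIZON_DAYS, 5.0)  # doubling time in the government papers

print(f"2.5-day doubling: x{fast:,.0f} in {HORIZON_DAYS} days")  # x4,096
print(f"5-day doubling:   x{slow:,.0f} in {HORIZON_DAYS} days")  # x64
print(f"ratio between the two: {fast / slow:,.0f}")              # 64
```

Over 30 days, a 2.5-day doubling time means twelve doublings and a 5-day doubling time means six, so the two estimates differ by a factor of 64 in projected infections, not by a factor of two.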

Colm Connaughton: In the UK, despite the fact that the pandemic was a predictable event, governments and scientists were completely unprepared. The initial mechanisms for communicating scientific advice to governments were ad hoc and existing models did not seem set up to cope with the huge uncertainties about the initial stages of the outbreak. The ICL group attracted much criticism for the standard of the code that was used for their initial modelling studies after it was eventually made public. Such criticism implies that better standards were the norm elsewhere but I don’t think this is the case.

Most epidemiological models are academic models and as such are entirely unsuited to real-world use. The reason for this is that funding agencies and governments have never made resources available to maintain, validate and harden models to make them suitable for real world use. To compound this failing, in the UK at least, the modelling done by groups like SAGE, SPI-M and independent SAGE using these models was amateur in the sense that most of the people involved were expected by the government to do this work in their spare evenings and weekends while holding down full time administrative and teaching jobs in universities. 

My personal view is that this lack of preparedness and resources simply indicates that governments don’t think modelling is very important.

Davide Faranda: In my view, before the Covid-19 pandemic, epidemiology was not concerned with real-time forecasting. There were studies of past epidemic trends or, at most, the need to predict epidemic trends for the following year’s flu.

Early on, we realized that the model that assumes these things as constants could not be used for reliable predictions of epidemiological dynamics. However, if you take all these factors into account and introduce some variability into the dynamics, you can get a much more realistic forecast. The fluctuations in the virus dynamics induced by these factors return a different time for the pandemic to spread than that calculated in the deterministic dynamics. Calculating this time adds an additional difficulty to predicting when the epidemic peak will occur.

By introducing noise and uncertainty, a range of possible scenarios can be defined, instead of making a single prediction. This is what happens when one replaces the deterministic approach, which returns a single, average result, with a probabilistic approach. The old models were not wrong but gave partial answers, that is, only one possible solution while ignoring all others.
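The difference between a single deterministic answer and a range of scenarios can be illustrated with a toy model. The sketch below is not the model from the LML paper: it is a discrete-time SIR model in which, as an assumption made for illustration, the transmission rate is perturbed by random noise each day. All parameter values are hypothetical.

```python
import random


def peak_day(beta, gamma=0.1, noise=0.0, days=300, i0=1e-4, rng=None):
    """Day of peak infections in a discrete-time SIR model.

    State variables are fractions of the population. If noise > 0, the
    transmission rate is multiplied each day by (1 + Gaussian noise).
    """
    s, i = 1.0 - i0, i0
    best_day, best_i = 0, i
    for day in range(1, days + 1):
        b = beta
        if noise > 0:
            b = max(0.0, beta * (1.0 + rng.gauss(0.0, noise)))
        new_infections = min(s, b * s * i)
        s -= new_infections
        i += new_infections - gamma * i
        if i > best_i:
            best_day, best_i = day, i
    return best_day


rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Deterministic run: one answer.
deterministic = peak_day(beta=0.3)

# Stochastic ensemble: a distribution of answers.
ensemble = sorted(peak_day(beta=0.3, noise=0.5, rng=rng) for _ in range(200))
low, high = ensemble[10], ensemble[-11]  # rough 5%-95% range of 200 runs

print(f"deterministic peak day: {deterministic}")
print(f"ensemble peak day, rough 5%-95% range: {low} to {high}")
```

The deterministic run returns one peak day, while the ensemble returns a spread of peak days from the same nominal parameters. Reporting that range, rather than a single date, is the probabilistic approach described above.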

Erica Thompson: In the UK, it was also disappointing to hear the phrase “Follow the Science” trotted out again and again to obscure the political content of decisions and attempt to drop accountability onto scientists rather than decision makers.
