Vol. 56. Issue 9.
Pages 601-602 (September 2020)
Scientific Letter
Statistical and mathematical modeling in the coronavirus epidemic: some considerations to minimize biases in the results
Modelado estadístico y matemático en la epidemia del coronavirus: algunas consideraciones para minimizar los sesgos en los resultados
Marcos Matabuena a,*, Oscar Hernan Madrid Padilla b, Francisco-Javier Gonzalez-Barcala c,d,e,f
* Corresponding author.
a Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
b Department of Statistics, University of California, Los Angeles, United States
c Department of Medicine, Universidade de Santiago de Compostela, Santiago de Compostela, A Coruña, Spain
d Centro de Investigación Biomédica en Red de Enfermedades Respiratorias (CIBERES), Madrid, Spain
e Department of Respiratory Medicine, University Hospital of Santiago de Compostela (CHUS), Santiago de Compostela, A Coruña, Spain
f Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, A Coruña, Spain
To the Editor

The new coronavirus (SARS-CoV-2) [1,2] has demonstrated the heavy health and socioeconomic impact that an epidemic can have worldwide. In the face of such pandemics, governments and health authorities must act quickly [3] and implement policies that limit transmission of the virus, avoid the collapse of the health system, and reduce the associated morbidity and mortality. All of these strategies are driven by the need to prioritize resources in settings where they are scarce. Mathematical models can be a key factor in supporting such decision-making. These tools are potentially useful for explaining and predicting how quickly and in what manner the virus spreads, supporting health planning, identifying and stratifying patient risk, and establishing prognosis from electronic records.

A crucial consideration in mathematical modeling is that the data collected are usually observational in nature. This may lead to significant bias in the results obtained from the systematic application of conventional statistical techniques [4]. Another important factor is incomplete information [5], such as censored and missing data. Because diagnostic tests are not performed in many cases, it is impossible to know whether those individuals are infected. In addition, for many patients, endpoints such as recovery or death have not yet been reached by the time the data are analyzed. Moreover, patients with no symptoms or mild symptoms are the least likely to visit a doctor or to have a diagnostic test. Ignoring the effects of missing or censored data may therefore introduce significant bias into the conclusions reached [5].
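
The effect of unreached endpoints can be illustrated with a small simulation. The sketch below is not taken from the letter: the growth rate, the fixed 18-day delay from onset to outcome, the 2% fatality risk and all function names are hypothetical choices made only for illustration. It compares a naive estimate (deaths recorded so far divided by all reported cases) with an estimate restricted to cases whose outcome is already known.

```python
import math
import random

def simulate_cases(n_cases=50000, true_cfr=0.02, growth=0.15,
                   days=40, outcome_delay=18, seed=0):
    """Return (onset_day, dies, outcome_day) for each simulated case."""
    rng = random.Random(seed)
    # Exponential growth: later onset days carry more cases.
    weights = [math.exp(growth * d) for d in range(days)]
    onsets = rng.choices(range(days), weights=weights, k=n_cases)
    # Assumption: death and recovery both occur after the same fixed delay.
    return [(d, rng.random() < true_cfr, d + outcome_delay) for d in onsets]

def naive_cfr(cases, analysis_day):
    """Deaths recorded so far divided by all reported cases."""
    deaths = sum(1 for _, dies, out in cases if dies and out <= analysis_day)
    return deaths / len(cases)

def resolved_cfr(cases, analysis_day):
    """Deaths divided by cases whose outcome is known at analysis time."""
    resolved = [dies for _, dies, out in cases if out <= analysis_day]
    return sum(resolved) / len(resolved)

cases = simulate_cases()
analysis_day = 39  # last simulated onset day
naive = naive_cfr(cases, analysis_day)
resolved = resolved_cfr(cases, analysis_day)
print(f"naive: {naive:.4f}  resolved only: {resolved:.4f}  true: 0.0200")
```

In this setting most cases are recent and still unresolved, so the naive estimate falls far below the simulated risk. Restricting to resolved cases works here only because deaths and recoveries are assumed to share the same delay; when those delays differ, that estimator can be biased as well.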

From a statistical point of view, the study design may be more important than the amount of data collected. However, in a health emergency, governments may be overwhelmed and data may be collected from severe cases only. To determine the actual extent of the pandemic, random population sampling is necessary. Clear exceptions in the SARS-CoV-2 crisis are South Korea and Singapore, where systematic population testing allowed outbreaks of infection to be isolated quickly, so that the effects of the virus were mitigated sooner than in other countries.
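
The difference between testing mainly severe cases and sampling at random can be sketched as follows. All quantities here are assumptions chosen for illustration (a 5% share of severe infections, a 90% testing probability for severe cases and 10% for mild ones), and the function names are hypothetical.

```python
import random

def simulate_infected(n=100000, severe_share=0.05, seed=1):
    """One boolean per infected person: True if the infection is severe."""
    rng = random.Random(seed)
    return [rng.random() < severe_share for _ in range(n)]

def targeted_testing(infected, p_test_severe=0.9, p_test_mild=0.1, seed=2):
    """Severe cases are far more likely to be tested and recorded."""
    rng = random.Random(seed)
    return [severe for severe in infected
            if rng.random() < (p_test_severe if severe else p_test_mild)]

def random_sample(infected, size=5000, seed=3):
    """Simple random sample, independent of severity."""
    rng = random.Random(seed)
    return rng.sample(infected, size)

def share_severe(sample):
    return sum(sample) / len(sample)

infected = simulate_infected()
targeted_share = share_severe(targeted_testing(infected))
random_share = share_severe(random_sample(infected))
print(f"severe share, targeted testing: {targeted_share:.3f}")
print(f"severe share, random sample:    {random_share:.3f}")
```

The much larger targeted data set overstates the severe share several-fold, whereas the smaller random sample stays close to the simulated 5%, which is the sense in which design can matter more than volume.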

From an epidemiological point of view, it is important to identify variables that indicate patient risk and prognosis. The most popular indicator is undoubtedly the mortality risk, which measures the likelihood that a patient with the disease will die. Precise estimation is not simple and, as indicated above, biases are common given the observational nature of the recorded data. According to Lipsitch et al. [6], biases occur because of delays in recording information or because patients at higher risk are over-represented in the database. A potential solution in the analysis is to stratify patients into groups based on their severity and prognosis, which can improve the statistical inference drawn from the patients in each stratum. The use of specific techniques for causal inference or missing data, such as the propensity score or doubly robust estimators, is also recommended [7].
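
As a minimal sketch of these techniques, the code below estimates a mortality risk when outcomes are recorded more often for severe patients. It uses inverse probability weighting and the augmented inverse probability weighted (AIPW) estimator, one common form of doubly robust estimator. This is an illustration under stated assumptions, not the analysis of any cited study: the simulated probabilities, the single binary covariate and all function names are hypothetical.

```python
import random

def simulate(n=40000, seed=0):
    """Records (severe, observed, died); died is None when not recorded."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        severe = rng.random() < 0.2
        died = rng.random() < (0.15 if severe else 0.01)
        observed = rng.random() < (0.9 if severe else 0.3)
        records.append((severe, observed, died if observed else None))
    return records

def complete_case_mean(records):
    """Mortality among patients with a recorded outcome only."""
    ys = [y for _, r, y in records if r]
    return sum(ys) / len(ys)

def fit_propensity(records):
    """P(outcome recorded | severity), estimated within each stratum."""
    fit = {}
    for s in (False, True):
        rs = [r for x, r, _ in records if x == s]
        fit[s] = sum(rs) / len(rs)
    return fit

def fit_outcome(records):
    """P(death | severity), from recorded outcomes in each stratum."""
    fit = {}
    for s in (False, True):
        ys = [y for x, r, y in records if x == s and r]
        fit[s] = sum(ys) / len(ys)
    return fit

def ipw_mean(records, propensity):
    """Weight each recorded outcome by 1 / P(recorded | severity)."""
    return sum(y / propensity[x] for x, r, y in records if r) / len(records)

def aipw_mean(records, propensity, outcome):
    """Outcome-model prediction plus a weighted residual correction."""
    total = 0.0
    for x, r, y in records:
        total += outcome[x]
        if r:
            total += (y - outcome[x]) / propensity[x]
    return total / len(records)

records = simulate()
true_risk = 0.2 * 0.15 + 0.8 * 0.01  # 0.038 by construction
propensity = fit_propensity(records)
outcome = fit_outcome(records)
# Deliberately wrong propensity model: ignores severity altogether.
overall = sum(r for _, r, _ in records) / len(records)
wrong_propensity = {False: overall, True: overall}

cc = complete_case_mean(records)
ipw = ipw_mean(records, propensity)
ipw_wrong = ipw_mean(records, wrong_propensity)
dr = aipw_mean(records, wrong_propensity, outcome)
print(f"complete case: {cc:.3f}  IPW: {ipw:.3f}  "
      f"IPW (wrong propensity): {ipw_wrong:.3f}  AIPW (wrong propensity): {dr:.3f}")
```

The complete-case estimate is inflated because severe patients dominate the recorded outcomes. The AIPW estimate remains close to the simulated risk even with the misspecified propensity model, because the outcome model is correct; that is the "doubly robust" property in this simple setting.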

The large discrepancies in the reported proportion of asymptomatic patients and in the mortality risk associated with SARS-CoV-2 underline the need to adopt these approaches. On March 5, 2020, the percentage of asymptomatic patients reported by the European Centre for Disease Prevention and Control was 80%. However, in a study of patients from the Diamond Princess cruise ship, this figure was 20% [9]. In the latter case, the study sample comprised a greater proportion of older patients, who have a higher probability of presenting symptoms, making it difficult to extrapolate the conclusions to the general population. Similarly, estimates of the fatality rate vary widely (between 0.4% and 15% [10]), partly because of the problems mentioned above. Precise characterization of these variables according to the epidemiological profile of the population is essential to understand the transmission mechanisms of the virus [11] and to predict future care demands.
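
The role of age composition can be made concrete with direct standardization. The counts and age shares below are invented for illustration and are not the Diamond Princess or ECDC data; the function names are likewise hypothetical. The sketch computes the crude asymptomatic proportion in an older sample and then reweights the age-specific proportions to a younger reference population.

```python
def crude_proportion(cases_by_age, asymptomatic_by_age):
    """Overall asymptomatic proportion in the sample as observed."""
    return sum(asymptomatic_by_age.values()) / sum(cases_by_age.values())

def standardized_proportion(cases_by_age, asymptomatic_by_age, reference_shares):
    """Age-specific proportions reweighted to a reference age structure."""
    return sum(
        reference_shares[age] * asymptomatic_by_age[age] / cases_by_age[age]
        for age in cases_by_age
    )

# Hypothetical sample dominated by older patients.
sample_cases = {"0-39": 100, "40-64": 300, "65+": 600}
sample_asymptomatic = {"0-39": 60, "40-64": 120, "65+": 120}
# Hypothetical age structure of the general population (shares sum to 1).
general_shares = {"0-39": 0.50, "40-64": 0.35, "65+": 0.15}

crude = crude_proportion(sample_cases, sample_asymptomatic)
standardized = standardized_proportion(
    sample_cases, sample_asymptomatic, general_shares
)
print(f"crude: {crude:.2f}  standardized to general population: {standardized:.2f}")
```

With identical age-specific proportions, the crude figure in the older sample is markedly lower than the standardized one, so part of a gap between two reported percentages can arise from age structure alone.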

A basic criticism of epidemic modelling is that parameters are frequently fitted to government-provided statistics on infected subjects. Very few countries can provide clear evidence that these figures reflect the real situation, given the uncertainty about the percentage of asymptomatic patients and the lack of widespread testing in the population. In fact, asymptomatic patients may be the main transmitters of the virus [11].
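
A simple way to see why this matters is to pass the output of a basic SIR model through an imperfect reporting process. The sketch below is an assumption-laden illustration rather than any model used in the cited literature: the transmission and recovery rates, the population size, the ascertainment fractions and the function names are all hypothetical.

```python
POPULATION = 1_000_000

def sir_daily_infections(beta=0.3, gamma=0.1, population=POPULATION,
                         initial_infected=10, days=120):
    """Daily new infections from a discrete-time SIR model."""
    s = float(population - initial_infected)
    i = float(initial_infected)
    new_cases = []
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        new_cases.append(new_infections)
    return new_cases

def reported(new_cases, ascertainment):
    """Apply a day-dependent fraction of infections that get reported."""
    return [ascertainment(day) * x for day, x in enumerate(new_cases)]

def growth_ratio(series, start=5, end=10):
    """Ratio of daily counts between two early days of the outbreak."""
    return series[end] / series[start]

true_cases = sir_daily_infections()
# Constant under-reporting: one infection in five is reported.
constant = reported(true_cases, lambda day: 0.2)
# Testing capacity ramps up: ascertainment rises from 5% to at most 50%.
ramping = reported(true_cases, lambda day: min(0.5, 0.05 + 0.03 * day))

true_attack_rate = sum(true_cases) / POPULATION
reported_attack_rate = sum(constant) / POPULATION
print(f"attack rate, true: {true_attack_rate:.2f}  reported: {reported_attack_rate:.2f}")
print(f"early growth ratio, true: {growth_ratio(true_cases):.2f}  "
      f"constant reporting: {growth_ratio(constant):.2f}  "
      f"ramping reporting: {growth_ratio(ramping):.2f}")
```

Under constant under-reporting the early growth ratio is preserved but the size of the epidemic is understated by the ascertainment fraction. When ascertainment changes over time, the apparent growth is distorted too, so parameters fitted to reported counts would inherit both errors.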

Mathematical models can be an important tool for anticipating future developments and supporting decision-making. However, if the data are inaccurate and specific techniques to correct for the observational nature of the recorded data are not used, conclusions may be biased. In this regard, all relevant institutions should make an effort to provide high-quality data openly [12], so that scientists can find the solutions most beneficial to society. At the same time, in the current era of big data [13], collaboration between different stakeholders (health management, care, research, etc.) is essential. The use of big data would facilitate the construction of more complex models that take advantage of all the data recorded from individual patient monitoring [14] and in this way provide more agile responses to current epidemics [15].


Funding

This work has received financial support from the Consellería de Cultura, Educación e Ordenación Universitaria (accreditation 2019–2022 ED431G-2019/04) and the European Regional Development Fund (ERDF), which acknowledges the CiTIUS-Research Center in Intelligent Technologies of the University of Santiago de Compostela as a Research Center of the Galician University System.

Conflict of interests

The first two authors declare that they have no conflicts of interest.

Francisco-Javier Gonzalez-Barcala has received honoraria for consultancy, projects or presentations from Chiesi, Menarini, Rovi, Bial, GlaxoSmithKline, Laboratorios Esteve, Teva, Gebro Pharma, ALK, Roxall, Stallergenes-Greer, Boehringer Ingelheim, Mundipharma and Novartis.



References

1. C. Huang, Y. Wang, X. Li, L. Ren, J. Zhao, Y. Hu, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet, 395 (2020).
2. N. Zhu, D. Zhang, W. Wang, X. Li, B. Yang, J. Song, et al. A novel coronavirus from patients with pneumonia in China, 2019. N Engl J Med, 382 (2020), pp. 727-733.
3. I. Kickbusch, G. Leung. Response to the emerging novel coronavirus outbreak.
4. S. Greenland. Multiple-bias modelling for analysis of observational data. J R Stat Soc Ser A Stat Soc, 168 (2005), pp. 267-306.
5. A. Tsiatis. Semiparametric theory and missing data. Springer Science & Business Media (2007).
6. M. Lipsitch, C. Donnelly, C. Fraser, I. Blake, A. Cori, I. Dorigatti, et al. Potential biases in estimating absolute and relative case-fatality risks during outbreaks.
7. H. Bang, J. Robins. Doubly robust estimation in missing data and causal inference models. Biometrics, 61 (2006), pp. 962-973.
8. R.M. Anderson, H. Heesterbeek, D. Klinkenberg, T.D. Hollingsworth. How will country-based mitigation measures influence the course of the COVID-19 epidemic?
9. K. Mizumoto, K. Kagaya, A. Zarebski, G. Chowell. Estimating the asymptomatic proportion of coronavirus disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship, Yokohama, Japan, 2020.
10. D.D. Rajgor, M.H. Lee, S. Archuleta, N. Bagdasarian, S.C. Quek. The many estimates of the COVID-19 case fatality rate. Lancet Infect Dis (2020).
11. Y. Bai, L. Yao, T. Wei, F. Tian, D.-Y. Jin, L. Chen, et al. Presumed asymptomatic carrier transmission of COVID-19.
12. S.P. Layne, J.M. Hyman, D.M. Morens, J.K. Taubenberger. New coronavirus outbreak: framing questions for pandemic prevention. Sci Transl Med, 12 (2020).
13. N.G. Reich, L.C. Brooks, S.J. Fox, S. Kandula, C.J. McGowan, E. Moore, et al. A collaborative multiyear, multimodel assessment of seasonal influenza forecasting in the United States. Proc Natl Acad Sci U S A, 116 (2019), pp. 3146-3154.
14. X. Li, J. Dunn, D. Salins, G. Zhou, W. Zhou, S.M. Schüssler-Fiorenza Rose, et al. Digital health: tracking physiomes and activity using wearable biosensors reveals useful health-related information. PLoS Biol, 15 (2017), pp. e2001402.
15. C. Viboud, A. Vespignani. The future of influenza forecasts. Proc Natl Acad Sci U S A, 116 (2019), pp. 2802-2804.

Please cite this article as: Matabuena M, Padilla OHM, Gonzalez-Barcala FJ. Modelado estadístico y matemático en la epidemia del coronavirus: algunas consideraciones para minimizar los sesgos en los resultados. Arch Bronconeumol. 2020;56:601–602.

Copyright © 2020. SEPAR
Archivos de Bronconeumología