Comparing COVID-19 Case Prediction Between ARIMA Model and Compartment Model — China, December 2019–April 2020

Bangguo Qi; Nankun Liu; Shicheng Yu; Feng Tan

doi:10.46234/ccdcw2022.239

Introduction

This study compared the performance of the compartment model and the autoregressive integrated moving average (ARIMA) model in predicting new infections during the coronavirus disease 2019 (COVID-19) epidemic.

Methods

The compartment model and the ARIMA model were established based on the daily cases of new infection reported in China from December 2, 2019 to April 8, 2020. The goodness of fit of the two models was compared using the coefficient of determination (R²).
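As a minimal sketch of the goodness-of-fit comparison, the following Python function computes R² under the common definition 1 − SS_res/SS_tot. The exact formula used in the study is not stated here, so this definition is an assumption, and the sample counts are hypothetical values for illustration only.

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(fitted) or not observed:
        raise ValueError("observed and fitted must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: distance between data and model fit.
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    # Total sum of squares: spread of the data around its own mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the observed values are constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical daily counts (not data from the study).
observed = [10, 20, 40, 80, 160]
fitted = [12, 19, 41, 78, 161]
print(r_squared(observed, fitted))
```

A value closer to 1 indicates that the fitted curve explains more of the variation in the reported counts.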

Results

The compartment model predicted that, without a cordon sanitaire (i.e., a restriction of mobility to prevent the spread of disease), the number of new cases would increase exponentially over the 10 days starting from January 23, 2020, whereas the ARIMA model showed a linear increase. The R² values of the two models without a cordon sanitaire were 0.990 and 0.981, respectively. The predictions of the ARIMA model after February 2, 2020 showed a large deviation. The R² values of the two models for the fit to the complete transmission process of the epidemic were 0.964 and 0.933, respectively.

Discussion

The two models fit well at different stages of the epidemic. The predictions of the compartment model were more in line with the highly contagious transmission characteristics of COVID-19. The accuracy of recent historical data had a larger impact on the predictions of the ARIMA model than on those of the compartment model.

The outbreak of coronavirus disease 2019 (COVID-19) at the end of 2019 caused a global pandemic and presents a major challenge to human health and survival. Accurately predicting the incidence of COVID-19 can help authorities distribute medicine and other health resources, take prompt and effective control measures, and suppress the spread of the epidemic. The compartment model divides the population into compartments according to epidemiological status. Ordinary differential equations express the continuous dynamic changes among compartments, and different epidemic processes of infectious diseases can be simulated by adjusting these equations. The autoregressive integrated moving average (ARIMA) model is a time series prediction method that uses autocorrelation analysis of time series data to identify patterns of change and predict future points in the series. Previous studies (1-4) have applied these two models to predict COVID-19 epidemics, but few have compared them. Therefore, this study aimed to compare the performance of the two models during the early COVID-19 outbreak in China. According to the timing of intervention measures and their effects, this paper divides the timeline of the epidemic into 3 stages: 1) Stage 1, from December 2, 2019, when the first case was reported, to January 22, 2020, during which few interventions were taken; 2) Stage 2, from January 23 to February 1, 2020, during which the cordon sanitaire was implemented; and 3) Stage 3, from February 2 to April 8, 2020, during which centralized isolation and expanded testing were applied (details are provided in Supplementary Materials and Supplementary Figure S1).
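To make the compartment idea concrete, the sketch below integrates a basic SEIR (susceptible, exposed, infectious, recovered) system with the forward Euler method. This is not the study's model: the choice of compartments, the parameter names `beta`, `sigma`, and `gamma`, the counting of E→I transitions as "new cases", and all numerical values are assumptions made for illustration.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, days, steps_per_day=10):
    """Integrate a basic SEIR model with forward Euler.

    Returns the number of E -> I transitions per day, used here as a
    stand-in for daily new cases (an assumption of this sketch).
    """
    n = s0 + e0 + i0 + r0
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    dt = 1.0 / steps_per_day
    daily_new = []
    for _ in range(days):
        new_today = 0.0
        for _ in range(steps_per_day):
            infections = beta * s * i / n * dt  # flow S -> E
            onsets = sigma * e * dt             # flow E -> I
            recoveries = gamma * i * dt         # flow I -> R
            s -= infections
            e += infections - onsets
            i += onsets - recoveries
            r += recoveries
            new_today += onsets
        daily_new.append(new_today)
    return daily_new


# Hypothetical parameters: transmission rate 0.8/day, 5-day latent period,
# 7-day infectious period, one initial case in a population of 1,000,000.
cases = simulate_seir(beta=0.8, sigma=1 / 5, gamma=1 / 7,
                      s0=999_999, e0=0, i0=1, r0=0, days=30)
print([round(c, 2) for c in cases[:10]])
```

Changing the flows inside the inner loop (for example, adding an isolation compartment) is how a different epidemic process would be represented in this kind of model.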

DISCUSSION

Appropriate predictions can help authorities promptly adjust control strategies and allocate medical resources. Numerous researchers have used the compartment model and the ARIMA model to predict COVID-19. Taking the early COVID-19 epidemic in China as an example, this study compared the predictions of the compartment model and the ARIMA model at different stages of the epidemic, and both models fit well at different stages. Furthermore, the predictions of the compartment model were in line with the highly contagious transmission characteristics of COVID-19. In addition, because the ARIMA model considers the changing trends of past values over time and predicts future values by fitting a mathematical model to historical data, the accuracy of recent historical data has a relatively large impact on the results of model extrapolation. By contrast, the compartment model can be calibrated with a Markov chain Monte Carlo (MCMC) algorithm, based on the numbers of daily new cases and parameters supported by existing literature, which makes its predictions relatively less affected by outliers.
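The MCMC calibration step can be illustrated with a random-walk Metropolis sampler. To keep the sketch short, it calibrates a single growth rate of a simple exponential curve rather than a full compartment model; the Poisson likelihood, the flat prior on (0, 1), the proposal step size, and the synthetic counts are all assumptions and are not taken from the study.

```python
import math
import random


def log_likelihood(rate, counts, initial):
    """Poisson log-likelihood (up to a constant) for mu_t = initial * exp(rate * t)."""
    total = 0.0
    for t, c in enumerate(counts):
        mu = initial * math.exp(rate * t)
        total += c * math.log(mu) - mu
    return total


def metropolis(counts, initial, n_samples=5000, step=0.005, seed=0):
    """Random-walk Metropolis sampler for the growth rate."""
    rng = random.Random(seed)
    rate = 0.1  # arbitrary starting value
    current = log_likelihood(rate, counts, initial)
    samples = []
    for _ in range(n_samples):
        proposal = rate + rng.gauss(0.0, step)
        if 0.0 < proposal < 1.0:  # flat prior on (0, 1); outside it, reject
            candidate = log_likelihood(proposal, counts, initial)
            # Accept with probability min(1, likelihood ratio).
            if rng.random() < math.exp(min(0.0, candidate - current)):
                rate, current = proposal, candidate
        samples.append(rate)
    return samples


# Synthetic counts generated from a known growth rate of 0.2 per day.
counts = [round(10 * math.exp(0.2 * t)) for t in range(15)]
samples = metropolis(counts, initial=10.0)
kept = samples[1000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(round(posterior_mean, 3))
```

Because the likelihood pools information from the whole series, a single aberrant count shifts the posterior only modestly, which is one way to read the statement that the calibrated model is less affected by outliers.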

Although the ARIMA model did not perform as well as the compartment model in predicting COVID-19, it is important to consider that the novel coronavirus is still evolving. As it evolves, the parameters of the compartment model may change accordingly and become difficult to obtain, while accurate simulation depends heavily on the selection of parameters. By comparison, the ARIMA model needs only time series data to build a forecasting model, is easy to implement, and has high accuracy for short-term forecasting. It can therefore be applied quickly to forecasting COVID-19.
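The following sketch shows how little input a time series forecast requires, and also how strongly it can depend on the most recent observation. It is a deliberately simplified ARIMA(1,1,0)-style recursion with no constant and no moving-average term, fitted by least squares; it is not a full ARIMA implementation and not the model fitted in the study. The series values are hypothetical.

```python
def fit_ar1_on_differences(series):
    """Estimate phi in d_t = phi * d_(t-1), where d is the first difference."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    num = sum(x * y for x, y in zip(diffs, diffs[1:]))
    den = sum(x * x for x in diffs[:-1])
    phi = num / den if den else 0.0
    return phi, diffs


def forecast(series, horizon):
    """Forecast `horizon` steps ahead by extrapolating the differenced series."""
    phi, diffs = fit_ar1_on_differences(series)
    level, last_diff = series[-1], diffs[-1]
    out = []
    for _ in range(horizon):
        last_diff = phi * last_diff  # predicted next difference
        level += last_diff           # undo the differencing
        out.append(level)
    return out


# Hypothetical series, and the same series with a misreported final value.
base_series = [100, 130, 150, 165, 175, 182, 187]
perturbed_series = [100, 130, 150, 165, 175, 182, 200]

base = forecast(base_series, horizon=3)
perturbed = forecast(perturbed_series, horizon=3)
print([round(x, 1) for x in base])
print([round(x, 1) for x in perturbed])
```

Only the series itself is needed as input, but a change in the final observation alters both the fitted coefficient and the starting point of the extrapolation, so the forecasts diverge by more than the size of the change.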

The compartment model divides the population into different compartments, with the dynamics of these compartments described by ordinary differential equations. Researchers can incorporate different compartments and parameters into the model to more accurately simulate the transmission patterns and epidemiological characteristics of the novel coronavirus. Unlike the ARIMA model, which uses time as a proxy for the various influencing factors, the compartment model can analyze the impact of population movement, vaccination, isolation measures, and other interventions on disease transmission. Therefore, when predicting COVID-19, it is necessary to comprehensively consider the advantages of different models and choose the best model based on existing conditions.
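One way such an intervention analysis might look is sketched below: a basic SIR model in which the transmission rate drops on a chosen day, run with and without the change. The SIR structure, the parameter names, the intervention day, and all numerical values are hypothetical and are not those of the study.

```python
def simulate_sir(beta, gamma, n, i0, days,
                 intervention_day=None, beta_after=None, steps_per_day=10):
    """Forward-Euler SIR model; returns new infections per day.

    If `intervention_day` is given, the transmission rate switches from
    `beta` to `beta_after` from that day onward.
    """
    s, i = float(n - i0), float(i0)
    dt = 1.0 / steps_per_day
    daily_new = []
    for day in range(days):
        controlled = intervention_day is not None and day >= intervention_day
        rate = beta_after if controlled else beta
        new_today = 0.0
        for _ in range(steps_per_day):
            infections = rate * s * i / n * dt  # flow S -> I
            recoveries = gamma * i * dt         # flow I -> R
            s -= infections
            i += infections - recoveries
            new_today += infections
        daily_new.append(new_today)
    return daily_new


# Hypothetical scenario: transmission rate falls from 0.5 to 0.1 on day 20.
baseline = simulate_sir(0.5, 0.2, 1_000_000, 10, 60)
controlled = simulate_sir(0.5, 0.2, 1_000_000, 10, 60,
                          intervention_day=20, beta_after=0.1)
print(round(sum(baseline)), round(sum(controlled)))
```

The two runs are identical before the intervention day and diverge afterward, which is the kind of counterfactual comparison a purely time-based model cannot express directly.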

This study was subject to at least two limitations. First, there were no real-world values to compare with the models' predictions of the temporal trends in daily new cases under specific hypothetical scenarios. Therefore, the accuracy of these predictions could not be compared using mean absolute error (MAE) and root mean squared error (RMSE). Second, because epidemic-related influencing factors (such as prevention and control measures, medical resources, and viral transmissibility) change dynamically, neither the compartment model nor the ARIMA model could guarantee the accuracy of its long-term predictions. The data must be updated constantly to improve prediction accuracy.
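For reference, the two error measures named above can be computed as in the following sketch whenever real-world values are available. The sample numbers are hypothetical.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between observed and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical observed and predicted daily counts.
actual = [1, 2, 3]
predicted = [2, 2, 5]
print(mae(actual, predicted), rmse(actual, predicted))
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a fuller picture of forecast accuracy.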

References

[1]	Chintalapudi N, Battineni G, Amenta F. COVID-19 virus outbreak forecasting of registered and recovered cases after sixty day lockdown in Italy: a data driven model approach. J Microbiol Immunol Infect 2020;53(3):396-403. http://dx.doi.org/10.1016/j.jmii.2020.04.004.
[2]	Chen SM, Chen QS, Yang JT, Lin L, Li LY, Jiao LR, et al. Curbing the COVID-19 pandemic with facility-based isolation of mild cases: a mathematical modeling study. J Travel Med 2021;28(2):taaa226. http://dx.doi.org/10.1093/jtm/taaa226.
[3]	Ceylan Z. Estimation of COVID-19 prevalence in Italy, Spain, and France. Sci Total Environ 2020;729:138817. http://dx.doi.org/10.1016/j.scitotenv.2020.138817.
[4]	Hao XJ, Cheng SS, Wu DG, Wu TC, Lin XH, Wang CL. Reconstruction of the full transmission dynamics of COVID-19 in Wuhan. Nature 2020;584(7821):420-4. http://dx.doi.org/10.1038/s41586-020-2554-8.
[5]	China NBOS. National data. 2022. https://data.stats.gov.cn/. [2022-07-11]. (In Chinese).


Corresponding author: 陈斌 (Chen Bin), bchen63@163.com
