Perspectives: Advancements in Defining and Estimating the Reproduction Number in Infectious Disease Epidemiology

  • 1. State Key Laboratory of Vaccines for Infectious Diseases, Xiang An Biomedicine Laboratory, School of Public Health, Xiamen University, Xiamen City, Fujian Province, China
  • 2. State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, National Innovation Platform for Industry-Education Integration in Vaccine Research, Xiamen University, Xiamen City, Fujian Province, China
  • 3. Chinese Center for Disease Control and Prevention, Beijing, China
  • Corresponding authors: Tianmu Chen, Yan Niu
  • Funding: Supported by the National Key R&D Program of China (2021ZD0113903, 2021YFC2301604), Fundamental Research Funds for the Central Universities (20720230001)
  • Online Date: September 15 2023
    Issue Date: September 15 2023
    doi: 10.46234/ccdcw2023.158
  • The reproduction number ($ R $) is a fundamental metric in the study of infectious disease outbreaks, epidemics, and pandemics. Although many methods are available for estimating $ R $, both newcomers and established public health professionals often have difficulty understanding when each method applies and what its limitations are. This review therefore offers basic guidance on selecting and estimating $ R $. We searched PubMed and Web of Science using the following search strategy: [“Basic Reproduction Number/classification”(Mesh)] AND [“Basic Reproduction Number/prevention and control”(Mesh)] OR [“Basic Reproduction Number/statistics and numerical data”(Mesh)]. The search was restricted to articles published from January 2013 to January 2023. It returned 7,094 articles, of which we selected 60 that met our inclusion criteria for further analysis.

  • $ R $ indicates the average number of infections or cases resulting from contact with an infected individual, and thus serves as an important gauge of the transmissibility of infectious diseases. There are three types of $ R $: the basic reproduction number ($ {R}_{0} $), the effective reproduction number ($ {R}_{eff} $), and the real-time or time-varying reproduction number ($ {R}_{t} $). $ {R}_{0} $ is used to evaluate the transmissibility of new pathogens or variants when they emerge (1). In contrast, $ {R}_{eff} $ and $ {R}_{t} $ are used to assess the effectiveness of public health and social measures (PHSMs), providing valuable insights for policymakers and public health officials (Figure 1) (2-4).

    Figure 1. Comparison of application scenarios for various reproduction number methods.

    Abbreviation: DBM=definition-based method; NGM=next-generation method; FSE=final-size equation; GIBM=generation interval-based method.

    $ {R}_{0} $, the basic reproduction number, is the mean number of secondary infections caused by a single infected individual in a susceptible population (5-7). It is instrumental in predicting the probability and magnitude of disease outbreaks, as well as the vaccination threshold required to establish herd immunity (1,8). Factors such as the frequency of contact within the population, sanitary practices, and seasonal changes may further alter $ {R}_{0} $ (9). Changes in the transmission rate ($ \beta $), the recovery rate ($ \gamma $, the inverse of the mean infectious period), or the contact rate substantially influence the estimated value of $ {R}_{0} $ (10). Any pre-existing immunity in the population must be accounted for when calculating $ {R}_{0} $. At present, there is no standardized method for determining and reporting $ {R}_{0} $ that addresses the issue of its variability (11).

    The concept of $ {R}_{eff} $ is similar to that of $ {R}_{0} $, and researchers often confuse the two. The major distinction is that $ {R}_{eff} $ is suited to establishing a baseline for PHSMs or exposed populations, because it reflects the actual immunity of the population (12). As a result, $ {R}_{eff} $ is usually smaller than $ {R}_{0} $, since it depends not only on the transmissibility of the pathogen but also on the level of immunization within the population (13).

    Cori et al. (14) provided a more detailed breakdown of $ {R}_{t} $, dividing it into the case reproduction number ($ {R}_{c} $) and the instantaneous reproduction number ($ {R}_{i} $). $ {R}_{c} $ is the average number of secondary cases generated by individuals infected at a given time point and reflects the transmissibility at that time. $ {R}_{i} $, by contrast, is the average number of secondary cases calculated under the assumption that conditions do not change after a specific time point, which makes it easier to estimate in real time (15). $ {R}_{t} $ estimates the spread of pathogens by tracking data that evolve over the course of an outbreak (16). $ {R}_{t} $ is also an important parameter for describing the epidemiological characteristics of a disease and evaluating the effectiveness of PHSMs (Figure 1) (17-18). The values of $ {R}_{t} $ vary with factors such as changes in immunity and interventions across different populations, including interventions that affect personal contact networks (19-20). In practice, researchers must decide whether the main $ {R}_{t} $ index of interest is $ {R}_{c} $ or $ {R}_{i} $, and then select the modeling method accordingly. Overall, both $ {R}_{c} $ and $ {R}_{i} $ describe the average number of secondary infections associated with a specific time $ t $: $ {R}_{c} $ focuses on the attributes of the individuals infected at time $ t $ and is more widely used, whereas $ {R}_{i} $ emphasizes the conditions at time $ t $, assuming they remain unchanged. Consequently, if disease transmissibility declines at a particular point, $ {R}_{i} $ will drop sharply from high to low, whereas $ {R}_{c} $ will decrease smoothly (21).

  • The direct method estimates $ R $ by analyzing a clear transmission chain. The transmission rate ($ \beta $) is the product of the transmission probability per contact ($ p $) and the contact rate ($ c $), and multiplying $ \beta $ by the infectious period ($ D $) gives $ {R}_{0} $ (11,22):

    $$ \beta =pc $$
    $$ {R}_{0}=\beta D=pcD $$

    The direct method is applicable to scenarios that involve few case generations within a brief time frame, or small sample sizes during the early phase of an epidemic or outbreak. It allows researchers to calculate $ R $ separately for each possible transmission chain, analyze the distribution of $ R $, and evaluate the contributions of different transmission chains to the spread of the disease. However, the direct method is prone to bias from small sample sizes and cannot capture variation over time. Underreporting and fragmented data also pose problems for real-time evaluation (23).
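
    As a minimal sketch of the direct calculation $ {R}_{0}=pcD $ given above, the following Python function multiplies the three quantities. The function name and the numerical inputs are hypothetical and chosen only for illustration; they are not taken from any study cited here.

```python
def basic_reproduction_number(p: float, c: float, d: float) -> float:
    """Direct method: R0 = beta * D, with beta = p * c.

    p: transmission probability per contact (0 to 1)
    c: contact rate (contacts per unit time)
    d: infectious period (same time unit as c)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if c < 0 or d < 0:
        raise ValueError("c and d must be non-negative")
    beta = p * c  # transmission rate per unit time
    return beta * d


# Hypothetical inputs: 5% transmission probability per contact,
# 10 contacts per day, 6 infectious days.
r0 = basic_reproduction_number(p=0.05, c=10, d=6)
print(round(r0, 2))
```

    Applying the same function to each observed transmission chain would give the distribution of $ R $ across chains that the paragraph above describes.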

  • The definition-based method (DBM) is an indirect approach used to estimate $ R $. It is applied to various transmission dynamics models, including the Susceptible-Infectious-Recovered (SIR) model, the Susceptible-Exposed-Infectious-Recovered (SEIR) model, the Susceptible-Infectious-Recovered-Cross immune (SIRC) model, and the Susceptible-Exposed-Infectious-Susceptible (SEIS) model (24-28). Taking the SIR model as an example:

    $$ \frac{dS}{dt}={b}_{r}N-\frac{\beta SI}{N}-{d}_{r}S $$
    $$ \frac{dI}{dt}=\frac{\beta SI}{N}-\gamma I-{d}_{r}I $$
    $$ \frac{dR}{dt}=\gamma I-{d}_{r}R $$

    The secondary infections generated by an infected individual per unit of time are represented as $ \beta S/ N $, which corresponds to the inflow process. On the other hand, the recovery or natural death of an infected individual per unit of time is denoted as $ \gamma +{d}_{r} $, which corresponds to the outflow process. Thus, we can calculate $ {R}_{eff} $ as follows:

    $$ {R}_{eff}=\frac{\text{Inflow process}}{\text{Outflow process}}=\frac{\beta S}{N}\times \frac{1}{\gamma +{d}_{r}}=\frac{\beta S}{\left(\gamma +{d}_{r}\right)N} $$

    $ {R}_{0} $ refers to the $ R $ when nearly the entire population is susceptible, which means $ S $ is approximately equal to $ N $:

    $$ {R}_{0}=\frac{\beta }{{d}_{r}+\gamma } $$

    The DBM calculates $ R $ by expressing it as a function of model parameters. This approach is valuable in the advanced stages of an epidemic because it yields results with significant explanatory power. However, it applies only to single-host and single-kinetic models, which restricts its use in multi-host or co-kinetic models. The DBM incorporates both the disease’s natural history and demographic parameters, making it meaningful for predicting and preventing outbreaks. It is also simple, easy to understand, and has minimal hardware or software requirements.
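
    The DBM expressions for the SIR model can be evaluated directly once parameter values are available. The sketch below assumes the parameterization used above ($ \beta $, $ \gamma $, $ {d}_{r} $, $ S $, $ N $); the function names and the numerical values are hypothetical and serve only to illustrate the formulas.

```python
def r_eff_sir(beta: float, gamma: float, d_r: float, s: float, n: float) -> float:
    """R_eff = beta * S / ((gamma + d_r) * N): inflow rate divided by outflow rate."""
    if n <= 0 or not 0 <= s <= n:
        raise ValueError("require N > 0 and 0 <= S <= N")
    if gamma + d_r <= 0:
        raise ValueError("gamma + d_r must be positive")
    return beta * s / ((gamma + d_r) * n)


def r0_sir(beta: float, gamma: float, d_r: float) -> float:
    """R0 is R_eff evaluated at S = N (fully susceptible population)."""
    return r_eff_sir(beta, gamma, d_r, s=1.0, n=1.0)


# Hypothetical daily rates: beta = 0.5, gamma = 0.24, d_r = 0.01.
print(r0_sir(0.5, 0.24, 0.01))                      # whole population susceptible
print(r_eff_sir(0.5, 0.24, 0.01, s=6000, n=10000))  # 60% still susceptible
```

    With these inputs $ {R}_{eff} $ equals $ {R}_{0} $ scaled by the susceptible fraction $ S/N $, which is the relationship between the two formulas above.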

  • The next-generation method (NGM) is a prevalent approach for estimating $ R $. It uses the maximum eigenvalue of the next-generation matrix of a dynamic model, following the method proposed by van den Driessche and Watmough (29-33). The NGM is frequently applied across a range of dynamic models including, but not limited to, the SIR and SEIS models (25). Furthermore, it delivers quantitative accounts of secondary infections and can estimate the percentage of undetected cases across diverse outbreak scenarios (29,34). Compartments within these dynamic models are differentiated based on their infectivity. The ‘x-group’ signifies compartments possessing infectivity, whereas the ‘y-group’ denotes compartments devoid of infectivity. The equations corresponding to these groups are presented below:

    $$ \frac{d{x}_{i}}{dt}={{F}}_{i}\left(x,y\right)-{{V}}_{i}\left(x,y\right),\quad i=1,\dots ,n $$
    $$ \frac{d{y}_{j}}{dt}={{G}}_{j}\left(x,y\right),\quad j=1,\dots ,m $$

    $ {{F}}_{i} $ represents the newly infected individuals in compartment $ i $, and $ {{V}}_{i} $ represents individuals who transit to other compartments. To illustrate the NGM, we continue with the SIR model. Here $ n=1 $ and $ m=2 $, with $ x=I $ and $ y=(S,R) $, and the corresponding equations are as follows:

    $$ {{F}}_{1}=\frac{\beta SI}{N} $$
    $$ {{V}}_{1}=\gamma I+{d}_{r}I $$
    $$ {{G}}_{1}={b}_{r}N-\frac{\beta SI}{N}-{d}_{r}S $$
    $$ {{G}}_{2}=\gamma I-{d}_{r}R $$

    Taking the derivatives of $ {{F}}_{1} $ and $ {{V}}_{1} $ with respect to $ I $ yields the Jacobian matrices $ F=\beta S/ N $ and $ V=\gamma +{d}_{r} $. $ {R}_{eff} $ is then the leading eigenvalue (spectral radius, $ \rho $) of the next-generation matrix $ F{V}^{-1} $ (25):

    $$ {R}_{eff}=\rho \left(F{V}^{-1}\right)=\frac{F}{V}=\frac{\frac{\beta S}{N}}{\gamma +{d}_{r}}=\frac{\beta S}{\left(\gamma +{d}_{r}\right)N} $$
    $$ {R}_{0}=\frac{\beta }{\gamma +{d}_{r}} $$
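
    In the SIR example the matrices are scalars, so the eigenvalue step is trivial. To show the matrix calculation itself, the sketch below applies the same recipe to an SEIR model with two infected compartments, $ x=(E,I) $. The SEIR equations, the progression rate `sigma`, and all numerical values are assumptions introduced for illustration and do not appear in the derivation above. Plain Python is used so that the 2×2 inverse and eigenvalues are explicit.

```python
import cmath


def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul_2x2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def spectral_radius_2x2(m):
    """Largest eigenvalue modulus, from the characteristic polynomial."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return max(abs((trace + root) / 2), abs((trace - root) / 2))


def r_eff_seir_ngm(beta, sigma, gamma, d_r, s, n):
    """R_eff = rho(F V^-1) for an assumed SEIR model with x = (E, I).

    Assumed equations: dE/dt = beta*S*I/N - (sigma + d_r)*E
                       dI/dt = sigma*E - (gamma + d_r)*I
    """
    f = [[0.0, beta * s / n], [0.0, 0.0]]            # new infections enter E
    v = [[sigma + d_r, 0.0], [-sigma, gamma + d_r]]  # transitions out of E and I
    return spectral_radius_2x2(matmul_2x2(f, inverse_2x2(v)))


# Hypothetical daily rates; S = N gives R0.
print(r_eff_seir_ngm(beta=0.6, sigma=0.2, gamma=0.1, d_r=0.01, s=1.0, n=1.0))
```

    For this assumed model the result matches the closed form $ \beta \sigma S/\left[\left(\sigma +{d}_{r}\right)\left(\gamma +{d}_{r}\right)N\right] $, which reduces to the SIR expression above when progression from $ E $ to $ I $ is instantaneous.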

    Nevertheless, applying the NGM to multi-group or multi-host compartmental models has certain limitations. The method only determines the stability threshold of the disease-free equilibrium and therefore lacks explicit explanatory power. Relying on small data sets during the initial phase of an epidemic may omit pivotal information. Over time, the quality and dependability of NGM results have improved. Researchers must therefore adapt their methodologies to the specific scenario. For instance, when studying diseases such as hand, foot, and mouth disease, it may be reasonable to exclude factors such as the short disease duration, patient mobility, and spatial structure.

  • The final-size equation (FSE) is a valuable tool for comprehending the relationship between the outcome of an epidemic and $ {R}_{0} $, while taking into account the proportions of susceptible and recovered individuals. In the SIR model, the calculation formula is as follows:

    $$ {R}_{0}=\frac{\ln \dfrac{{S}_{0}}{{S}_{\mathrm{\infty }}}}{1-{S}_{\mathrm{\infty }}} $$

    where $ {S}_{0} $ and $ {S}_{\infty } $ represent the initial and final proportions of susceptible individuals.

    The FSE is often employed in the SIR model to ascertain the ultimate scale of an epidemic (35). With its precise output and straightforward form, it is well suited to initial estimates after an epidemic has ended. Nonetheless, it is model-specific and must be derived afresh for other models, which can be challenging for complex dynamic models.
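
    The SIR final-size formula above can be evaluated with a few lines of Python. This is a sketch under the same assumptions as the formula; the function name and the example proportions are hypothetical.

```python
import math


def r0_from_final_size(s0: float, s_inf: float) -> float:
    """R0 = ln(S0 / S_inf) / (1 - S_inf), with S0 and S_inf as proportions."""
    if not 0 < s_inf < s0 <= 1:
        raise ValueError("require 0 < S_inf < S0 <= 1")
    return math.log(s0 / s_inf) / (1.0 - s_inf)


# Hypothetical outbreak: almost everyone susceptible at the start,
# 20% of the population still susceptible at the end.
print(r0_from_final_size(s0=0.999, s_inf=0.2))
```

    Because the formula needs $ {S}_{\infty } $, it can only be applied once the epidemic has run its course, which is why the method suits retrospective estimates.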

    The FSE has been shown to possess a unique solution in three mean-field models, namely homogeneous, pairwise, and heterogeneous. Moreover, linearizing the FSE transforms optimal vaccination problems into simpler knapsack problems, yielding practical insights for decision-makers and the general public when considering vaccination strategies (36-37). However, no R package incorporating displacement or interaction is currently available for calculating $ {R}_{t} $ using the FSE approach (38).

  • The generation interval-based method (GIBM) is frequently used to estimate $ {R}_{t} $ in epidemiology. It builds on the concept of the generation interval, defined as the time between the infection of a primary case and the infection of its secondary cases. The method simplifies the natural history of the disease by concentrating on the distribution of time intervals between generations. Within this framework, two key indicators are emphasized: the generation interval (GT) and the serial interval (SI). GT is the time between the infection events in an infector-infectee pair, whereas SI is the time between symptom onsets in such a pair (39). Accurate estimation of GT is demanding because it depends on an exhaustive investigation of contact history (40). In comparison, SI is easier to determine because symptoms can be readily detected during field epidemiological surveys (41). By quantifying the relationship between generations using SI, researchers can estimate $ {R}_{t} $, $ {R}_{eff} $, and $ {R}_{0} $ (42-44).

    Several R (version 4.3.0, R Core Team, Vienna, Austria) packages, namely EpiEstim, EpiNow2, and R0, currently support the computation of reproduction numbers based on GT or SI (15,45-46), significantly lowering the barrier to their use. We have developed an interactive application for users unfamiliar with the R language, particularly grassroots disease control staff. This application, called Reproduction Number Calculator, provides access to these R packages without requiring programming knowledge (available at …). However, it is crucial to acknowledge the method’s inherent limitations. Inaccuracies may arise if the assumed distribution of intergenerational times does not accurately reflect the dynamics of the disease (42). Uncertainty in this distribution can result in an underestimation of the uncertainty in $ R $ (15). Overlooking group immunity and infection staging can bias estimates of $ {R}_{eff} $ (42). Further, the generation interval-based method has specific requirements, such as clear transmission chains, comprehensive and timely data, and an accurate assumption about the intergenerational time distribution. These requirements may limit its utility in certain scenarios.

    In conclusion, the generation interval-based method provides valuable insights into disease transmission dynamics and facilitates the estimation of $ {R}_{t} $, $ {R}_{eff} $, and $ {R}_{0} $. However, researchers should exercise caution in interpreting the results and consider the assumptions and data requirements associated with the method.
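
    To make the idea concrete, the sketch below computes a simple point estimate of the instantaneous reproduction number from the renewal equation, $ {R}_{t}={I}_{t}/\sum _{s}{w}_{s}{I}_{t-s} $, where $ {I}_{t} $ is the incidence at time $ t $ and $ {w}_{s} $ is the discretized SI (or GT) distribution. This is a simplified illustration and not the API or the full procedure of EpiEstim, EpiNow2, or R0, which additionally quantify uncertainty. The function name, the incidence series, and the SI weights are hypothetical.

```python
def instantaneous_r(incidence, si_weights):
    """Point estimate of R_t from the renewal equation.

    incidence:  list of new cases per time step (index 0 is the first step)
    si_weights: si_weights[k] is the probability that the serial interval
                equals k + 1 time steps; normalized internally
    Returns a list with one estimate per time step (None where undefined).
    """
    total = sum(si_weights)
    if total <= 0:
        raise ValueError("si_weights must have a positive sum")
    w = [x / total for x in si_weights]

    estimates = []
    for t in range(len(incidence)):
        # Infection pressure: past cases weighted by the serial interval.
        pressure = sum(w[s - 1] * incidence[t - s]
                       for s in range(1, min(t, len(w)) + 1))
        estimates.append(incidence[t] / pressure if pressure > 0 else None)
    return estimates


# Hypothetical daily case counts and a three-day discretized serial interval.
cases = [2, 3, 5, 8, 12, 15, 16, 14, 11, 8]
si = [0.2, 0.5, 0.3]
print(instantaneous_r(cases, si))
```

    Estimates for the first few time steps are unreliable because the sum over past cases is truncated, and a mis-specified `si_weights` shifts every estimate, which reflects the sensitivity to the intergenerational time distribution noted above.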

  • Choosing the correct approach to $ R $ estimation is critical in epidemiological research. Each method has its own strengths and weaknesses. Researchers must carefully consider the desired $ R $, as dictated by disease characteristics and available data, to identify the most suitable calculation method. This systematic strategy helps ensure that the estimation procedure matches the existing conditions and yields trustworthy results.

  • No conflicts of interest.

Reference (46)



