Knowledge Graph: Applications in Tracing the Source of Large-Scale Outbreak — Beijing Municipality, China, 2020–2021

Ying Shen; Yonghong Liu; Xiaokang Jiao; Yuxin Cai; Xiang Xu; Hui Yao; Xiaoli Wang

doi:10.46234/ccdcw2023.017

China CDC Weekly, 2023, 5(4): 90-95

Methods and Applications: Knowledge Graph: Applications in Tracing the Source of Large-Scale Outbreak — Beijing Municipality, China, 2020–2021


Abstract
Introduction
Tracing transmission paths and identifying infection sources have been effective in curbing the spread of coronavirus disease 2019 (COVID-19). However, when facing a large-scale outbreak, this is extremely time-consuming and labor-intensive, and resources for infection source tracing become limited. In this study, we aimed to use knowledge graph (KG) technology to automatically infer transmission paths and infection sources.
Methods
We constructed a KG model to automatically extract epidemiological information and contact relationships from case reports. We then used an inference engine to identify transmission paths and infection sources. To test the model’s performance, we used data from two COVID-19 outbreaks in Beijing.
Results
The KG model performed well for both outbreaks. In the first outbreak, 20 infection relationships were identified manually, while 42 relationships were determined using the KG model. In the second outbreak, 32 relationships were identified manually and 31 relationships were determined using the KG model. All discrepancies and omissions were reasonable.
Discussion
The KG model is a promising tool for predicting and controlling future COVID-19 epidemic waves and other infectious disease pandemics. By automatically inferring the source of infection, limited resources can be used efficiently to detect potential risks, allowing for rapid outbreak control.
Funding: Supported by the National Key Research and Development Program of China (2021ZD0114102), the Science Program of Beijing City (Z221100007922019), and the Beijing Natural Science Foundation (7202073).

Author Affiliations

1. Beijing Office of Global Health, Beijing Center for Disease Prevention and Control, Beijing, China
2. Yidu Cloud Technology Co Ltd, Beijing, China
3. School of Public Health, Capital Medical University, Beijing, China

Corresponding author: Xiaoli Wang, wangxiaoli198215@163.com
Online Date: January 27 2023
Issue Date: January 27 2023

References

[1]	Chen CM, Ross KE, Gavali S, Cowart JE, Wu CH. COVID-19 knowledge graph from semantic integration of biomedical literature and databases. Bioinformatics 2021;37(23):4597–8. http://dx.doi.org/10.1093/bioinformatics/btab694.
[2]	Domingo-Fernández D, Baksi S, Schultz B, Gadiya Y, Karki R, Raschka T, et al. COVID-19 knowledge graph: a computable, multi-modal, cause-and-effect knowledge model of COVID-19 pathophysiology. Bioinformatics 2021;37(9):1332–4. http://dx.doi.org/10.1093/bioinformatics/btaa834.
[3]	Hsieh K, Wang YY, Chen LY, Zhao ZM, Savitz S, Jiang XQ, et al. Drug repurposing for COVID-19 using graph neural network and harmonizing multiple evidence. Sci Rep 2021;11(1):23179. http://dx.doi.org/10.1038/s41598-021-02353-5.
[4]	Al-Saleem J, Granet R, Ramakrishnan S, Ciancetta NA, Saveson C, Gessner C, et al. Knowledge graph-based approaches to drug repurposing for COVID-19. J Chem Inf Model 2021;61(8):4058–67. http://dx.doi.org/10.1021/acs.jcim.1c00642.
[5]	Jiang BC, You X, Li K, Li TT, Zhou XJ, Tan LH. Interactive analysis of epidemic situations based on a spatiotemporal information knowledge graph of COVID-19. IEEE Access 2022;10:46782–95. http://dx.doi.org/10.1109/ACCESS.2020.3033997.
[6]	Wang J, Wang K, Li J, Jiang J, Wang Y, Mei J, et al. Accelerating epidemiological investigation analysis by using NLP and knowledge reasoning: a case study on COVID-19. AMIA Annu Symp Proc 2020;2020:1258–67. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8075493/.
[7]	Chen LM, Liu D, Yang JK, Jiang MY, Liu SQ, Wang Y. Construction and application of COVID-19 infectors activity information knowledge graph. Comput Biol Med 2022;148:105908. http://dx.doi.org/10.1016/j.compbiomed.2022.105908.
[8]	Hakki S, Zhou J, Jonnerby J, Singanayagam A, Barnett JL, Madon KJ, et al. Onset and window of SARS-CoV-2 infectiousness and temporal correlation with symptom onset: a prospective, longitudinal, community cohort study. Lancet Respir Med 2022;10(11):1061–73. http://dx.doi.org/10.1016/S2213-2600(22)00226-0.
[9]	Johansson MA, Quandelacy TM, Kada S, Prasad PV, Steele M, Brooks JT, et al. SARS-CoV-2 transmission from people without COVID-19 symptoms. JAMA Netw Open 2021;4(1):e2035057. http://dx.doi.org/10.1001/jamanetworkopen.2020.35057.

FIGURE 1. Flow chart of knowledge graph construction.
Note: This figure depicts the process of knowledge graph construction. Epidemiological information and case relationships were first retrieved from unstructured case reports. This information included cases' sociodemographic characteristics, time of exposure, time of onset, time of first positive nucleic acid test, time of diagnosis, and symptoms. Case relations included clear contacts such as sharing the same household, dining together, contacts during medical visits, working or studying in the same room, and traveling in the same vehicle, as well as unclear contacts such as appearing in the same location at the same time. Edge weights were then inferred based on the intensity of contacts and infectiousness. Finally, pruning was conducted according to the edge weights and inferred infection source.


FIGURE 2. Transmission paths for Shunyi cluster. (A) A total of 42 relationships were identified in the knowledge graph (KG) model for the Shunyi cluster. (B) A total of 20 relationships were identified by public health professionals for the Shunyi cluster. Sources for Cases 12, 34–36, and 30–40 were unclear and were presented separately.

Note: Red arrows represented additional relationships identified by the KG model; black arrows represented relationships that differed between the KG model and manual determination; gray arrows represented relationships that were the same in both. For illustration purposes, 9 relationships from Case 10 to Cases 16–24 were aggregated and presented in a gray square; 8 relationships from Case 12 to Cases 34–41 were aggregated and presented in a gray square. The edge weights for Case 13 → Case 32 and Case 14 → Case 32 were the same, so both were kept. Sources for Cases 12, 34–36, and 30–40 were unclear and were presented separately.


FIGURE 3. Transmission paths for Daxing cluster. (A) A total of 31 relationships were identified in the KG model for the Daxing cluster. (B) A total of 32 relationships were identified by public health professionals for the Daxing cluster.

Note: Red arrows indicated additional relationships identified by the KG model; black arrows indicated relationships that differed between the KG model and manual determination; gray arrows indicated the same relationships; and orange arrows indicated relationships omitted by the KG model. For illustration, 4 relationships from Case 8 to Cases 27–30 were aggregated; 8 relationships from Case 7 to Cases 19–23 and 25–26 were aggregated; and 4 relationships from Case 6 to Cases 11–12 and 16–17 were aggregated. The infection source for Case 33 was unclear, while Cases 1 and 2 were both possible sources.
Abbreviation: KG=knowledge graph.
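
The arrow categories in the notes to Figures 2 and 3 can be read as a comparison between two sets of directed edges, one from the KG model and one from manual determination. The sketch below is one possible reading and not the procedure used in the study: it assumes that a "different" relationship is a KG edge whose target case was given another source manually, that an "additional" relationship is a KG edge whose target case had no manually determined source, and that an "omitted" relationship is a manual edge whose target case received no source from the KG model. The function name and the case numbers are hypothetical and are not study data.

```python
def compare_edge_sets(kg_edges, manual_edges):
    """Classify (source, target) edges the way the figure notes describe arrows."""
    kg, manual = set(kg_edges), set(manual_edges)
    kg_targets = {target for _, target in kg}
    manual_targets = {target for _, target in manual}
    kg_only = kg - manual
    manual_only = manual - kg
    return {
        # Gray arrows: both methods agree on source and target.
        "same": sorted(kg & manual),
        # Black arrows (assumed reading): KG names a different source for a case.
        "different": sorted(e for e in kg_only if e[1] in manual_targets),
        # Red arrows (assumed reading): KG finds a source where manual tracing had none.
        "additional": sorted(e for e in kg_only if e[1] not in manual_targets),
        # Orange arrows (assumed reading): manual tracing has a source, KG has none.
        "omitted": sorted(e for e in manual_only if e[1] not in kg_targets),
    }


# Hypothetical case numbers, for illustration only.
kg_edges = [(1, 2), (2, 3), (1, 4), (4, 5)]
manual_edges = [(1, 2), (1, 3), (4, 6)]

comparison = compare_edge_sets(kg_edges, manual_edges)
for category, edges in comparison.items():
    print(category, edges)
```

Counting the edges in each category gives the kind of summary reported for the two clusters, although the exact definitions used by the authors may differ from the ones assumed here.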



Knowledge graphs (KGs) have been widely used to build knowledge bases for search engines since Google introduced them. During the coronavirus disease 2019 (COVID-19) pandemic, KGs have played an important role in areas such as the construction of COVID-19-related knowledge bases (1–2), bibliometrics, drug information management, drug repurposing (3–4), auxiliary diagnosis and treatment, and knowledge surveys. However, they have seen limited use in exploring infection paths among cases (5–7) and identifying infection sources.

Tracing transmission paths helps to promptly identify the source of infection, detect high-risk areas that might otherwise be overlooked, and identify key populations, sites with high infection risk, and possible superspreaders, so that timely action can cut off the transmission chain and contain an outbreak. However, when a pandemic such as COVID-19 produces a very large number of cases, epidemiologic investigation and the identification and management of close contacts become extremely time-consuming and labor-intensive, which further limits the resources available for tracing transmission paths and identifying infection sources. It is difficult to manually extract key information and trace infection paths among cases from the vast amount of unstructured text in case reports. Information technology is therefore important for quickly extracting demographic and epidemiologic information, inferring transmission paths and infection sources, identifying key populations and high-risk sites, and preventing further transmission at the community level.

To improve the effectiveness of epidemiological investigation and facilitate infection source tracing, we used natural language processing (NLP) and KG technologies to automatically extract structured data from case reports and determine the infection relationships among cases. We then constructed a directed KG and used parameters including relationship intensity and transmission intensity to identify infection sources.
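
As a rough illustration of the weighting and pruning idea, the sketch below assumes that each candidate edge carries a relationship intensity and a transmission intensity, that the edge weight is their product, and that pruning keeps the highest-weight incoming edge(s) for each case, with ties retained. The product rule, the numeric scores, and the names `Contact` and `prune_to_likely_sources` are illustrative assumptions; they are not the parameters or code used in this study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    source: str                    # candidate infector
    target: str                    # candidate infectee
    relationship_intensity: float  # assumed score for how close the contact was
    transmission_intensity: float  # assumed score for the source's infectiousness

    @property
    def weight(self) -> float:
        # Assumption: the two parameters are combined by multiplication.
        return self.relationship_intensity * self.transmission_intensity


def prune_to_likely_sources(contacts, tol=1e-9):
    """For each case, keep only the incoming edge(s) with the highest weight."""
    incoming = defaultdict(list)
    for contact in contacts:
        incoming[contact.target].append(contact)

    kept = {}
    for target, edges in incoming.items():
        best = max(edge.weight for edge in edges)
        # Edges tied for the highest weight are all kept.
        kept[target] = sorted(
            edge.source for edge in edges if abs(edge.weight - best) <= tol
        )
    return kept


# Hypothetical contacts and scores, for illustration only.
contacts = [
    Contact("Case 1", "Case 2", 1.0, 0.9),  # e.g. same household
    Contact("Case 1", "Case 3", 0.3, 0.9),  # e.g. same location, unclear contact
    Contact("Case 2", "Case 3", 0.8, 0.5),  # e.g. dined together
    Contact("Case 2", "Case 4", 0.6, 0.5),
    Contact("Case 3", "Case 4", 0.5, 0.6),  # same weight as the edge above
]

likely_sources = prune_to_likely_sources(contacts)
for case, sources in sorted(likely_sources.items()):
    print(case, "<-", ", ".join(sources))
```

In this toy input, Case 4 keeps two candidate sources because the two incoming edges have equal weight, which mirrors the way equal-weight edges are both retained in a pruned graph.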

DISCUSSION

The KG model described herein automatically extracted data from unstructured text in epidemiologic case reports and sorted out complex infection relationships. A directed KG depicting the identified case relationships and infection sources was constructed through a detailed pruning and reconstruction process. We tested the KG model on two actual COVID-19 outbreaks that occurred in Beijing, China, and the model proved effective in identifying the infection source.

Using the KG model to deduce transmission pathways, “Case Zero” can be quickly identified, allowing the government to direct limited resources and determine the possible infection source (6). Furthermore, the KG model can be used to identify key transmission sites and key spreaders, which can then inform the detection of populations at higher risk, improve the efficiency of case screening, and help contain the spread of an outbreak in a timely manner. Additionally, a focused investigation could be organized for lonely nodes in the KG (i.e., cases whose transmission paths were not clear) to identify hidden infection sources in a timely fashion. This could help to quickly review the overall epidemic prevention and control direction and address potential issues rapidly, thus avoiding worsening of a current outbreak and preventing future outbreaks. Although the prevention and control strategy for COVID-19 has changed substantially, the KG technology presented in this paper could still enrich the current toolbox of public health countermeasures and offer insights for future epidemics caused by other emerging or existing infectious diseases.
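
A minimal sketch of how candidate index cases and lonely nodes might be read off a pruned directed graph is shown below. It assumes that a candidate "Case Zero" is a case with onward transmission but no inferred source, and that a lonely node is a case with no edges in either direction. These definitions, the function name `classify_cases`, and the case labels are assumptions made for illustration, not the study's implementation.

```python
def classify_cases(cases, edges):
    """Split cases into candidate index cases and lonely nodes.

    cases: iterable of case identifiers
    edges: iterable of (source, target) pairs from the pruned directed graph
    """
    edges = list(edges)
    infected_by_someone = {target for _, target in edges}
    infected_someone = {source for source, _ in edges}

    # Candidate "Case Zero": infects others but has no inferred source.
    candidate_index_cases = [
        c for c in cases if c in infected_someone and c not in infected_by_someone
    ]
    # Lonely node: not linked to any other case in either direction.
    lonely_nodes = [
        c for c in cases if c not in infected_someone and c not in infected_by_someone
    ]
    return candidate_index_cases, lonely_nodes


# Hypothetical cluster, for illustration only.
cases = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("A", "C"), ("C", "D")]

index_cases, lonely = classify_cases(cases, edges)
print("Candidate index cases:", index_cases)
print("Lonely nodes needing follow-up:", lonely)
```

Cases returned in the second list would be the natural targets for the focused investigation described above.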

This study has some limitations. First, the KG model is a tool for analyzing infection sources, and its performance is largely affected by the completeness of epidemiological case reports. To be used in future epidemics, essential information from case reports must be clarified in advance. Second, this model was tested in small outbreaks, with good performance; however, the model requires further validation in larger outbreaks.

Conflicts of interest

No conflicts of interest.

Acknowledgments

All health professionals fighting against COVID-19, especially our colleagues in Beijing Center for Disease Prevention and Control.
