Genome coverage of the COVID-19 virus exceeded 98% in all samples from the 29 local patients, with average sequencing depths ranging from 23 to more than 11,400. Among the samples from the 14 imported patients, only nCoVTS0414-6 fell below this level, with 90.13% genome coverage and an average sequencing depth of 6.07. The remaining samples had more than 98% genome coverage, with average depths ranging from 15 to 13,241.
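As an illustration of how the two summary statistics above are commonly derived, the following minimal sketch computes genome coverage and average depth from a list of per-base read depths. The function name `coverage_and_depth`, the `min_depth` threshold of 1, and the toy depth values are assumptions made for this example; the depth cutoff actually used to call a position "covered" in this study is not stated here.

```python
from statistics import mean


def coverage_and_depth(depths, min_depth=1):
    """Return (fraction of positions covered, mean depth) for per-base depths.

    `min_depth` is an assumed cutoff: a position counts as covered when at
    least this many reads overlap it.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths), mean(depths)


# Toy example (hypothetical values): a 10-base "genome" with one uncovered position.
toy_depths = [12, 15, 0, 9, 30, 22, 18, 7, 11, 26]
coverage, avg_depth = coverage_and_depth(toy_depths)
print(f"coverage = {coverage:.2%}, mean depth = {avg_depth:.2f}")
```

In practice the per-base depths would be read from an alignment, one value per reference position, including positions with zero reads so that the coverage denominator is the full genome length.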
A total of 6 transmission chains were identified among the batch 1 COVID-19 samples according to the epidemiological data. Surprisingly, we found that COVID-19 virus strains belonging to the same transmission chain were tightly clustered on the Bayesian coalescent tree (Figure 1), despite sporadic dispersions. Notably, even though strains from the same transmission chain clustered together, it was difficult to distinguish the introduced case from the other COVID-19 cases within each transmission chain. Collectively, using the heterogeneous data apart from Guangdong Province, we showed that the phylogenetic results from genomic data were highly concordant with the transmission chains inferred from epidemiological data.
Phylogenetic relationship of COVID-19 virus strains and the transmission chains from 29 samples. Each dot represents a patient sample. The onset date of each patient is plotted on the x-axis. Samples are ordered along the y-axis according to their position in the Bayesian coalescent tree. The transmission chain and the generation of the sample within the chain are labelled on the left and right side of each dot, respectively. Preliminary transmission chains could be constructed by combining the onset date and the phylogenetic relationship of the COVID-19 virus strains, even with limited epidemiological information.
To identify the most suitable phylogenetic method for this study, we also compared the topology of the Bayesian coalescent tree with those of the maximum likelihood (ML) tree and the neighbor-joining (NJ) tree (Figure 2). The clustering patterns of the Bayesian coalescent tree and the ML tree were similar to each other and highly concordant with the transmission chains reconstructed from epidemiological data. In contrast, the clustering pattern of the NJ tree differed from those of the Bayesian coalescent tree and the ML tree, as well as from the reconstructed transmission chains. These results indicate that the NJ tree did not relate genomic variation to the transmission chains as accurately as the Bayesian coalescent tree and the ML tree.
The phylogenetic analyses showed distinct structures between the maximum likelihood (ML) tree and the neighbor-joining (NJ) tree. (A) Phylogenetic tree constructed using the maximum likelihood method. (B) Phylogenetic tree constructed using the neighbor-joining method.
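The topology comparison described above is qualitative. One way to make such a comparison quantitative is to count the clades (sets of descendant samples) that appear in one rooted tree but not the other, a rooted Robinson-Foulds-style distance. The sketch below is not the procedure used in this study; the nested-tuple tree encoding, the function names, and the five sample labels `s1` to `s5` are hypothetical and chosen only for illustration.

```python
def clades(tree):
    """Collect the leaf set of every internal node of a rooted tree.

    A tree is a nested tuple; leaves are strings (sample labels).
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def clade_distance(tree_a, tree_b):
    """Number of clades present in exactly one of the two rooted trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Hypothetical toy topologies over the same five samples.
bayes_tree = ((("s1", "s2"), "s3"), ("s4", "s5"))
ml_tree = (("s1", "s2"), ("s3", ("s4", "s5")))
nj_tree = ((("s1", "s4"), "s3"), ("s2", "s5"))

print("Bayesian vs ML:", clade_distance(bayes_tree, ml_tree))
print("Bayesian vs NJ:", clade_distance(bayes_tree, nj_tree))
```

A distance of 0 means the two trees contain exactly the same clades; larger values indicate more disagreement. In this toy example the second comparison yields the larger distance, mirroring the kind of discordance reported for the NJ tree, but the numbers carry no information about the real data.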
Among positive COVID-19 cases, a considerable proportion were asymptomatic (4), which poses significant challenges to epidemic prevention and control because transmission information is difficult to obtain from epidemiological data (5). Mutations that accumulate in the virus genome during host-to-host transmission could potentially provide additional information for transmission chain construction (6). To test this hypothesis, we conducted a phylogenetic analysis of 14 asymptomatic COVID-19 cases. Interestingly, the clustering patterns of the 14 asymptomatic cases on the Bayesian coalescent tree were highly correlated with the travel history of the patients (Figure 3A). Among them, patients whose COVID-19 virus strains clustered into clade A and clade B had a European travel history, while those whose strains clustered into clade C and clade D had a Pacific Rim travel history. Moreover, the clustering pattern was largely driven by the G28883C mutation, a missense variant that leads to a p.204G>R change in QHD43423.2 (Figure 3A). This mutation had a much higher frequency in European COVID-19 cases than in cases from other regions (Figure 3B), indicating that COVID-19 virus strains carrying this mutation might have originated in Europe. Taken together, our results suggest that the Bayesian coalescent method can help infer the transmission relationships of asymptomatic COVID-19 cases.
Phylogenetic analyses of 14 asymptomatic cases revealed a high correlation between travel history and viral phylogenies. (A) The distribution of COVID-19 virus variations among the 14 asymptomatic COVID-19 cases and the variations underlying the phylogenetic structure. The left panel shows the Bayesian coalescent tree, and the right panel shows the variations in each COVID-19 virus strain. (B) The bar plot shows the regional distribution of the G28883C mutation.
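To make the link between the genomic coordinate G28883C and the protein-level change p.204G>R explicit, the following sketch maps a genomic position to a codon and translates the reference and mutated codons with the standard genetic code. Two inputs are assumptions not given in this text and should be checked against the reference annotation: the 1-based start of the coding sequence for QHD43423.2 (`ORF_START = 28274`) and the reference codon `GGA` at that site. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

ORF_START = 28274  # assumed 1-based genomic start of the coding sequence


def locate_codon(genome_pos, orf_start=ORF_START):
    """Return (1-based codon number, 0-based index within the codon)."""
    offset = genome_pos - orf_start
    if offset < 0:
        raise ValueError("position lies upstream of the coding sequence")
    return offset // 3 + 1, offset % 3


def mutate_codon(codon, index, alt_base):
    """Return the codon with the base at `index` replaced by `alt_base`."""
    return codon[:index] + alt_base + codon[index + 1:]


codon_number, index = locate_codon(28883)
ref_codon = "GGA"  # assumed reference codon at this site
alt_codon = mutate_codon(ref_codon, index, "C")
print(f"p.{codon_number}{CODON_TABLE[ref_codon]}>{CODON_TABLE[alt_codon]}")
```

Under these assumptions the substitution falls on the first base of codon 204 and changes glycine to arginine, consistent with the missense annotation given above.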