-
In December 2019, a novel coronavirus from patients with pneumonia was identified and subsequently named the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (1-3). SARS-CoV-2 has caused a coronavirus disease 2019 (COVID-19) pandemic with high morbidity and mortality. Analyzing the genome of SARS-CoV-2 (also referred to as COVID-19 virus) from clinical samples is crucial for the understanding of viral spread and viral evolution as well as for vaccine development (4-6). Presently, whole genome sequencing of the COVID-19 virus was often generated by next generation sequencing (NGS) (7). Although NGS methods have many advantages in terms of speed and parallelism, the accuracy and read length of Sanger sequencing is still superior and has confined the use of NGS mainly to resequencing genomes (8).
Here we introduce a detailed method to rapidly obtain COVID-19 virus whole-genome sequence from clinical samples. This method is based on multiple nucleic acid amplified fragments for Sanger sequencing. We applied this method to obtain 2 complete genome sequences of COVID-19 virus from clinical samples of patients with COVID-19.
-
In this study, bronco-alveolar lavage samples were collected from patients with COVID-19 in Hubei, China. COVID-19 virus RNA was identified as positive (Ct value: 28.78 and 31.86) by a real-time fluorescence-based reverse transcriptase polymerase chain reaction (rRT-PCR) assay as previously reported (7).
-
Viral RNA was extracted from 140 μL of sample using QIAamp Viral mini kits (Qiagen, Germany) according to the manufacturer’s instructions. RNA was eluted in 80 μL of elution buffer. A total of 38 sets of specific primers covering the whole COVID-19 virus genome were designed (Table 1) according to the reference sequence (WH19004, Accession ID: EPI_ISL_402120) obtained by NGS as previously reported (7). Overlapping fragments were obtained by RT-PCR conducted as follow: 5 μL of extracted RNA were amplified with the QIAGEN OneStep RT-PCR Kit (Qiagen, Germany) and RT-PCR programs were run as follows: 50 ℃ 30 min; 95 ℃ for 15 min; 95 ℃ for 30 s, 50/55 ℃ 30 s, 72 ℃1/2 min, 40 cycles; 72 ℃ 5 min. All PCR products were confirmed by gel electrophoresis analysis and sequenced using the Sanger method.
Set Name Start* End* Primer sequence (5’→3’) 1 1F 64 86 CTCTAAACGAACTTTAAAATCTG 1R 1,048 1,068 CCATTGAAGGTGTCAAATTTC 2 2F 706 729 CGAGCTTGGCACTGATCCTTA 2R 1,398 1,419 GCAAGACTATGCTCAGGTCCTA 3 3F 950 970 TACTGCTGCCGTGAACATGAG 3R 2,183 2,203 CCAACCGTCTCTAAGAAACTC 4 4F 1,999 2,020 GAGACTCATTGATGCTATGATG 4R 3,099 3,120 TCAGTACCATACTCATATTGAG 5 5F 2,352 2,374 GTGGAGCTAAACTTAAAGCCTTG 5R 3,452 3,473 CTCCTCCATGTTTAAGGTAAAC 6 6F 2,846 2,865 ACAGTTGAACTCGGTACAGA 6R 4,068 4,088 CAATGTCACTAACAAGAGTGG 7 7F 3,884 3,904 CCTAAAGAGGAAGTTAAGCCA 7R 5,153 5,172 TGGTAGTACTCAAAAGCCTC 8 8F 4,787 4,807 GCTGGTTCCTATAAAGATTGG 8R 6,146 6,165 ACATCACCATTTAAGTCAGG 9 9F 5,976 5,997 ATTCTTATTTCACAGAGCAACC 9R 7,178 7,200 GAAATGGTAATTTGTATAGTTTC 10 10F 6,977 6,999 GTTTGCCTAGGTTCTTTAATCTA 10R 8,183 8,204 CTACATCTGAATCAACAAACCC 11 11F 7,985 8,006 CAGGCATTAGTGTCTGATGTTG 11R 9,167 9,188 CTCTAACAGAACCTTCAAGGTA 12 12F 8,966 8,986 AAACTTATAGAGTACACTGAC 12R 10,166 10,185 CAGATCACATGTCTTGGACA 13 13F 9,900 9,921 ATAAGTACAAGTATTTTAGTGG 13R 11,114 11,133 GCAGACATAGCAATAATACC 14 14F 10,901 10,922 GGTAGTGCTTTATTAGAAGATG 14R 12,175 12,196 AAGAACAACTTCAGAATCACCA 15 15F 12,024 12,043 CCATGCAGGGTGCTGTAGAC 15R 13,205 13,225 GGATTCTTGATCCATATTGGC 16 16F 12,970 12,991 CAACCTAAATAGAGGTATGGTA 16R 14,290 14,312 TCCCAATATTTAAAATAACGGTC 17 17F 13,775 13,795 CACATATATCACGTCAACGTC 17R 14,999 15,019 GTGCATCTTGATCCTCATAAC 18 18F 14,756 14,777 ACTTCTTCTTTGCTCAGGATGG 18R 15,989 16,011 TTCAATCATAAGTGTACCATCTG 19 19F 15,929 15,850 GCAAAATGTTGGACTGAGACTG 19R 17,014 17,036 GCAACATTGCTAGAAAACTCATC 20 20F 16,832 16,853 CCTTTGAAAAAGGTGACTATGG 20R 17,956 17,977 GGTCTCTATCAGACATTATGCA 21 21F 17,530 17,549 ATAGGTCCAGACATGTTCCT 21R 18,781 18,801 TTGTAGGTTACCTGTAAAACC 22 22F 18,487 18,506 ATACCACTTATGTACAAAGG 22R 19,618 19,639 AAGCCACATTTTCTAAACTCTG 23 23F 19,438 19,459 CCACTAAAGTCTGCTACGTGTA 23R 20,568 20,589 GTCAATAGTCACTTTGACAACC 24 24F 20,363 20,384 TACATCTACTGATTGGACTAGC 24R 21,658 21,678 GGGTAATAAACACCACGTGTG 25 25F 19,828 19,850 AAAATACTCAATAATTTGGGTGT 25R 21,019 21,041 ATAATGAGATCCCATTTATTAGC 26 26F 20,428 20,447 CCTATGGACAGTACAGTTAA 26R 21,665 21,684 TTGTCAGGGTAATAAACACC 27 27F 21,332 21,354 ATGCAAATTACATATTTTGGAGG 27R 22,539 22,560 GTAATATTAGGAAATCTAACAA 28 28F 22,433 22,449 TGTGCACTTGACCCTCT 28R 23,345 23,364 CCTGGTGTTATAACACTGAC 29 29F 23,123 23,142 CCAGCAACTGTTTGTGGACC 29R 24,095 24,116 CACAAATGAGGTCTCTAGCAGC 30 30F 23,339 23,360 GGTGGTGTCAGTGTTATAACAC 30R 24,328 24,349 ACTATTAAATTGGTTGGCAATC 31 31F 23,948 23,971 GATTTTGGTGGTTTTAATTTTTCA 31R 25,157 25,176 TTTCCAAGTTCTTGGAGATC 32 32F 24,960 24,981 TCAACAACACAGTTTATGATCC 32R 26,171 26,192 GGTTCATCATAAATTGGTTCCA 33 33F 25,837 25,857 TGGCATACTAATTGTTAYGAC 33R 27,033 27,052 GAAAGCGTTCGTGATGTAGC 34 34F 26,815 26,834 CTTCTTTCAGACTGTTTGCG 34R 27,948 27,968 ACATGACTGTAAACTACATTC 35 35F 27,389 27,406 CGAACATGAAAATTATTC 35R 28,550 28,568 CGTCACCACCACGAATTCG 36 36F 28,322 28,341 TTTGGTGGACCCTCAGATTC 36R 29,543 29,561 CCATCTGCCTTGTGTGGTC 37 37F 29,149 29,170 CAGACAAGGAACTGATTACAAA 37R 29,836 29,857 GAAGCTATTAAAATCACATGGG 38 38F 26,539 26,559 GTACTATTACCGTTGAAGAGC 38R 27,102 27,122 CCTGTAGCGACTGTATGCAGC 5′-RACE 39R 1,048 1,068 CCATTGAAGGTGTCAAATTTC 40R 493 512 GACCATGAGGTGCAGTTCGA 3′-RACE 41F 29,149 29,170 CAGACAAGGAACTGATTACAAA 42F 29,438 29,459 CAGCAAACTGTGACTCTTCTTC * The location on the reference genome, accession ID: EPI_ISL_402120. Table 1. Primers used for whole-genome sequencing of the COVID-19 virus.
-
The 5’ and 3’ ends of the genome were determined by rapid amplification of cDNA ends (RACE) using the Invitrogen 5’ RACE System and 3’ RACE System (Invitrogen, USA) according to the manufacturer’s instructions. Gene-specific primers for 5’ and 3’ RACE PCR amplification were designed to obtain a fragment of approximately 400–500 bp for the two regions. Purified PCR products were cloned into the pMD18-T Simple Vector (TaKaRa, Takara Biotechnology, Dalian, China) and chemically competent Escherichia coli (DH5α cells, TaKaRa), according to the manufacturer’s instructions. PCR products were sequenced with use of M13 forward and reverse primers.
-
All sequencing fragments were assembled using DNAStar software. The open reading frames of the verified genome sequences were predicted using Geneious (version 11.1.5) and annotated using the Conserved Domain Database. Sequence alignment of the COVID-19 virus with reference sequences was done with Mafft software (version 7.450). The SNPs of each sequence were defined as the site’s variant from the reference sequence.
HTML
Clinical Samples
Nucleic Acid Extraction and Fragment Amplification
5’ and 3’ Ends of Genome Sequencing
Genome Sequence Assembly
Citation: |