Human immunodeficiency virus 1 (HIV-1) is known for its frequent mutation and recombination, which contribute to the genetic diversity of the virus (1). Recombination between subtypes of HIV-1 generates unique recombinant forms (URFs) in individuals, some of which become circulating recombinant forms (CRFs) once they spread in the population. Yunnan Province was the first location in China where the main HIV-1 strains were found. Over the course of its long-term HIV-1 epidemic, various recombinant forms have been generated there, including the predominant HIV-1 CRFs currently circulating in China (2). In a previous study, the authors identified five samples whose sequences differed from known subtypes/CRFs. To determine whether they represented a potential CRF and how they arose, the authors amplified and sequenced the near-full-length genome (NFLG) sequences and performed phylogenetic, recombination, and evolutionary analyses. Phylogenetic analysis revealed that these five sequences clustered in a distinct monophyletic clade distantly related to all known HIV-1 CRFs. Recombination analysis revealed that they shared a subtype C backbone with four subtype B insertions, forming nine subregions. Phylogenetic analyses of the subregions confirmed the parental lineage of each subregion. According to the naming criteria, the strains were named CRF142_BC. Bayesian evolutionary analysis revealed that CRF142_BC originated in approximately 1994–1997. The identification of new CRFs will provide a basis for the molecular tracing of HIV-1 in China, as well as for the study of HIV mutations and vaccines.
Four of the five samples (15R176, 21ZT314, 21ZT323, and 21ZT334) were collected from Zhaotong City, and one (15R297) was collected from Pu'er City. Two samples (15R176 and 15R297) were collected in 2015, and the other three (21ZT314, 21ZT323, and 21ZT334) were collected in 2021. Demographic information is presented in Table 1. No epidemiological link was found between the five individuals. Written informed consent was obtained from the participants. NFLG sequences were amplified and sequenced as previously described (3). Amplified products were sent to SinoGenoMax Co. (Beijing, China) for Sanger sequencing. Phylogenetic analyses were performed using MEGA 11. Recombination breakpoints were analyzed using SimPlot 3.5.1 software. The time of the most recent common ancestor (tMRCA) of CRF142_BC was estimated by Bayesian Markov chain Monte Carlo (MCMC) analysis using BEAST v1.8.2. An uncorrelated lognormal relaxed molecular clock was used in combination with a Bayesian skyline coalescent tree prior under the GTR+I+G4 nucleotide substitution model.
Table 1. Demographic characteristics of the HIV-1-infected participants.

| Sequence name | Sampling year | Sex | Age (years) | Ethnic group | Education | Marital status | Infection route |
|---|---|---|---|---|---|---|---|
| 15R176 | 2015 | Female | 59 | Han | Illiterate | Married | Heterosexual contact |
| 15R297 | 2015 | Male | 32 | Hani | Primary school | Divorced/Widowed | Heterosexual contact |
| 21ZT314 | 2021 | Male | 75 | Han | Illiterate | Married | Heterosexual contact |
| 21ZT323 | 2021 | Female | 64 | Miao | Illiterate | Divorced/Widowed | Heterosexual contact |
| 21ZT334 | 2021 | Male | 36 | Han | Primary school | Divorced/Widowed | Heterosexual contact |

Abbreviation: HIV-1=human immunodeficiency virus 1.
The NFLG sequences of strains 15R176, 15R297, 21ZT314, 21ZT323, and 21ZT334 were 8,787 (640–9,465 in HXB2), 8,795 (670–9,480 in HXB2), 8,814 (635–9,513 in HXB2), 8,924 (637–9,571 in HXB2), and 8,958 (640–9,558 in HXB2) nt in size, respectively, ranging from the 5’ noncoding region (NCR) to part of the 3’ long terminal repeat (LTR). They were submitted to GenBank under accession numbers PP074169–PP074173.
Phylogenetic analysis revealed that these five sequences formed a distinct monophyletic clade with a bootstrap value of 100%, distantly related to all other HIV-1 subtypes/CRFs (Figure 1A), suggesting that they represent a potential novel CRF. Recombination analysis revealed that these five sequences were composed of subtypes B and C and had similar recombination patterns (Figure 1B). Four segments of subtype B were inserted into the backbone of subtype C, resulting in a mosaic structure of nine subregions: IC (640–1,555), IIB (1,556–1,690), IIIC (1,691–2,885), IVB (2,886–3,141), VC (3,142–6,035), VIB (6,036–6,178), VIIC (6,179–8,865), VIIIB (8,866–9,047), and IXC (9,048–9,513). Phylogenetic analysis of the nine subregions revealed that subregions I, III, V, VII, and IX clustered with their subtype C counterparts, and subregions II, IV, VI, and VIII clustered with their subtype B counterparts (Figure 2A). These new recombinants were therefore designated CRF142_BC. As shown in Figure 1B, compared with CRF08_BC, CRF142_BC had a shorter IIB subregion and one additional subtype B insertion (the VIB subregion).
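The mosaic structure described above can be written down as a small data structure, which makes it easy to check that the nine subregions tile the genome without gaps and to see how much of it each parental subtype contributes. The following is a minimal sketch, not part of the original analysis; the names `SEGMENTS`, `is_contiguous`, `length_by_subtype`, and `subregion_at` are hypothetical, and the coordinates are the HXB2 positions quoted in the text, treated as inclusive.

```python
from collections import defaultdict

# (subregion label, parental subtype, HXB2 start, HXB2 end), inclusive coordinates
# copied from the breakpoints reported in the text.
SEGMENTS = [
    ("I", "C", 640, 1555),
    ("II", "B", 1556, 1690),
    ("III", "C", 1691, 2885),
    ("IV", "B", 2886, 3141),
    ("V", "C", 3142, 6035),
    ("VI", "B", 6036, 6178),
    ("VII", "C", 6179, 8865),
    ("VIII", "B", 8866, 9047),
    ("IX", "C", 9048, 9513),
]


def is_contiguous(segments):
    """Return True if each segment starts one position after the previous one ends."""
    for previous, current in zip(segments, segments[1:]):
        if current[2] != previous[3] + 1:
            return False
    return True


def length_by_subtype(segments):
    """Sum the number of HXB2 positions assigned to each parental subtype."""
    totals = defaultdict(int)
    for _label, subtype, start, end in segments:
        totals[subtype] += end - start + 1
    return dict(totals)


def subregion_at(position, segments=SEGMENTS):
    """Return (label, subtype) for an HXB2 position, or None if it lies outside the map."""
    for label, subtype, start, end in segments:
        if start <= position <= end:
            return label, subtype
    return None


if __name__ == "__main__":
    print(is_contiguous(SEGMENTS))
    print(length_by_subtype(SEGMENTS))
    print(subregion_at(3000))
```

With these coordinates, the subtype B insertions cover 716 of the 8,874 mapped HXB2 positions (about 8%), and the subtype C backbone covers the remaining 8,158.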
Figure 1. Phylogenetic and recombination analyses based on the near-full-length genome sequence of CRF142_BC. (A) The neighbor-joining phylogenetic tree of the representative HIV-1 CRF reference sequences. (B) Bootscanning analysis of CRF142_BC, CRF08_BC, and CRF07_BC. (C) Genomic structure of CRF142_BC.
Note: For (A), the sequences of the potential novel CRFs (15R176, 15R297, 21ZT314, 21ZT323, and 21ZT334) are marked in red. The values on the branches represent the percentages of 1,000 bootstrap replicates. The scale bar indicates 5% nucleotide sequence divergence. For (B), conditions used for this analysis were as follows: window: 300 bp, step: 30 bp, GapStrip: on, replicates: 100, Kimura (2-parameter), T/t: 2.0. The Subtype C reference group included AF067155, AF067158, and AF067157. The Subtype B reference group included AY173951, JF932495, and JF932496. The CRF08_BC reference group included KC914396, HM067748, and AY008715. The CRF07_BC reference group included EF368372, EF368370, and AF286230. The reference group of Subtype A4 included AM000053 and AM000054. For (C), the mosaic map was generated using the Recombinant HIV-1 Drawing Tool.
Abbreviations: CRF=circulating recombinant form; LTR=long terminal repeat.
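The bootscanning settings in the Figure 1 note (300 bp window, 30 bp step, Kimura 2-parameter distances) can be illustrated with a simplified sliding-window scan. The sketch below is an assumption-laden illustration rather than a reimplementation of SimPlot: it only computes a Kimura 2-parameter distance between a query and one reference in each window, and it omits the bootstrapped tree building that an actual bootscan performs as well as the T/t setting. The function names are hypothetical, and the two input sequences are assumed to be pre-aligned strings of equal length.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned sequences.

    Columns with a gap or ambiguous base in either sequence are skipped.
    Returns None if no column can be compared or the distance is undefined.
    """
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        return None
    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        return None  # sequences too divergent for the K2P correction
    return -0.5 * math.log(term1 * math.sqrt(term2))


def sliding_windows(length, window=300, step=30):
    """Yield half-open (start, end) index pairs covering the alignment."""
    for start in range(0, max(length - window, 0) + 1, step):
        yield start, min(start + window, length)


def window_distances(query, reference, window=300, step=30):
    """Return (window midpoint, K2P distance) pairs along the alignment."""
    results = []
    for start, end in sliding_windows(len(query), window, step):
        distance = k2p_distance(query[start:end], reference[start:end])
        results.append(((start + end) // 2, distance))
    return results
```

Running `window_distances` for the same query against a subtype B and a subtype C reference and comparing the two profiles gives a rough picture of where the closest parental lineage switches, which is the intuition behind the breakpoints that bootscanning locates more rigorously.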
Figure 2. Phylogenetic and evolutionary analysis of subregions from CRF142_BC. (A) Maximum likelihood trees of the nine mosaic fragments identified by recombination analysis. (B) The maximum clade credibility (MCC) trees of the combined subtype C subregions (I+III+V+VII+IX) and the combined subtype B subregions (II+IV+VI+VIII) from CRF142_BC.
Note: For (A), the reliability of the tree branches was assessed by 1,000 bootstrap replicates. The scale bar indicates the nucleotide sequence divergence.
Abbreviation: CRF=circulating recombinant form.
To explore the evolutionary history of CRF142_BC, Bayesian evolutionary analysis was performed with combined subtype C regions (I+III+V+VII+IX) and combined subtype B regions (II+IV+VI+VIII). As shown in Figure 2B, the median tMRCAs of the combined subtype C and subtype B regions were 1994.2 [95% highest posterior density (HPD): 1989.3–1998.4] and 1997.7 (95% HPD: 1976.4–2007.6), respectively, suggesting that CRF142_BC originated between approximately 1994 and 1997. The analysis also showed that the subtype B and subtype C segments were most likely derived from the Thai B and India C lineages, respectively.
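For readers unfamiliar with the 95% HPD intervals quoted above, the following minimal sketch shows one common way such an interval is obtained from posterior samples: it is the narrowest interval that contains 95% of the sampled values. This is a generic illustration and not the authors' workflow; the function name `hpd_interval` is hypothetical, and in practice the posterior samples would come from the MCMC log produced by the Bayesian analysis.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Return the narrowest interval containing at least `mass` of the samples.

    `samples` is any non-empty sequence of numbers, for example posterior
    draws of a tMRCA expressed in decimal years.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of samples the interval must cover
    best_low, best_high = ordered[0], ordered[k - 1]
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    for i in range(1, n - k + 1):
        low, high = ordered[i], ordered[i + k - 1]
        if high - low < best_high - best_low:
            best_low, best_high = low, high
    return best_low, best_high
```

Because the interval is chosen by width rather than by symmetric tail probabilities, it can be asymmetric around the median, as with the subtype B estimate above (median 1997.7, HPD 1976.4–2007.6).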