Numerous databases cataloging the genomic diversity of malaria parasites have been developed recently (1–2). These resources enable comprehensive analysis of genetic variation within hosts, superinfection, co-transmission, identification and monitoring of drug resistance loci, and understanding of malaria transmission dynamics. Although these databases support advanced analytical methods for examining epidemiological changes and identifying distinct regional epidemiological features, they do not encompass detailed data for every endemic country. In West Africa, Sierra Leone experiences year-round malaria transmission primarily due to Plasmodium falciparum (P. falciparum). Despite falling short of the 2025 target set by the Global Technical Strategy for Malaria 2016–2030, Sierra Leone recorded a decrease in malaria case incidence in 2022 compared to 2012. Specifically, malaria cases decreased from 2.7 million in 2019 to 2.6 million in 2022 despite the challenges posed by the coronavirus disease 2019 (COVID-19) pandemic (3). However, with a population of 8.6 million, the annual burden of 2.6 million cases remains excessively high. Additionally, until now, there has been no report of malaria genome data from the country, which limits the integration of genomic information with epidemiological and public health data in national malaria control programs and impedes genomic surveillance efforts. In this study, we present the first report of whole-genome sequence variations in 19 high parasite-density P. falciparum isolates from Sierra Leone, offering insights into the genomic epidemiology of an area with high prevalence. Our findings open avenues for further studies into the molecular epidemiology of a population characterized by a high rate of superinfection and a low correlation between infections. 
Additionally, we observed a reduced diversity in drug resistance genes in Sierra Leone samples, suggesting that the country’s unstructured antimalarial drug policies have subjected the parasite population to increased directional selection.
Blood samples were collected in Freetown, the capital of Sierra Leone, between December 2022 and March 2023. This study received ethical approval from the Ethics Committee of the Chinese Center for Disease Control and Prevention. Participants were fully informed about the study’s procedures, potential risks, and benefits before providing written informed consent. Samples were taken from hundreds of clinical malaria patients at the Sierra Leone China Friendship Hospital (Freetown), all of whom were positive for P. falciparum infection by microscopy and confirmed by polymerase chain reaction (PCR).
To ensure sequencing integrity, we selected samples with high parasite density. DNA was extracted from these frozen blood samples using the QIAGEN DNeasy Blood & Tissue Kit (Qiagen, UK), and sequencing libraries were constructed accordingly. These libraries were then sequenced on an MGISEQ-200 machine (BGI, CN), generating an average of 220 million paired-end reads, each 150 base pairs in length (Table 1). The reads were cleaned with the Trimmomatic-3.0 tool to remove adapter sequences and low-quality reads, and were then aligned with BWA to the annotated P. falciparum 3D7 reference genome, sourced from PlasmoDB.org. Applying the GATK4 workflows, we refined the results to a set of 165,000 high-quality single nucleotide polymorphisms (SNPs) (4). The workflow comprised pre-processing steps (marking duplicates, indel realignment, and base quality score recalibration), variant discovery with HaplotypeCaller for SNP and indel calling, and variant filtering based on various quality metrics. Missing calls were defined as positions covered by fewer than two reads, and SNPs with more than 5% missing calls in high-quality samples were excluded. As a reference, we used a dataset of genotype calls released in 2016, comprising over 3,000 samples from countries around the world; SNP information and allele frequencies were retrieved from the pf3k Project. The complexity of infection within hosts was assessed by calculating the within-sample F statistic (FWS) using the bahlolab/moimix R package. FWS values for individual infections ranged from 0.6 to 0.94, with a mean of 0.84 and a median of 0.86. An FWS value below 0.95 indicates the presence of multiple genotypes within an infection. Our analysis therefore showed a comparatively low frequency of monoclonal infections among the samples from Sierra Leone.
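The missing-call filter and the FWS statistic described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the GATK4 pipeline or the moimix implementation used in the study. The function names, the input layout (per-sample read depths and reference/alternate read counts), and the simplified form FWS = 1 − mean(Hw)/mean(Hs) are assumptions made for the example.

```python
def filter_snps_by_missingness(depths, min_reads=2, max_missing=0.05):
    """Keep SNPs whose fraction of missing calls does not exceed max_missing.

    depths: dict mapping a SNP id to a list of per-sample read depths
            (hypothetical input layout).
    A call counts as missing when the depth is below min_reads (two reads,
    as stated in the text).
    """
    kept = []
    for snp_id, sample_depths in depths.items():
        missing = sum(1 for d in sample_depths if d < min_reads)
        if missing / len(sample_depths) <= max_missing:
            kept.append(snp_id)
    return kept


def heterozygosity(ref_reads, alt_reads):
    """Expected heterozygosity 1 - p^2 - q^2 at one biallelic SNP."""
    total = ref_reads + alt_reads
    if total == 0:
        return None  # no information at this site
    p = ref_reads / total
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def fws(sample_counts, population_counts):
    """Simplified FWS = 1 - mean(Hw) / mean(Hs).

    sample_counts:     list of (ref, alt) read counts within one infection.
    population_counts: list of (ref, alt) counts pooled over the population,
                       in the same SNP order.
    Assumption: plain averages over SNPs; published tools may instead bin
    SNPs by allele frequency before comparing Hw and Hs.
    """
    hw, hs = [], []
    for (s_ref, s_alt), (p_ref, p_alt) in zip(sample_counts, population_counts):
        within = heterozygosity(s_ref, s_alt)
        across = heterozygosity(p_ref, p_alt)
        if within is None or across is None or across == 0.0:
            continue  # skip uninformative sites
        hw.append(within)
        hs.append(across)
    if not hs:
        return None
    return 1.0 - (sum(hw) / len(hw)) / (sum(hs) / len(hs))
```

Under this definition an infection carrying a single genotype has no within-sample heterozygosity and scores 1, while an infection as mixed as the whole population scores 0, which is why values below 0.95 are read as evidence of multiple genotypes.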
| Sample ID | RmDup_reads | Mapped ratio (%) | Average coverage depth (X) | Coverage >1x (%) | Coverage >10x (%) | FWS |
|---|---|---|---|---|---|---|
| DL_34 | 41846007 | 9.86 | 9.044 | 91.21 | 48.51 | 0.763 |
| PFGX_162 | 25515745 | 1.92 | 11.272 | 92.74 | 21.93 | 0.819 |
| 29104_57-73 | 31073980 | 7.77 | 12.586 | 90.29 | 13.93 | 0.942 |
| 29104_58-74 | 43909234 | 9.67 | 4.062 | 93.27 | 28.95 | 0.866 |
| 29104_59-75 | 39891198 | 10.07 | 6.010 | 89.12 | 8.47 | 0.755 |
| 29104_63-76 | 32070741 | 8.62 | 12.682 | 94.17 | 14.78 | 0.859 |
| 7509_81-86 | 30206296 | 1.12 | 10.461 | 92.43 | 20.68 | 0.865 |
| 7509_84 | 50154284 | 11.02 | 13.275 | 90.70 | 13.2 | 0.888 |
| Ah_7 | 58998903 | 5.10 | 7.170 | 95.99 | 15.39 | 0.859 |
| D_Pf5_1 | 36781350 | 8.58 | 13.164 | 91.80 | 25.82 | 0.764 |
| sierra131 | 37367263 | 26.79 | 45.902 | 99.49 | 96.72 | 0.602 |
| sierra135 | 33004273 | 9.62 | 6.190 | 92.92 | 13.77 | 0.932 |
| sierra136 | 44491835 | 15.65 | 21.889 | 98.55 | 86.06 | 0.867 |
| sierra146 | 56873546 | 7.47 | 10.314 | 92.51 | 39.85 | 0.808 |
| sierra148 | 33913056 | 12.17 | 11.799 | 97.47 | 53.28 | 0.900 |
| sierra55 | 30366090 | 10.97 | 7.691 | 96.06 | 23.62 | 0.881 |
| sierra57 | 45635899 | 9.91 | 9.566 | 97.11 | 34.93 | 0.892 |
| sierra60 | 42761031 | 13.03 | 10.208 | 97.11 | 38.07 | 0.857 |
| sierra79 | 37871554 | 10.25 | 5.094 | 90.99 | 9.99 | 0.939 |

Table 1. Sequencing and mapping summary statistics for 19 samples from Sierra Leone.
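As a quick consistency check, the FWS column of Table 1 can be summarized directly. The sketch below transcribes the 19 values from the table and applies the 0.95 threshold mentioned in the text; the variable and function names are illustrative assumptions, not part of the study's code.

```python
import statistics

# FWS values transcribed from Table 1 (sample ID -> FWS).
FWS_TABLE1 = {
    "DL_34": 0.763, "PFGX_162": 0.819, "29104_57-73": 0.942,
    "29104_58-74": 0.866, "29104_59-75": 0.755, "29104_63-76": 0.859,
    "7509_81-86": 0.865, "7509_84": 0.888, "Ah_7": 0.859,
    "D_Pf5_1": 0.764, "sierra131": 0.602, "sierra135": 0.932,
    "sierra136": 0.867, "sierra146": 0.808, "sierra148": 0.900,
    "sierra55": 0.881, "sierra57": 0.892, "sierra60": 0.857,
    "sierra79": 0.939,
}

# Threshold from the text: FWS below 0.95 suggests multiple genotypes.
MONOCLONAL_THRESHOLD = 0.95


def summarize_fws(values, threshold=MONOCLONAL_THRESHOLD):
    """Return range, mean, median, and the count of samples below threshold."""
    vals = list(values)
    return {
        "n": len(vals),
        "min": min(vals),
        "max": max(vals),
        "mean": statistics.mean(vals),
        "median": statistics.median(vals),
        "n_below_threshold": sum(1 for v in vals if v < threshold),
    }


summary = summarize_fws(FWS_TABLE1.values())
for key, value in summary.items():
    print(key, value)
```

Every sample in the table falls below the 0.95 threshold, in line with the low frequency of monoclonal infections reported for this sample set.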
We then evaluated the population structure using the pf3k global collection (5) and our 19 high-coverage, single-infection isolates. We estimated nucleotide diversity ($\widehat{\pi}$), Watterson’s estimator ($\widehat{\theta}_w$), genetic differentiation (FST), and Tajima’s D for the 5,600 genes located on chromosomes, using ARLEQUIN ver 3.5 (6). Principal component analysis (PCA) was performed in R. A neighbor-joining (NJ) tree was constructed with MEGA6 software. We reviewed lists of putative resistance genes from previous studies and assessed the selection pressure on these genes in Sierra Leone compared to other reference populations. We also investigated gene families involved in merozoite invasion and modulation of the immune response.
Of the many samples collected in Freetown, only 19 have so far yielded sequencing results with satisfactory coverage. This study generated between 176 and 245 million paired-end reads, each with an average length of 150 base pairs. Notably, the samples from Sierra Leone showed a higher frequency of SNPs compared with those from other regions. A PCA of SNP variants effectively revealed the global population structure (Figure 1A), indicating that P. falciparum predominantly clusters according to geographic origin, with African samples notably distinct from Asian ones. The primary axes of differentiation grouped the Sierra Leone samples together, aligning them with other African samples, a similarity also evident in the NJ tree (Figure 1B). Further, we analyzed the chromosome-level structure in the CMB imported isolates using the identity by descent (IBD) fraction, a metric typically used to explore the relatedness among closely related parasite populations (7). The Sierra Leone samples displayed the highest pairwise IBD fractions (mean = 0.04) among pf3k populations, although these values remained relatively low, suggesting a high degree of transmission within the region.
Figure 1. Parasite population structure in Sierra Leone samples compared to references (Cambodia, Ghana, Guinea, Mali, Bangladesh, Malawi, Laos, Senegal, DR Congo, Myanmar, Thailand, Vietnam, and the Gambia). (A) PCA plots showcasing genetic differentiation among populations; (B) Neighbor-joining tree depicting the relatedness of Sierra Leone isolates to reference populations, supported by 1,000 bootstraps.
Contrary to the results obtained from PCA and phylogenetic trees, the genetic diversity and balancing selection of genes did not correlate with geographical origin. The genetic diversity in isolates from Sierra Leone (π=0.0028) exceeded that in the global P. falciparum sample (π≈1.03×10⁻³), which includes isolates from diverse regions such as Africa, America, Asia, and Oceania. Gene families such as rifin, var, stevor, surfin, and pfmc exhibited the highest diversity, consistent with previous studies (8) (Figure 2A). Notably, most Tajima’s D values were negative, averaging −1.02 with a median of −1.16, suggesting an excess of low-frequency polymorphisms indicative of either recent population expansion or purifying selection. This heightened diversity, particularly in gene families related to erythrocyte invasion and immune evasion, highlights the parasite’s adaptive strategies in facing local environmental pressures and treatment regimes. Nevertheless, only a small fraction of genes, specifically 397 (7%), showed positive Tajima’s D values, which are typically associated with reduced diversity due to directional selection. In contrast, Guinea, despite also being a West African population, displayed 1,495 genes (26%) with positive Tajima’s D values and a higher mean value (−0.38) but lower genetic diversity (π=0.0016). These less negative Tajima’s D values are consistent with past findings, indicating recent population reduction and declining infection rates.
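To make the relationship between nucleotide diversity, Watterson’s estimator, and Tajima’s D concrete, the following minimal Python sketch computes all three for one locus using the standard formulas from Tajima (1989). It is not the ARLEQUIN implementation used in the study. The function name and the input format (biallelic haplotypes coded as strings of '0' and '1', one per clonal sample) are assumptions for illustration, and the values are per locus rather than per site.

```python
import math


def tajimas_d(haplotypes):
    """Return (pi, theta_w, D) for a list of equal-length '0'/'1' strings.

    pi      : mean number of pairwise differences.
    theta_w : Watterson's estimator, S / a1.
    D       : Tajima's D, or None when there are no segregating sites.
    """
    n = len(haplotypes)
    if n < 4:
        # The variance term is zero for n = 2 and n = 3, so D is undefined.
        raise ValueError("Tajima's D needs at least four sequences")

    pi = 0.0
    seg_sites = 0
    for column in zip(*haplotypes):
        k = column.count("1")  # copies of the alternate allele
        if 0 < k < n:
            seg_sites += 1
            # Fraction of sequence pairs that differ at this site.
            pi += 2.0 * k * (n - k) / (n * (n - 1))

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    theta_w = seg_sites / a1
    if seg_sites == 0:
        return pi, theta_w, None

    # Constants of the variance of (pi - theta_w).
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return pi, theta_w, (pi - theta_w) / math.sqrt(variance)


# Toy data: every variant is a singleton (excess of rare alleles).
rare = ["100", "010", "001", "000"]
# Toy data: every variant sits at intermediate frequency.
intermediate = ["111", "111", "000", "000"]

print(tajimas_d(rare))
print(tajimas_d(intermediate))
```

In the toy data, the locus made only of singletons gives a negative D, the pattern the text associates with an excess of low-frequency polymorphisms, while the locus with intermediate-frequency variants gives a positive D.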
Figure 2. Genetic diversity and balanced selection of 5,600 genes in Sierra Leone and reference populations. (A) Relatively lower diversity of resistance genes; (B) Tajima’s D value indicated that African samples exhibit clustered types and less selection pressure than Asian samples; (C) Compared to reference populations, Sierra Leone samples exhibit reduced diversity in drug resistance genes; (D) Whole-genome scans reveal that the gene families associated with invasion are also under directional selection.
Drawing from previous studies, we compiled a comprehensive list of potential drug-resistance genes (9–10). We anticipated that populations in Sierra Leone would exhibit higher overall selective pressure and lower genetic diversity compared to Asian populations, as indicated by Tajima’s D test, particularly in genes associated with resistance to artemisinin-based combination therapies (ACTs) (Figure 2B–C). For antifolate resistance genes, positive balancing selective pressures were observed in the Southeast Asian populations, in contrast to the negative balancing selection found in Africa. This difference could be due to the nearly two-decade-long discontinuation of these drugs in Southeast Asia. Further analysis revealed distinctive pressure differences between the Sierra Leone population and reference populations. Notably, specific genes, such as the pantothenamide resistance gene (acs11) and those associated with imidazolopiperazine resistance (e.g., ugt and krs1), continued to face selection pressure in Sierra Leone but showed positive balancing selection in Asia (11–12). These results underscore the complex and evolving nature of drug resistance patterns across different parasite populations.