Complementing genome sequence with deep transcriptome and proteome data could enable more accurate assembly and annotation of newly sequenced genomes. Here, we provide a proof-of-concept of an integrated approach for analysis of the genome and proteome of Anopheles stephensi, which is one of the most important vectors of the malaria parasite. To achieve broad coverage of genes, we carried out transcriptome sequencing and deep proteome profiling of multiple anatomically distinct sites. Based on transcriptomic data alone, we identified and corrected 535 events of incomplete genome assembly involving 1196 scaffolds and 868 protein-coding gene models. This proteogenomic approach enabled us to add 365 genes that were missed during genome annotation and identify 917 gene correction events through discovery of 151 novel exons, 297 protein extensions, 231 exon extensions, 192 novel protein start sites, 19 novel translational frames, 28 events of joining of exons, and 76 events of joining of adjacent genes as a single gene. Incorporation of proteomic evidence allowed us to change the designation of more than 87 predicted “noncoding RNAs” to conventional mRNAs coded by protein-coding genes. Importantly, extension of the newly corrected genome assemblies and gene models to 15 other newly assembled Anopheline genomes led to the discovery of a large number of apparent discrepancies in assembly and annotation of these genomes. Our data provide a framework for how future genome sequencing efforts should incorporate transcriptomic and proteomic analysis in combination with simultaneous manual curation to achieve near complete assembly and accurate annotation of genomes.
Anopheles gambiae is a major mosquito vector responsible for malaria transmission, whose genome sequence was reported in 2002. Genome annotation is a continuing effort, and many of the approximately 13,000 genes listed in VectorBase for Anopheles gambiae are predictions that have still not been validated by any other method. To identify protein-coding genes of An. gambiae based on its genomic sequence, we carried out a deep proteomic analysis using high-resolution Fourier transform mass spectrometry for both precursor and fragment ions. Based on peptide evidence, we were able to support or correct more than 6000 gene annotations including 80 novel gene structures and about 500 translational start sites. An additional validation by RT-PCR and cDNA sequencing was successfully performed for 105 selected genes. Our proteogenomic analysis led to the identification of 2682 genome search–specific peptides. Numerous cases of encoded proteins were documented in regions annotated as intergenic, introns, or untranslated regions. Using a database created to contain potential splice sites, we also identified 35 novel splice junctions. This is a first report to annotate the An. gambiae genome using high-accuracy mass spectrometry data as a complementary technology for genome annotation.
Background: India contributes 1.5–2 million annual confirmed cases of malaria. Since both parasites and vectors are evolving rapidly, updated information on parasite prevalence in mosquitoes is important for vector management and disease control. Possible new vector-parasite interactions in Goa, India were tested. Methods: A total of 1036 CDC traps were placed at four malaria endemic foci in Goa, India from May 2013 to April 2015. These captured 23,782 mosquitoes, of which there were 1375 female anopheline specimens with ten species identified using morphological keys. Mosquito DNA was analysed for human and bovine blood as well as for Plasmodium falciparum and Plasmodium vivax infection. Results: Human host feeding was confirmed in Anopheles stephensi (30 %), Anopheles subpictus (27 %), Anopheles jamesii (22 %), Anopheles annularis (26 %), and Anopheles nigerrimus (16 %). In contrast, Anopheles vagus, Anopheles barbirostris, Anopheles tessellatus, Anopheles umbrosus and Anopheles karwari specimens were negative for human blood. Importantly, An. subpictus, which was considered a non-vector in Goa and Western India, was found to be a dominant vector in terms of both total number of mosquitoes collected as well as Plasmodium carriage. Plasmodium infections were detected in 14 An. subpictus (2.8 %), while the traditional vector, An. stephensi, showed seven total infections, two of which were in the salivary glands. Of the 14 An. subpictus infections, nested PCR demonstrated three Plasmodium infections in the salivary glands: one P. vivax and two mixed infections of P. falciparum and P. vivax. In addition, ten gut infections (one P. vivax, six P. falciparum and three mixed infections) were seen in An. subpictus. Longitudinal mosquito collections pointed to a bimodal annual appearance of An. subpictus to maintain a perennial malaria transmission cycle of both P. vivax and P. falciparum in Goa.
Malaria is a vector-borne disease causing extensive morbidity, debility and mortality. Development of resistance to drugs among parasites and to conventional insecticides among vector-mosquitoes necessitates innovative measures to combat this disease. Identification of molecules involved in the maintenance of complex developmental cycles of the parasites within the vector and the host can provide attractive targets to intervene in the disease transmission. In the last decade, several efforts have been made in identifying such molecules involved in mosquito-parasite interactions and, subsequently, validating their role in the development of parasites within the vector. In this study, a list of mosquito proteins, which facilitate or inhibit the development of malaria parasites in the midgut, haemolymph and salivary glands of mosquitoes, is compiled. A total of 94 molecules have been reported and validated for their role in the development of malaria parasites inside the vector. This compendium of molecules will serve as a centralized resource to biomedical researchers investigating vector-pathogen interactions and malaria transmission.