We present the DeNovoGear software for analyzing de novo mutations from familial and somatic tissue sequencing data. DeNovoGear uses likelihood-based error modeling to reduce the false positive rate of mutation discovery in exome analysis, and fragment information to identify the parental origin of germline mutations. We used our program to create a whole-genome de novo indel callset with a 95% validation rate, producing a direct estimate of the human germline indel mutation rate.
Modern analytical methods for population genetics and phylogenetics are expected to provide more accurate results when data from multiple genome-wide loci are analysed. We present the results of an initial application of parallel tagged sequencing (PTS) on a next-generation platform to sequence thousands of barcoded PCR amplicons generated from 95 nuclear loci and 93 individuals sampled across the range of the tiger salamander (Ambystoma tigrinum) species complex. To manage the bioinformatic processing of this large data set (344 330 reads), we developed a pipeline that sorts PTS data by barcode and locus, identifies high-quality variable nucleotides and yields phased haplotype sequences for each individual at each locus. Our sequencing and bioinformatic strategy resulted in a genome-wide data set with relatively low levels of missing data and a wide range of nucleotide variation. STRUCTURE analyses of these data in a genotypic format resulted in strongly supported assignments for the majority of individuals into nine geographically defined genetic clusters. Species tree analyses of the most variable loci using a multi-species coalescent model resulted in strong support for most branches in the species tree; however, analyses including more than 50 loci produced parameter sampling trends that indicated a lack of convergence on the posterior distribution. Overall, these results demonstrate the potential for amplicon-based PTS to rapidly generate large-scale data for population genetic and phylogenetic-based research.
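The first pipeline step described above, sorting reads by barcode and locus, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `sort_reads`, the assumption that each read begins with an equal-length barcode followed directly by a locus-specific primer, and the use of exact matching are all hypothetical simplifications chosen for illustration.

```python
from collections import defaultdict


def sort_reads(reads, barcodes, primers):
    """Group reads by (individual, locus) using exact prefix matches.

    reads    -- iterable of read sequences (strings)
    barcodes -- dict mapping barcode sequence -> individual ID (hypothetical layout)
    primers  -- dict mapping primer sequence -> locus name (hypothetical layout)
    Reads that match no barcode or no primer are silently dropped.
    """
    bins = defaultdict(list)
    for read in reads:
        for barcode, individual in barcodes.items():
            if not read.startswith(barcode):
                continue
            # Strip the barcode, then look for a locus primer at the new start.
            rest = read[len(barcode):]
            for primer, locus in primers.items():
                if rest.startswith(primer):
                    bins[(individual, locus)].append(rest[len(primer):])
                    break
            break  # barcodes are assumed equal-length, so at most one can match
    return dict(bins)


# Toy example with made-up sequences.
barcodes = {"ACGT": "ind1", "TGCA": "ind2"}
primers = {"GGA": "locusA", "CCT": "locusB"}
reads = ["ACGTGGATTTT", "TGCACCTAAAA", "ACGTCCTGGGG", "NNNNGGATTTT"]
sorted_reads = sort_reads(reads, barcodes, primers)
```

A real pipeline would additionally tolerate sequencing errors in barcodes and primers and would use base quality scores before calling variable nucleotides; those steps are omitted here.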
Leishmania, a genus of parasites transmitted to human hosts and mammalian/reptilian reservoirs by an insect vector, is the causative agent of the human disease complex leishmaniasis. The evolutionary relationships within the genus Leishmania and its origins are the source of ongoing debate, reflected in conflicting phylogenetic and biogeographic reconstructions. This study employs a recently described bioinformatics method, SISRS, to identify over 200,000 informative sites across the genome from newly sequenced and publicly available Leishmania data. This dataset is used to reconstruct the evolutionary relationships of this genus. Additionally, we constructed a large multi-gene dataset, using it to reconstruct the phylogeny and estimate divergence dates for species. We conclude that the genus Leishmania evolved at least 90-100 million years ago, supporting a modified version of the Multiple Origins hypothesis that we call the Supercontinent hypothesis. According to this scenario, separate Leishmania clades emerged prior to, and during, the breakup of Gondwana. Additionally, we confirm that reptile-infecting Leishmania are derived from mammalian forms and that the species that infect porcupines and sloths form a clade long separated from other species. Finally, we firmly place the guinea-pig infecting species, Leishmania enriettii, the globally dispersed Leishmania siamensis, and the newly identified Australian species from a kangaroo, as sibling species whose distribution arises from the ancient connection between Australia, Antarctica, and South America.
Plasmodium vivax is the most prevalent malarial species in South America and exerts a substantial burden on the populations it affects. The control and eventual elimination of P. vivax are global health priorities. Genomic research contributes to this objective by improving our understanding of the biology of P. vivax and through the development of new genetic markers that can be used to monitor efforts to reduce malaria transmission. Here we analyze whole-genome data from eight field samples from a region in Córdoba, Colombia where malaria is endemic. We find considerable genetic diversity within this population, a result that contrasts with earlier studies suggesting that P. vivax had limited diversity in the Americas. We also identify a selective sweep around a substitution known to confer resistance to sulphadoxine-pyrimethamine (SP). This is the first observation of a selective sweep for SP resistance in this species. These results indicate that P. vivax has been exposed to SP pressure even when the drug is not in use as a first-line treatment for patients afflicted by this parasite. We identify multiple non-synonymous substitutions in three other genes known to be involved with drug resistance in Plasmodium species. Finally, we found extensive microsatellite polymorphisms. Using this information we developed 18 polymorphic and easy-to-score microsatellite loci that can be used in epidemiological investigations in South America.
Background: Estimates of divergence dates between species improve our understanding of processes ranging from nucleotide substitution to speciation. Such estimates are frequently based on molecular genetic differences between species; therefore, they rely on accurate estimates of the number of such differences (i.e. substitutions per site, measured as branch length on phylogenies). We used simulations to determine the effects of dataset size, branch length heterogeneity, branch depth, and analytical framework on branch length estimation across a range of branch lengths. We then reanalyzed an empirical dataset for plethodontid salamanders to determine how inaccurate branch length estimation can affect estimates of divergence dates. Results: The accuracy of branch length estimation varied with branch length, dataset size (both number of taxa and sites), branch length heterogeneity, branch depth, dataset complexity, and analytical framework. For simple phylogenies analyzed in a Bayesian framework, branches were increasingly underestimated as branch length increased; in a maximum likelihood framework, longer branch lengths were somewhat overestimated. Longer datasets improved estimates in both frameworks; however, when the number of taxa was increased, estimation accuracy for deeper branches was less than for tip branches. Increasing the complexity of the dataset produced more misestimated branches in a Bayesian framework; however, in an ML framework, more branches were estimated more accurately. Using ML branch length estimates to re-estimate plethodontid salamander divergence dates generally resulted in an increase in the estimated age of older nodes and a decrease in the estimated age of younger nodes. Conclusions: Branch lengths are misestimated in both statistical frameworks for simulations of simple datasets. However, for complex datasets, length estimates are quite accurate in ML (even for short datasets), whereas few branches are estimated accurately in a Bayesian framework. Our reanalysis of empirical data demonstrates the magnitude of effects of Bayesian branch length misestimation on divergence date estimates. Because the length of branches for empirical datasets can be estimated most reliably in an ML framework when branches are <1 substitution/site and datasets are ≥1 kb, we suggest that divergence date estimates using datasets, branch lengths, and/or analytical techniques that fall outside of these parameters should be interpreted with caution.
Background: Improvements in sequencing technology now allow easy acquisition of large datasets; however, analyzing these data for phylogenetics can be challenging. We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time-consuming steps of de novo whole genome assembly, multiple genome alignment, and annotation. Results: In simulations, SISRS is able to identify large numbers of loci containing variable sites with phylogenetic signal. For genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers that we used to estimate the phylogeny of placental mammals. We recovered eight phylogenies that resolved the basal relationships among mammals using datasets with different levels of missing data. The three alternate resolutions of the basal relationships are consistent with the major hypotheses for the relationships among mammals, all of which have been supported previously by different molecular datasets. Conclusions: SISRS has the potential to transform phylogenetic research. This method eliminates the need for expensive marker development in many studies by using whole genome shotgun sequence data directly. SISRS is open source and freely available at https://github.com/rachelss/SISRS/releases. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0632-y) contains supplementary material, which is available to authorized users.
New populations of threatened species are often established as a conservation measure. However, if only a few individuals contribute to subsequent generations, these populations may have limited genetic diversity. Such genetic bottlenecks can result in inbreeding depression, reduced fitness, and even extirpation of populations. Eight isolated populations of Sacramento perch Archoplites interruptus established through anthropogenic translocations were examined for evidence of genetic bottlenecks. Sacramento perch are endemic to two regions of California but have been entirely extirpated from their native range; the remaining populations are essential for conservation of the species. Using 12 microsatellite DNA loci, we determined that genetic bottlenecks occurred in six of the populations. Allelic richness, richness of private alleles, and effective population size differed significantly among populations. Strong differentiation among the extant populations probably resulted from differences in the sources used to establish the populations and from genetic drift due to the small population sizes. These results indicate that genetic bottlenecks are frequent when new, isolated populations of a species are established. Although these extant populations have persisted despite bottlenecks, future Sacramento perch populations should be established by drawing from the most diverse of the current populations and should be monitored with genetic markers to evaluate diversity and the possible need for further stocking. We combine three measures of genetic diversity (allelic richness, private allelic richness, and effective population size) to recommend potential source populations.
Background: Blindness has evolved repeatedly in cave-dwelling organisms, and many hypotheses have been proposed to explain this observation, including both accumulation of neutral loss-of-function mutations and adaptation to darkness. Investigating the loss of sight in cave dwellers presents an opportunity to understand the operation of fundamental evolutionary processes, including drift, selection, mutation, and migration. Results: Here we model the evolution of blindness in caves. This model captures the interaction of three forces: (1) selection favoring alleles causing blindness, (2) immigration of sightedness alleles from a surface population, and (3) mutations creating blindness alleles. We investigated the dynamics of this model and determined selection-strength thresholds that result in blindness evolving in caves despite immigration of sightedness alleles from the surface. We estimate that the selection coefficient for blindness would need to be at least 0.005 (and maybe as high as 0.5) for blindness to evolve in the model cave-organism, Astyanax mexicanus. Conclusions: Our results indicate that strong selection is required for the evolution of blindness in cave-dwelling organisms, which is consistent with recent work suggesting a high metabolic cost of eye development. Electronic supplementary material: The online version of this article (doi:10.1186/s12862-017-0876-4) contains supplementary material, which is available to authorized users.
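The interplay of the three forces named above can be illustrated with a textbook-style deterministic recursion. This sketch is not the authors' model: it assumes a single haploid locus, one-way mutation, a fully sighted surface population, and no drift, and the function names and parameter values below are hypothetical choices for illustration only.

```python
def next_blind_freq(q, s, m, u, q_surface=0.0):
    """Advance the blindness-allele frequency q by one generation.

    s -- selection coefficient favoring the blindness allele (fitness 1 + s)
    m -- fraction of the cave population replaced by surface immigrants
    u -- mutation rate from the sighted allele to the blindness allele
    q_surface -- blindness-allele frequency among immigrants (assumed 0)
    """
    # Selection: reweight the blindness allele by its relative fitness.
    q_sel = q * (1 + s) / (1 + s * q)
    # Mutation: a fraction u of sighted alleles become blindness alleles.
    q_mut = q_sel + u * (1 - q_sel)
    # Immigration: mix the cave population with surface migrants.
    return (1 - m) * q_mut + m * q_surface


def equilibrium_freq(s, m, u, q0=0.0, generations=200_000, tol=1e-12):
    """Iterate the recursion until q stops changing (or the cap is reached)."""
    q = q0
    for _ in range(generations):
        q_next = next_blind_freq(q, s, m, u)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q


# Illustrative parameter values only (not taken from the paper).
strong = equilibrium_freq(s=0.1, m=0.001, u=1e-6)   # selection outweighs immigration
weak = equilibrium_freq(s=0.0001, m=0.01, u=1e-6)   # immigration swamps selection
```

In this simplified recursion, the blindness allele can spread from rarity only when (1 + s)(1 - m) exceeds 1, that is, when selection is strong enough to offset the steady inflow of sightedness alleles. This mirrors the kind of selection-strength threshold the abstract describes, without reproducing the paper's actual estimates.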