Transposable element (TE) insertions are among the most challenging types of variants to detect in genomic data because of their repetitive nature and complex mechanisms of replication. Nevertheless, the recent availability of large resequencing data sets has spurred the development of many new methods to detect TE insertions in whole-genome shotgun sequences. Here we report an integrated bioinformatics pipeline for the detection of TE insertions in whole-genome shotgun data, called McClintock (https://github.com/bergmanlab/mcclintock), which automatically runs and standardizes output for multiple TE detection methods. We demonstrate the utility of McClintock by evaluating six TE detection methods using simulated and real genome data from the model microbial eukaryote, Saccharomyces cerevisiae. We find substantial variation among McClintock component methods in their ability to detect nonreference TEs in the yeast genome, but show that nonreference TEs at nearly all biologically realistic locations can be detected in simulated data by combining multiple methods that use split-read and read-pair evidence. Overall, our results reveal that split-read methods detect fewer nonreference TE insertions than read-pair methods but have much higher positional accuracy. Analysis of a large sample of real yeast genomes reveals that most McClintock component methods can recover known aspects of TE biology in yeast such as the transpositional activity status of families, target preferences, and target site duplication structure, albeit with varying levels of accuracy. Our work provides a general framework for integrating and analyzing results from multiple TE detection methods, as well as useful guidance for researchers studying TEs in yeast resequencing data.
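The idea of combining calls from several detection methods can be illustrated with a minimal sketch. This is not McClintock's code or its output format: the `Call` record, the method labels, the example coordinates, and the 100 bp merging window are all assumptions made for illustration. The sketch groups predictions of the same TE family that fall close together on the same chromosome and records which methods support each merged site.

```python
from collections import defaultdict
from typing import NamedTuple


class Call(NamedTuple):
    """One predicted nonreference TE insertion (hypothetical record layout)."""
    method: str   # label of the detection method (illustrative, not a real tool name)
    chrom: str
    start: int    # start of the predicted insertion interval
    end: int      # end of the predicted insertion interval
    family: str   # TE family assigned by the method


def _summarize(chrom, family, cluster):
    # Collapse a cluster of nearby calls into one merged site.
    return {
        "chrom": chrom,
        "family": family,
        "start": min(c.start for c in cluster),
        "end": max(c.end for c in cluster),
        "methods": sorted({c.method for c in cluster}),
    }


def merge_calls(calls, window=100):
    """Merge calls of the same family and chromosome lying within `window` bp.

    The window size is an assumed parameter, not a value from the paper.
    """
    by_key = defaultdict(list)
    for c in calls:
        by_key[(c.chrom, c.family)].append(c)

    merged = []
    for (chrom, family), group in sorted(by_key.items()):
        group.sort(key=lambda c: c.start)
        cluster = [group[0]]
        cluster_end = group[0].end
        for c in group[1:]:
            if c.start <= cluster_end + window:
                # Close enough to the running cluster: extend it.
                cluster.append(c)
                cluster_end = max(cluster_end, c.end)
            else:
                merged.append(_summarize(chrom, family, cluster))
                cluster = [c]
                cluster_end = c.end
        merged.append(_summarize(chrom, family, cluster))
    return merged


# Made-up example calls; coordinates and labels are purely illustrative.
calls = [
    Call("split_read_method", "chrI", 1000, 1005, "Ty1"),
    Call("read_pair_method", "chrI", 950, 1100, "Ty1"),
    Call("read_pair_method", "chrII", 5000, 5200, "Ty2"),
]
merged = merge_calls(calls)
for site in merged:
    print(site)
```

In this toy input, the narrow split-read interval and the wider read-pair interval on chrI collapse into a single site supported by both methods, which mirrors the trade-off described above: read-pair evidence tends to find more insertions, while split-read evidence localizes them more precisely.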
KEYWORDS: transposable elements; bioinformatics; genomics; yeast

The widespread availability of genomic data over the last two decades has provided unparalleled opportunities to learn about the abundance, diversity, and functional consequences of transposable elements (TEs) in modern genomes. However, the computational analysis of TE sequences in both reference and resequenced genomes remains a challenging area of bioinformatics research because of the repetitive nature of these sequences. Development of bioinformatics tools for the detection and annotation of TEs in reference genomes is now a relatively mature field (Bergman and Quesneville 2007; Saha et al. 2008; Lerat 2010), although many open questions remain about choosing the best tools for specific biological applications (Hoen et al. 2015). In contrast, detection of reference and nonreference TE insertions in whole-genome shotgun (WGS) resequencing data is an active research area (reviewed in Ewing 2015), with a large number of methods published in recent years (Sackton et al.
The settlement of Madagascar is one of the most unusual, and least understood, episodes in human prehistory. Madagascar was one of the last landmasses to be reached by people, and despite the island's location just off the east coast of Africa, evidence from genetics, language and culture all attests that it was settled jointly by Africans, and more surprisingly, Indonesians. Nevertheless, extremely little is known about the settlement process itself. Here, we report broad geographical screening of Malagasy and Indonesian genetic variation, from which we infer a statistically robust coalescent model of the island's initial settlement. Maximum-likelihood estimates favour a scenario in which Madagascar was settled approximately 1200 years ago by a very small group of women (approx. 30), most of Indonesian descent (approx. 93%). This highly restricted founding population raises the possibility that Madagascar was settled not as a large-scale planned colonization event from Indonesia, but rather through a small, perhaps even unintended, transoceanic crossing.
The Drosophila melanogaster P transposable element provides one of the best cases of horizontal transfer of a mobile DNA sequence in eukaryotes. Invasion of natural populations by the P element has led to a syndrome of phenotypes known as P-M hybrid dysgenesis that emerges when strains differing in their P element composition mate and produce offspring. Despite extensive research on many aspects of P element biology, many questions remain about the genomic basis of variation in P-M dysgenesis phenotypes across populations. Here we compare estimates of genomic P element content with gonadal dysgenesis phenotypes for isofemale strains obtained from three worldwide populations of D. melanogaster to illuminate the molecular basis of natural variation in cytotype status. We show that P element abundance estimated from genome sequences of isofemale strains is highly correlated across different bioinformatics approaches, but that abundance estimates are sensitive to method and filtering strategies as well as incomplete inbreeding of isofemale strains. We find that P element content varies significantly across populations, with strains from a North American population having fewer P elements but a higher proportion of full-length elements than strains from populations sampled in Europe or Africa. Despite these geographic differences in P element abundance and structure, neither the number of P elements nor the ratio of full-length to internally-truncated copies is strongly correlated with the degree of gonadal dysgenesis exhibited by an isofemale strain. Thus, variation in P element abundance and structure across different populations does not necessarily lead to corresponding geographic differences in gonadal dysgenesis phenotypes. Finally, we confirm that population differences in the abundance and structure of P elements that are observed from isofemale lines can also be observed in pool-seq samples from the same populations. 
Our work supports the view that genomic P element content alone is not sufficient to explain variation in gonadal dysgenesis across strains of D. melanogaster, and informs future efforts to decode the genomic basis of geographic and temporal differences in P element induced phenotypes.
Non-membrane-bound compartments such as P-bodies (PBs) and stress granules (SGs) play important roles in the regulation of gene expression following environmental stresses. We have systematically and quantitatively determined the protein and mRNA composition of PBs and SGs formed before and after nutrient stress. We find that high molecular weight (HMW) complexes exist prior to glucose depletion that we propose may act as seeds for further condensation of proteins forming mature PBs and SGs. We identify an enrichment of proteins with low complexity and RNA binding domains, as well as long, structured mRNAs that are poorly translated following nutrient stress. Many proteins and mRNAs are shared between PBs and SGs including several multivalent RNA binding proteins that promote condensate interactions during liquid-liquid phase separation. We uncover numerous common protein and RNA components across PBs and SGs that support a complex interaction profile during the maturation of these biological condensates. These interaction networks represent a tuneable response to stress, highlighting previously unrecognized condensate heterogeneity. These studies therefore provide an integrated and quantitative understanding of the dynamic nature of key biological condensates.
We investigated the movement behavior of participants walking within a virtual crowd in an immersive virtual environment. We investigated three different parameters that characterize a moving virtual crowd: density, speed, and direction. An immersive road-crossing scenario that took place in a virtual metropolitan city was created. In this scenario, the participants were instructed to walk toward the opposite sidewalk. Three measurements (speed, deviation, and trajectory length) were used to evaluate the impact of the parameters assigned to the virtual crowd on the movement behavior of the participants. Significant results were found for both the main and interaction effects. The results suggested that the high density, low speed, and diagonal direction situations associated with the virtual crowd had the greatest impacts on the speed, deviation, and trajectory lengths of participants when they walked in a virtual environment and were surrounded by a moving virtual population.