Background: Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous methods exist for annotation of each class of TEs, but their relative performances have not been systematically compared. Moreover, a comprehensive pipeline is needed to produce a non-redundant library of TEs for species lacking this resource to generate whole-genome TE annotations. Results: We benchmark existing programs based on a carefully curated library of rice TEs. We evaluate the performance of methods annotating long terminal repeat (LTR) retrotransposons, terminal inverted repeat (TIR) transposons, short TIR transposons known as miniature inverted transposable elements (MITEs), and Helitrons. Performance metrics include sensitivity, specificity, accuracy, precision, FDR, and F1. Using the most robust programs, we create a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a filtered non-redundant TE library for annotation of structurally intact and fragmented elements. EDTA also deconvolutes nested TE insertions frequently found in highly repetitive genomic regions. Using other model species with curated TE libraries (maize and Drosophila), EDTA is shown to be robust across both plant and animal species. Conclusions: The benchmarking results and pipeline developed here will greatly facilitate TE annotation in eukaryotic genomes. These annotations will promote a much more in-depth understanding of the diversity and evolution of TEs at both intra- and inter-species levels. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.
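The six performance metrics named in the abstract above all follow from a single confusion matrix. The following is a minimal sketch using the standard textbook definitions; the `Confusion` class, the `benchmark_metrics` function, and the example counts are hypothetical, and how the counts are obtained (for example, per base pair against the curated library) is an assumption rather than something the abstract specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from comparing a test annotation against a curated reference."""
    tp: int  # annotated as TE, truly TE
    fp: int  # annotated as TE, truly non-TE
    tn: int  # not annotated, truly non-TE
    fn: int  # not annotated, truly TE


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def benchmark_metrics(c: Confusion) -> dict:
    """Compute the six metrics from standard confusion-matrix definitions."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "accuracy": _ratio(c.tp + c.tn, total),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "fdr": _ratio(c.fp, c.tp + c.fp),  # equals 1 - precision
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    example = Confusion(tp=80, fp=20, tn=880, fn=20)
    for name, value in benchmark_metrics(example).items():
        print(f"{name}: {value:.4f}")
```

Reporting FDR alongside precision is redundant arithmetically (the two sum to 1), but listing both makes over-annotation easy to spot at a glance.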
One contribution of 17 to a theme issue 'Eukaryotic origins: progress and challenges'. What's in a genome? The C-value enigma and the evolution of eukaryotic genome content. Tyler A. Elliott and T. Ryan Gregory, Department of Integrative Biology, University of Guelph, Guelph, Ontario, Canada N1G 2W1. Some notable exceptions aside, eukaryotic genomes are distinguished from those of Bacteria and Archaea in a number of ways, including chromosome structure and number, repetitive DNA content, and the presence of introns in protein-coding regions. One of the most notable differences between eukaryotic and prokaryotic genomes is in size. Unlike their prokaryotic counterparts, eukaryotes exhibit enormous (more than 60 000-fold) variability in genome size which is not explained by differences in gene number. Genome size is known to correlate with cell size and division rate, and by extension with numerous organism-level traits such as metabolism, developmental rate or body size. Less well described are the relationships between genome size and other properties of the genome, such as gene content, transposable element content, base pair composition and related features. The rapid expansion of 'complete' genome sequencing projects has, for the first time, made it possible to examine these relationships across a wide range of eukaryotes in order to shed new light on the causes and correlates of genome size diversity. This study presents the results of phylogenetically informed comparisons of genome data for more than 500 species of eukaryotes. Several relationships are described between genome size and other genomic parameters, and some recommendations are presented for how these insights can be extended even more broadly in the future.
Background: The genomes of eukaryotes vary enormously in size, with much of this diversity driven by differences in the abundances of transposable elements (TEs). There is also substantial structural and phylogenetic diversity among TEs, such that they can be classified into distinct classes, superfamilies, and families. Possible relationships between TE diversity (and not just abundance) and genome size have not been investigated to date, though there are reasons to expect either a positive or a negative correlation. This study compares data from 257 species of animals, plants, fungi, and “protists” to determine whether TE diversity at the superfamily level is related to genome size. Results: No simple relationship was found between TE diversity and genome size. There is no significant correlation across all eukaryotes, but there is a positive correlation for genomes below 500 Mbp and a negative correlation among land plants. No relationships were found across animals or within vertebrates. Some TE superfamilies tend to be present across all major groups of eukaryotes, but there is considerable variance in TE diversity in different taxa. Conclusions: Differences in genome size are thought to arise primarily through accumulation of TEs, but beyond a certain point (~500 Mbp), TE diversity does not increase with genome size. Several possible explanations for these complex patterns are discussed, and recommendations to facilitate future analyses are provided. Electronic supplementary material: The online version of this article (doi:10.1186/s12862-015-0339-8) contains supplementary material, which is available to authorized users.
Biological conclusions based on DNA barcoding and metabarcoding analyses can be strongly influenced by the methods utilized for data generation and curation, leading to varying levels of success in the separation of biological variation from experimental error. The 5′ region of cytochrome c oxidase subunit I (COI-5P) is the most common barcode gene for animals, with conserved structure and function that allows for biologically informed error identification. Here, we present coil ( https://CRAN.R-project.org/package=coil ), an R package for the pre-processing and frameshift error assessment of COI-5P animal barcode and metabarcode sequence data. The package contains functions for placement of barcodes into a common reading frame, accurate translation of sequences to amino acids, and highlighting insertion and deletion errors. The analysis of 10 000 barcode sequences of varying quality demonstrated how coil can place barcode sequences in reading frame and distinguish sequences containing indel errors from error-free sequences with greater than 97.5% accuracy. Package limitations were tested through the analysis of COI-5P sequences from the plant and fungal kingdoms as well as the analysis of potential contaminants: nuclear mitochondrial pseudogenes and Wolbachia COI-5P sequences. Results demonstrated that coil is a strong technical error identification method but is not reliable for detecting all biological contaminants.
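To illustrate why a conserved protein-coding barcode permits biologically informed error checks, the sketch below translates a sequence in all three forward frames and reports which frames contain no stop codon. This is a simpler stand-in heuristic, not coil's method (the abstract does not describe the package's internals). The function names are hypothetical, and the use of the invertebrate mitochondrial genetic code (NCBI translation table 5) is an assumption that would need to match the taxon being analyzed.

```python
from itertools import product

# NCBI translation table 5 (invertebrate mitochondrial), codons ordered
# with bases T, C, A, G at each position. Assumed here for illustration;
# other animal groups use different mitochondrial tables.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate seq starting at offset frame (0, 1, or 2).

    Codons with ambiguous or unknown bases become 'X'; a trailing
    partial codon is ignored.
    """
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def stop_free_frames(seq: str) -> list:
    """Return the forward frames whose translation has no stop codon."""
    return [frame for frame in range(3) if "*" not in translate(seq, frame)]


if __name__ == "__main__":
    # Toy sequences, not real COI-5P data.
    print(translate("ATGTGATTA"))         # TGA codes for W in table 5
    print(stop_free_frames("ATGTAAGGA"))  # frame 0 contains a TAA stop
```

A single-base insertion or deletion shifts every downstream codon, so a barcode with such an error will often show stop codons in its expected frame; a stop-codon check alone cannot catch every indel, which is one reason a dedicated tool is useful.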
Molecular identification is increasingly used to speed up biodiversity surveys and laboratory experiments. However, many groups of organisms cannot be reliably identified using standard databases such as GenBank or BOLD due to lack of sequenced voucher specimens identified by experts. Sometimes a large number of sequences are available, but with too many errors to allow identification. Here we address this problem for parasitoids of Drosophila by introducing a curated open-access molecular reference database, DROP (Drosophila parasitoids). In DROP, specimens are identified by taxonomists and vetted through direct comparison with primary type material. To initiate DROP, we curated 154 laboratory strains, 853 vouchers, 545 DNA sequences, 16 genomes, 11 transcriptomes, and 6 proteomes drawn from a total of 183 operational taxonomic units (OTUs): 113 described Drosophila parasitoid species and 70 provisional species. We found species richness of Drosophila parasitoids to be acutely underestimated and provide an updated taxonomic catalogue for the community. DROP offers accurate molecular identification and improves cross-referencing between individual studies that we hope will catalyze research on this diverse and fascinating model system. Our effort should also serve as an example for researchers facing similar molecular identification problems in other groups of organisms.
A promising recent development in molecular biology involves viewing the genome as a mini-ecosystem, where genetic elements are compared to organisms and the surrounding cellular and genomic structures are regarded as the local environment. Here we critically evaluate the prospects of Ecological Neutral Theory (ENT), a popular model in ecology, as it applies at the genomic level. This assessment requires an overview of the controversy surrounding neutral models in community ecology. In particular, we discuss the limitations of using ENT both as an explanation of community dynamics and as a null hypothesis. We then analyze a case study in which ENT has been applied to genomic data. Our central finding is that genetic elements do not conform to the requirements of ENT once its assumptions and limitations are made explicit. We further compare this genome-level application of ENT to two other, more familiar approaches in genomics that rely on neutral mechanisms: Kimura's Molecular Neutral Theory and Lynch's Mutational Hazard Model. Interestingly, this comparison reveals that there are two distinct concepts of neutrality associated with these models, which we dub 'fitness-neutrality' and 'competitive neutrality'. This distinction helps to clarify the various roles for neutral models in genomics, for example, in explaining the evolution of genome size.
Considerable variation exists not only in the kinds of transposable elements (TEs) occurring within the genomes of different species, but also in their abundance and distribution. Noting a similarity to the assortment of organisms among ecosystems, some researchers have called for an ecological approach to the study of transposon dynamics. However, there are several ways to adopt such an approach, and it is sometimes unclear what an ecological perspective will add to the existing co-evolutionary framework for explaining transposon-host interactions. This review aims to clarify the conceptual foundations of transposon ecology in order to evaluate its explanatory prospects. We begin by identifying three unanswered questions regarding the abundance and distribution of TEs that potentially call for an ecological explanation. We then offer an operational distinction between evolutionary and ecological approaches to these questions. By determining the amount of variance in transposon abundance and distribution that is explained by ecological and evolutionary factors, respectively, it is possible empirically to assess the prospects for each of these explanatory frameworks. To illustrate how this methodology applies to a concrete example, we analyzed whole-genome data for one set of distantly related mammals and another more closely related group of arthropods. Our expectation was that ecological factors are most informative for explaining differences among individual TE lineages, rather than TE families, and for explaining their distribution among closely related as opposed to distantly related host genomes. We found that, in these data sets, ecological factors do in fact explain most of the variation in TE abundance and distribution among TE lineages across less distantly related host organisms. Evolutionary factors were not significant at these levels. 
However, the explanatory roles of evolution and ecology become inverted at the level of TE families or among more distantly related genomes. Not only does this example demonstrate the utility of our distinction between ecological and evolutionary perspectives, it further suggests an appropriate explanatory domain for the burgeoning discipline of transposon ecology. The fact that ecological processes appear to be impacting TE lineages over relatively short time scales further raises the possibility that transposons might serve as useful model systems for testing more general hypotheses in ecology.
Background: The nuclear genomes of eukaryotes vary enormously in size, with much of this variability attributable to differential accumulation of transposable elements (TEs). To date, the precise evolutionary and ecological conditions influencing TE accumulation remain poorly understood. Most previous attempts to identify these conditions have focused on evolutionary processes occurring at the host organism level, whereas we explore a TE ecology explanation. Results: As an alternative (or additional) hypothesis, we propose that ecological mechanisms occurring within the host cell may contribute to patterns of TE accumulation. To test this idea, we conducted a series of experiments using a simulated asexual TE/host system. Each experiment tracked the accumulation rate for a given type of TE within a particular host genome. TEs in this system had a net deleterious effect on host fitness, which did not change over the course of experiments. As one might expect, in the majority of experiments TEs were either purged from the genome or drove the host population to extinction. However, in an intriguing handful of cases, TEs co-existed with hosts and accumulated to very large numbers. This tended to occur when TEs achieved a stable density relative to non-TE sequences in the genome (as opposed to reaching any particular absolute number). In our model, the only way to maintain a stable density was for TEs to generate new, inactive copies at a rate that balanced with the production of active (replicating) copies. Conclusions: From a TE ecology perspective, we suggest this could be interpreted as a case of ecosystem engineering within the genome, where TEs persist by creating their own "habitat".