Repbase Update (RU) is a database of representative repeat sequences in eukaryotic genomes. Since its first development as a database of human repetitive sequences in 1992, RU has been serving as a well-curated reference database fundamental for almost all eukaryotic genome sequence analyses. Here, we introduce recent updates of RU, focusing on technical issues concerning the submission and updating of Repbase entries, and give short examples of using RU data. RU sincerely invites a broader submission of repeat sequences from the research community. Electronic supplementary material: The online version of this article (doi:10.1186/s13100-015-0041-9) contains supplementary material, which is available to authorized users.
Repetitive DNA, especially that due to transposable elements (TEs), makes up a large fraction of many genomes. Dfam is an open access database of families of repetitive DNA elements, in which each family is represented by a multiple sequence alignment and a profile hidden Markov model (HMM). The initial release of Dfam, featured in the 2013 NAR Database Issue, contained 1143 families of repetitive elements found in humans, and was used to produce more than 100 Mb of additional annotation of TE-derived regions in the human genome, with improved speed. Here, we describe recent advances, most notably expansion to 4150 total families including a comprehensive set of known repeat families from four new organisms (mouse, zebrafish, fly and nematode). We describe improvements to coverage, and to our methods for identifying and reducing false annotation. We also describe updates to the website interface. The Dfam website has moved to http://dfam.org. Seed alignments, profile HMMs, hit lists and other underlying data are available for download.
The complete Spodoptera litura multicapsid nucleopolyhedrovirus (SpltMNPV) genome contained 139,342 bp with a G+C content of 42.7%, and 141 putative open reading frames (ORFs) or genes of 150 nucleotides or greater that showed minimal overlap. Ninety-six ORFs had homologues in Autographa californica multicapsid nucleopolyhedrovirus (AcMNPV), 16 had homologues in other baculoviruses, and 29 were unique to SpltMNPV. The homologues of ubiquitin and gp37 are fused in SpltMNPV. The genome lacked a homologue of the major budded virus glycoprotein gene gp64, but it contained a homologue of ORF130 of Lymantria dispar multicapsid nucleopolyhedrovirus (LdMNPV). There were two homologues of AcMNPV ORF2 (bro gene), and a DnaJ protein gene (SpltORF39) in which the N-terminus showed homologies with the J domain of DnaJ family proteins. Seventeen homologous regions (hrs) were identified, each containing 2-29 palindromic repeats, with an average length of 534 bp and base content (G+C%) of 33.0.
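The G+C figures quoted above (42.7% for the genome, 33.0% for the hrs) are simple base-composition percentages. A minimal sketch of that calculation follows; the function name `gc_content` and the choice to skip ambiguous symbols such as N are assumptions made for illustration, not details taken from the paper.

```python
def gc_content(seq: str) -> float:
    """Return the G+C percentage of a DNA sequence.

    Assumption: only unambiguous bases (A, C, G, T) are counted, so
    symbols such as N do not affect the percentage.
    """
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted
```

Calling `gc_content` on a full genome sequence read from a FASTA file would give the genome-wide value; calling it on each hr separately and averaging would give a per-region figure.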
Background: Eukaryotic genomes harbor diverse families of repetitive DNA derived from transposable elements (TEs) that are able to replicate and insert into genomic DNA. The biological role of TEs remains unclear, although they have profound mutagenic impact on eukaryotic genomes and the origin of repetitive families often correlates with speciation events. We present a new hypothesis to explain the observed correlations based on classical concepts of population genetics. Presentation of the hypothesis: The main thesis presented in this paper is that the TE-derived repetitive families originate primarily by genetic drift in small populations derived mostly by subdivisions of large populations into subpopulations. We outline the potential impact of the emerging repetitive families on genetic diversification of different subpopulations, and discuss implications of such diversification for the origin of new species. Testing the hypothesis: Several testable predictions of the hypothesis are examined. First, we focus on the prediction that the number of diverse families of TEs fixed in a representative genome of a particular species positively correlates with the cumulative number of subpopulations (demes) in the historical metapopulation from which the species has emerged. Furthermore, we present evidence indicating that human AluYa5 and AluYb8 families might have originated in separate proto-human subpopulations. We also revisit prior evidence linking the origin of repetitive families to mammalian phylogeny and present additional evidence linking repetitive families to speciation based on mammalian taxonomy. Finally, we discuss evidence that mammalian orders represented by the largest numbers of species may be subject to relatively recent population subdivisions and speciation events. Implications of the hypothesis: The hypothesis implies that subdivision of a population into small subpopulations is the major step in the origin of new families of TEs as well as of new species. The origin of new subpopulations is likely to be driven by the availability of new biological niches, consistent with the hypothesis of punctuated equilibria. The hypothesis also has implications for the ongoing debate on the role of genetic drift in genome evolution. Reviewers: This article was reviewed by Eugene Koonin, Juergen Brosius and I. King Jordan.
Background: Bacterial insertion sequences (IS) of IS200/IS605 and IS607 family often encode a transposase (TnpA) and a protein of unknown function, TnpB. Results: Here we report two groups of TnpB-like proteins (Fanzor1 and Fanzor2) that are widespread in diverse eukaryotic transposable elements (TEs), and in large double-stranded DNA (dsDNA) viruses infecting eukaryotes. Fanzor and TnpB proteins share the same conserved amino acid motif in their C-terminal half regions: D-X(125, 275)-[TS]-[TS]-X-X-[C4 zinc finger]-X(5,50)-RD, but are highly variable in their N-terminal regions. Fanzor1 proteins are frequently captured by DNA transposons from different superfamilies including Helitron, Mariner, IS4-like, Sola and MuDr. In contrast, Fanzor2 proteins appear only in some IS607-type elements. We also analyze a new Helitron2 group from the Helitron superfamily, which contains elements with hairpin structures on both ends. Non-autonomous Helitron2 elements (CRe-1, 2, 3) in the genome of green alga Chlamydomonas reinhardtii are flanked by target site duplications (TSDs) of variable length (approximately 7 to 19 bp). Conclusions: The phylogeny and distribution of the TnpB/Fanzor proteins indicate that they may be disseminated among eukaryotic species by viruses. We hypothesize that TnpB/Fanzor proteins may act as methyltransferases.
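The conserved motif quoted in this abstract, D-X(125, 275)-[TS]-[TS]-X-X-[C4 zinc finger]-X(5,50)-RD, is written in a PROSITE-like notation that maps fairly directly onto a regular expression. The sketch below shows one possible translation. The abstract does not spell out the zinc finger itself, so the sub-pattern `C4_ZINC_FINGER` (Cys-X2-Cys-X(10,30)-Cys-X2-Cys) is a hypothetical placeholder, and the names `MOTIF` and `find_motif` are likewise illustrative rather than anything the authors describe.

```python
import re

# ASSUMPTION: the abstract only says "[C4 zinc finger]". This generic
# Cys-X2-Cys-X(10,30)-Cys-X2-Cys spacing is a placeholder, not the
# authors' definition.
C4_ZINC_FINGER = r"C.{2}C.{10,30}C.{2}C"

# PROSITE-like elements translated one by one:
#   D-X(125,275)   -> D followed by 125 to 275 arbitrary residues
#   [TS]-[TS]      -> two residues, each Thr or Ser
#   X-X            -> two arbitrary residues
#   X(5,50)-RD     -> 5 to 50 arbitrary residues, then Arg-Asp
MOTIF = re.compile(
    "".join([
        r"D.{125,275}",
        r"[TS][TS]",
        r".{2}",
        C4_ZINC_FINGER,
        r".{5,50}",
        r"RD",
    ])
)


def find_motif(protein: str):
    """Return (start, end) of the first motif match, or None if absent."""
    match = MOTIF.search(protein.upper())
    return match.span() if match else None
```

A regular expression of this kind is only a coarse filter; profile-based searches are the usual choice when sequences are as divergent as the abstract suggests.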