Endogenous retrovirus (ERV) families are derived from their exogenous counterparts by means of a process of germ-line infection and proliferation within the host genome. Several families in the human and mouse genomes now consist of many hundreds of elements and, although several candidates have been proposed, the mechanism behind this proliferation has remained uncertain. To investigate this mechanism, we reconstructed the ratio of nonsynonymous to synonymous changes and the acquisition of stop codons during the evolution of the human ERV family HERV-K(HML2). We show that all genes, including the env gene, which is necessary only for movement between cells, have been under continuous purifying selection. This finding strongly suggests that the proliferation of this family has been almost entirely due to germ-line reinfection, rather than retrotransposition in cis or complementation in trans, and that an infectious pool of endogenous retroviruses has persisted within the primate lineage throughout the past 30 million years. Because many elements within this pool would have been unfixed, it is possible that the HERV-K(HML2) family still contains infectious elements at present, despite their apparent absence in the human genome sequence. Analysis of the env gene of eight other HERV families indicated that reinfection is likely to be the most common mechanism by which endogenous retroviruses proliferate in their hosts.
Background: The relationship between DNA sequence and encoded information is still an unsolved puzzle. The number of protein-coding genes in higher eukaryotes identified by genome projects is lower than was expected, while a considerable amount of putatively non-coding transcription has been detected. Functional small open reading frames (smORFs) are known to exist in several organisms. However, coding sequence detection methods are biased against detecting such very short open reading frames. Thus, a substantial number of non-canonical coding regions encoding short peptides might await characterization. Results: Using bio-informatics methods, we have searched for smORFs of less than 100 amino acids in the putatively non-coding euchromatic DNA of Drosophila melanogaster, and initially identified nearly 600,000 of them. We have studied the pattern of conservation of these smORFs as coding entities between D. melanogaster and Drosophila pseudoobscura, their presence in syntenic and in transcribed regions of the genome, and their ratio of conservative versus non-conservative nucleotide changes. For negative controls, we compared the results with those obtained using random short sequences, while a positive control was provided by smORFs validated by proteomics data. Conclusions: The combination of these analyses led us to postulate the existence of at least 401 functional smORFs in Drosophila, with the possibility that as many as 4,561 such functional smORFs may exist.
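The initial smORF search described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline, whose details the abstract does not give. The function name `find_smorfs`, the `min_aa` cutoff, the choice of the first in-frame ATG as the start, and the coordinate convention are all assumptions made for illustration.

```python
# Illustrative sketch only: scans both strands in all three frames for
# ATG...stop open reading frames shorter than 100 amino acids.
# The name find_smorfs and the min_aa parameter are hypothetical.

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_smorfs(seq, max_aa=99, min_aa=10):
    """Return (strand, start, end, n_aa) tuples for short ORFs.

    start and end are 0-based, end-exclusive offsets on the strand that
    was scanned (for "-" they refer to the reverse-complemented string).
    n_aa counts codons from the ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    n_aa = (i - start) // 3
                    if min_aa <= n_aa <= max_aa:
                        hits.append((strand, start, i + 3, n_aa))
                    start = None  # look for the next ORF in this frame
    return hits
```

A scan of this kind returns many candidates by chance alone, which is consistent with the abstract's use of conservation, synteny, transcription, and random-sequence controls to narrow nearly 600,000 initial hits down to a few hundred likely functional smORFs.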
Insertion bias and purifying selection of retrotransposons in the Arabidopsis thaliana genome
Genome evolution and size variation in multicellular organisms are profoundly influenced by the activity of retrotransposons. In higher eukaryotes with compact genomes retrotransposons are found in lower copy numbers than in larger genomes, which could be due to either suppression of transposition or to elimination of insertions, and are non-randomly distributed along the chromosomes. The evolutionary mechanisms constraining retrotransposon copy number and chromosomal distribution are still poorly understood.
The correlation coefficient is commonly used as a measure of the divergence of gene expression profiles between different species. Here we point out a potential problem with this statistic: if measurement error is large relative to the differences in expression, the correlation coefficient will tend to show high divergence for genes that have relatively uniform levels of expression across tissues or time points. We show that genes with a conserved uniform pattern of expression have significantly higher levels of expression divergence, when measured using the correlation coefficient, than other genes, in a data set from mouse, rat, and human. We also show that the Euclidean distance yields low estimates of expression divergence for genes with a conserved uniform pattern of expression.
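The contrast drawn in this abstract can be illustrated with a small numeric sketch. It assumes expression divergence is measured as one minus the Pearson correlation coefficient, which is a common convention but is not stated in the abstract. The function names and the expression values are invented for illustration and are not data from the study.

```python
import math


def pearson_divergence(x, y):
    """Return 1 - r, where r is the Pearson correlation of two profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / math.sqrt(var_x * var_y)


def euclidean_distance(x, y):
    """Return the Euclidean distance between two expression profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical profiles across four tissues in two species.
# Uniform gene: nearly flat expression, so only measurement noise varies.
uniform_a = [10.0, 10.1, 9.9, 10.0]
uniform_b = [10.0, 9.9, 10.1, 10.0]

# Tissue-specific gene: large real differences between tissues.
specific_a = [1.0, 5.0, 20.0, 2.0]
specific_b = [1.1, 4.9, 20.1, 2.0]

for name, a, b in (("uniform", uniform_a, uniform_b),
                   ("tissue-specific", specific_a, specific_b)):
    print(name, pearson_divergence(a, b), euclidean_distance(a, b))
```

In this toy case the two uniform profiles differ by noise of the same size as the tissue-specific ones, yet the correlation-based measure reports them as highly divergent while the Euclidean distance stays small for both genes.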
Endogenous retroviruses (ERVs) result from germ line infections by exogenous retroviruses. They can proliferate within the genome of their host species until they are either inactivated by mutation or removed by recombinational deletion. ERVs belong to a diverse group of mobile genetic elements collectively termed transposable elements (TEs). Numerous studies have attempted to elucidate the factors determining the genomic distribution and persistence of TEs. Here we show that, within humans, gene density and not recombination rate correlates with fixation of endogenous retroviruses, whereas the local recombination rate determines their persistence in a full-length state. Recombination does not appear to influence fixation either via the ectopic exchange model or by indirect models based on the efficacy of selection. We propose a model linking rates of meiotic recombination to the probability of recombinational deletion to explain the effect of recombination rate on persistence. Chromosomes 19 and Y are exceptions, possessing more elements than other regions, and we suggest this is due to low gene density and elevated rates of human ERV integration in males for chromosome Y and segmental duplication for chromosome 19.
Background: Dispersed repeats are a major component of eukaryotic genomes and drivers of genome evolution. Annotation of DNA sequences homologous to known repetitive elements has been mainly performed with the program REPEATMASKER. Sequences annotated by REPEATMASKER often correspond to fragments of repetitive elements resulting from the insertion of younger elements or other rearrangements. Although REPEATMASKER annotation is indispensable for studying genome biology, this annotation does not contain much information on the common origin of fossil fragments that share an insertion event, especially where clusters of nested insertions of repetitive elements have occurred.
Transposable elements (TEs) can affect the regulation of nearby genes through several mechanisms. Here, we examine to what extent recent TE insertions have contributed to the evolution of gene expression in hominids. We compare expression levels of human and chimpanzee orthologs and detect a weak increase in expression divergence (ED) for genes with species-specific TE insertions compared with unaffected genes. However, we show that genes with TE insertions predating the human-chimpanzee split also exhibit a similar increase in ED and therefore conclude that the increase is not due to the transcriptional influence of the TEs. These results are further confirmed by lineage-specific analysis of ED, using rhesus macaque as an outgroup: Human-chimpanzee ortholog pairs, where one ortholog has suffered TE insertion but not the other, do not show increased ED along the lineage where the insertion occurred, relative to the other lineage. We also show that genes with recent TE insertions tend to produce more alternative transcripts but find no evidence that the TEs themselves promote transcript diversity. Finally, we observe that TEs are enriched upstream relative to downstream of genes and show that this is due to insertional bias, rather than selection, because this bias is only observed in genes expressed in the germ line. This provides an alternative neutral explanation for the accumulation of TEs in upstream sequences.
Background: Many genomes contain a substantial number of transposable elements (TEs), a few of which are known to be involved in regulating gene expression. However, recent observations suggest that TEs may have played a very important role in the evolution of gene expression because many conserved non-genic sequences, some of which are known to be involved in gene regulation, resemble TEs. Results: Here we investigate whether new TE insertions affect gene expression profiles by testing whether gene expression divergence between mouse and rat is correlated to the numbers of new transposable elements inserted near genes. We show that expression divergence is significantly correlated to the number of new LTR and SINE elements, but not to the numbers of LINEs. We also show that expression divergence is not significantly correlated to the numbers of ancestral TEs in most cases, which suggests that the correlations between expression divergence and the numbers of new TEs are causal in nature. We quantify the effect and estimate that TE insertion has accounted for ∼20% (95% confidence interval: 12% to 26%) of all expression profile divergence in rodents. Conclusions: We conclude that TE insertions may have had a major impact on the evolution of gene expression levels in rodents.