We report here genome sequences and comparative analyses of three closely related parasitoid wasps: Nasonia vitripennis, N. giraulti, and N. longicornis. Parasitoids are important regulators of arthropod populations, including major agricultural pests and disease vectors, and Nasonia is an emerging genetic model, particularly for evolutionary and developmental genetics. Key findings include the identification of a functional DNA methylation tool kit; hymenopteran-specific genes including diverse venoms; lateral gene transfers among Pox viruses, Wolbachia, and Nasonia; and the rapid evolution of genes involved in nuclear-mitochondrial interactions that are implicated in speciation. Newly developed genome resources advance Nasonia for genetic research, accelerate mapping and cloning of quantitative trait loci, and will ultimately provide tools and knowledge for further increasing the utility of parasitoids as pest insect-control agents.
As is the case for some other RNA viruses, the amino acid sequences of retroviral proteins change at an astonishing rate. For example, the proteases of the human immunodeficiency virus (HIV) and the visna lentivirus with which it is often compared are as different as the proteases of fungi and mammals, and those of the human type I leukemia virus are as different from HIV or visna as are the proteins of humans and bacteria. That the sequences of retrovirus proteins can be recognized as sharing common ancestry with non-retroviral proteins implies that the vastly accelerated change has begun only recently or occurs very sporadically. Only a scheme whereby exogenous retroviruses exist as short-lived bursts upon a backdrop of germline-encoded endogenous viruses is consistent with the sequence data. Retroviruses are related to many other reverse transcriptase-bearing entities present in the genomes of eukaryotes. They also have proteins that are homologous with those of some plant and animal DNA viruses, and their reverse transcriptase is recognizably similar to sequences found in the introns of some fungal mitochondria. Computer alignment of all these sequences allows an overall phylogeny to be constructed that chronicles the history of events leading to infectious retroviruses.
A computer analysis of the amino acid sequences from the putative gene products of retroviral pol genes has revealed a 150-residue segment that is homologous with the ribonuclease H of Escherichia coli. The segment occurs at the carboxyl terminus of the region assigned to the 90-kDa reverse transcriptase polypeptide. In contrast, a section nearer the amino terminus of this sequence can be aligned with nonretroviral polymerases. ... (5), and an endonuclease ("integrase") that is essential for the integration of the newly synthesized DNA into the host genome (6). The pol gene of retroviruses is expressed initially as a gag-pol precursor that is proteolytically processed to a number of small gag proteins, an approximately 90-kDa protein encompassing both RNA-directed DNA polymerase (reverse transcriptase) and ribonuclease H activities, and, finally, a 40-kDa fragment with endonuclease activity (7). Several reports have presented evidence that the ribonuclease H activity of the 90-kDa reverse transcriptase portion is associated with the amino-terminal end of that protein and, by implication, that the DNA polymerase activity is at the carboxyl-terminal end. These conclusions are based on experiments involving deletion mutants (2), on the one hand, and antibodies to synthetic peptides modeled on the putative sequences, on the other (3). We now suggest that the opposite must be true: the ribonuclease H activity should be situated at the carboxyl terminus, and the DNA polymerase at the amino terminus. We draw this conclusion on the basis of comparisons of the retroviral sequences with those of nonviral enzymes of similar function. In this regard, we have uncovered a significant resemblance between a 150-residue segment at the carboxyl-terminal end of the 90-kDa fragment and the reported sequence of a ribonuclease H from Escherichia coli.
We also provide an alignment of a segment near the amino terminus of the 90-kDa polypeptide with highly conserved sequences from many other polymerases, including the a subunit of E. coli DNA-directed RNA polymerase. Finally, there is a distinctive sequence in the endonuclease sequence that is characteristic of a zinc-binding segment. METHODS: The sequences used were taken from the 1985 version of NEWAT (8) ... identical (Fig. 1). Binary comparison of each of the retroviral sequences with the E. coli ribonuclease H sequence, followed by statistical evaluation by a randomization method, gave authentic alignment scores from 4 to 10 standard deviations above the means of the jumbled comparisons. The cumulative weight of the multiple alignment (Fig. 2) further bears out the significance of the overall relationship. That the polymerase portion of the viral reverse transcriptase system must encompass the amino-terminal portion of the 90-kDa fragment is established by the alignment shown in Fig. 3. The key region here involves a sector previously shown by Kamer and Argos (20) to be present in a number of nonretroviral polymerases; these consistently have two aspartic acid residues surrounded by...
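The randomization test mentioned in the excerpt above (scoring the authentic comparison against "jumbled" comparisons and reporting the difference in standard deviations) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `identity_score` and `shuffle_z_score`, the toy sequence, the number of shuffles, and above all the scoring scheme, which is a simple ungapped identity count rather than the alignment score used in the original study.

```python
import random
import statistics


def identity_score(a, b):
    """Toy stand-in for an alignment score: count of identical positions (no gaps)."""
    return sum(x == y for x, y in zip(a, b))


def shuffle_z_score(seq_a, seq_b, n_shuffles=200, seed=0):
    """Return how many standard deviations the real score lies above the
    mean score obtained after repeatedly jumbling seq_b."""
    rng = random.Random(seed)          # fixed seed so the result is reproducible
    real = identity_score(seq_a, seq_b)
    letters = list(seq_b)
    jumbled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)           # keeps composition, destroys order
        jumbled_scores.append(identity_score(seq_a, letters))
    mean = statistics.mean(jumbled_scores)
    sd = statistics.pstdev(jumbled_scores)
    if sd == 0:
        # Degenerate case: every jumbled score is identical.
        return 0.0 if real == mean else float("inf")
    return (real - mean) / sd


# Hypothetical toy sequence, compared with itself as an obviously related pair.
toy = "MKVLAAGIVGLTCAHRLSEDGH"
z = shuffle_z_score(toy, toy)
print(f"z-score vs. jumbled comparisons: {z:.1f}")
```

Shuffling preserves amino acid composition while destroying order, so a high z-score indicates that the match depends on the order of residues and not merely on similar composition.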
Background: LINE-1 (L1) is the dominant category of transposable elements in placental mammals. L1 has significantly affected the size and structure of all mammalian genomes, and understanding the nature of the interactions between L1 and its mammalian host remains a question of crucial importance in comparative genomics. For this reason, much attention has been dedicated to the evolution of L1. Among the most studied elements is the mouse L1, which was the subject of a number of studies in the 1980s and 1990s. These seminal studies, performed in the pre-genomic era when only a limited number of L1 sequences were available, have significantly improved our understanding of L1 evolution. Yet no comprehensive study on the evolution of L1 in mouse has been performed since the completion of this genome sequence. Results: Using the Genome Parsing Suite, we performed the first evolutionary analysis of mouse L1 over the entire length of the element. This analysis indicates that the mouse L1 has recruited novel 5’UTR sequences more frequently than previously thought and that the simultaneous activity of non-homologous promoters seems to be one of the conditions for the co-existence of multiple L1 families or lineages. In addition, the exchange of genetic information between L1 families is not limited to the 5’UTR, as evidence of inter-family recombination was observed in ORF1, ORF2, and the 3’UTR. In contrast to the human L1, there was little evidence of rapid amino-acid replacement in the coiled-coil of ORF1, although this region is structurally unstable. We propose that the structural instability of the coiled-coil domain might be adaptive and that structural changes in this region are selectively equivalent to the rapid evolution at the amino-acid level reported in the human lineage. Conclusions: The pattern of evolution of L1 in mouse shows some similarity with that in human, suggesting that the nature of the interactions between L1 and its host might be similar in these two species. Yet some notable differences, particularly in the evolution of ORF1, suggest that the molecular mechanisms involved in host-L1 interactions might be different in these two species.
Hidden Markov model (HMM) techniques are used to model families of biological sequences. A smooth and convergent algorithm is introduced to iteratively adapt the transition and emission parameters of the models from the examples in a given family. The HMM approach is applied to three protein families: globins, immunoglobulins, and kinases. In all cases, the models derived capture the important statistical characteristics of the family and can be used for a number of tasks, including multiple alignments, motif detection, and classification. For K sequences of average length N, this approach yields an effective multiple-alignment algorithm which requires O(KN^2) operations, linear in the number of sequences. Comparative analysis of primary sequence information is a major tool in the elucidation of the molecular mechanisms of replication and evolution of organisms and the structure and function of proteins. For the simple case of pairwise sequence comparison, good algorithms exist (see refs. 1 and 2 for recent reviews) that can align two sequences of length N in roughly O(N^2) steps. Most of these algorithms are based on dynamic programming (3), with location-independent substitution and gap penalties. Unfortunately, when dynamic programming is applied to a family of K sequences its behavior scales like O(N^K), exponentially in the number of sequences (4). A number of algorithms have been devised to try to tackle the multiple alignment problem (see refs. 5-7 for some of the most recent ones). Most protein sequence relationships exhibiting >50% identical residues can be aligned by several of these algorithms.
Many of the most interesting protein families, however, exhibit conservation far below 50% identity. To date, alignment methods have not been developed that can correctly identify all the motifs that define each protein family (2). Here, we apply a different approach, based on hidden Markov models (HMMs), to the problem of modeling and aligning a family by using primary structure information only. Initial results were presented (8). Markov models and the related expectation-maximization (EM) (9) algorithm in statistics have already been applied to biocomputational problems (10-13). Krogh et al. (14) were the first to demonstrate the power of a similar method on the globin family. Rather than starting from pairwise alignments, the approach seeks to take advantage of the massive amount of information typically present in a family with a flexible use of position-dependent parameters. A new algorithm is introduced for the iterative adjustments of the parameters of the models. The algorithm is used here to model three protein families: globins, immunoglobulins, and kinases. HMMs and Learning: A first-order discrete HMM (15) is completely defined by a set of states S, an alphabet of m symbols, a probability transition matrix T = (t_ij), and a probability emission matrix E = (e_ia). When the system is in state i, it has a probability t_ij of moving to state j and a probability e_ia of emitting symbol a. Only the output s...
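The definition above (states, an alphabet, a transition matrix T = (t_ij), and an emission matrix E = (e_ia)) can be made concrete with a minimal sketch of the standard forward algorithm, which computes the probability that such a model emits a given sequence. This is not the parameter-adaptation algorithm introduced in the excerpt. The two-state model, its numbers, the explicit initial distribution `start`, and the function name `forward_likelihood` are all hypothetical choices made for illustration.

```python
def forward_likelihood(seq, start, T, E):
    """Probability that the HMM emits `seq`.

    start[i] : assumed initial probability of state i (not in the source text)
    T[i][j]  : transition probability t_ij from state i to state j
    E[i][a]  : emission probability e_ia of symbol a in state i
    """
    if not seq:
        return 1.0  # the empty sequence is emitted with probability 1
    n_states = len(start)
    # alpha[i] = P(prefix emitted so far, current state = i)
    alpha = [start[i] * E[i].get(seq[0], 0.0) for i in range(n_states)]
    for symbol in seq[1:]:
        alpha = [
            sum(alpha[i] * T[i][j] for i in range(n_states)) * E[j].get(symbol, 0.0)
            for j in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state model over the toy alphabet {A, G}.
start = [0.5, 0.5]
T = [[0.9, 0.1],
     [0.2, 0.8]]
E = [{"A": 0.8, "G": 0.2},
     {"A": 0.3, "G": 0.7}]

print(forward_likelihood("AGGA", start, T, E))
```

For a sequence of length N and a model with |S| states, this recursion takes on the order of N·|S|^2 operations, which is why likelihoods can be computed without enumerating every hidden state path.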