We report the draft genome of the black cottonwood tree, Populus trichocarpa. Integration of shotgun sequence assembly with genetic mapping enabled chromosome-scale reconstruction of the genome. More than 45,000 putative protein-coding genes were identified. Analysis of the assembled genome revealed a whole-genome duplication event; about 8000 pairs of duplicated genes from that event survived in the Populus genome. A second, older duplication event is indistinguishably coincident with the divergence of the Populus and Arabidopsis lineages. Nucleotide substitution, tandem gene duplication, and gross chromosomal rearrangement appear to proceed substantially more slowly in Populus than in Arabidopsis. Populus has more protein-coding genes than Arabidopsis, ranging on average from 1.4 to 1.6 putative Populus homologs for each Arabidopsis gene. However, the relative frequency of protein domains in the two genomes is similar. Overrepresented exceptions in Populus include genes associated with lignocellulosic wall biosynthesis, meristem development, disease resistance, and metabolite transport.
Uptake and translocation of cationic nutrients play essential roles in physiological processes including plant growth, nutrition, signal transduction, and development. Approximately 5% of the Arabidopsis genome appears to encode membrane transport proteins. These proteins are classified in 46 unique families containing approximately 880 members. In addition, several hundred putative transporters have not yet been assigned to families. In this paper, we have analyzed the phylogenetic relationships of over 150 cation transport proteins. This analysis has focused on cation transporter gene families for which initial characterizations have been achieved for individual members, including potassium transporters and channels, sodium transporters, calcium antiporters, cyclic nucleotide-gated channels, cation diffusion facilitator proteins, natural resistance-associated macrophage proteins (NRAMP), and Zn-regulated transporter/Fe-regulated transporter-like proteins. Phylogenetic trees of each family define the evolutionary relationships of the members to each other. These families contain numerous members, indicating diverse functions in vivo. Closely related isoforms and separate subfamilies exist within many of these gene families, indicating possible redundancies and specialized functions. To facilitate their further study, the PlantsT database (http://plantst.sdsc.edu) has been created; it includes alignments of the analyzed cation transporters and their chromosomal locations.
In sequence analysis, two or more (approximately) independent measures of the membership of a sequence (or sequence region) in some class are often available. We would like to estimate the likelihood of the sequence being a member of the class in view of all the available evidence. An example is estimating the significance of the observed match of a macromolecular sequence (DNA or protein) to a set of patterns (motifs) that characterize a biological sequence family. An intuitive way to do this is to express each piece of evidence as a p-value, and then use the product of these p-values as the measure of membership in the family. We derive a formula and algorithm (QFAST) for calculating the statistical distribution of the product of n independent p-values. We demonstrate that sorting sequences by this p-value effectively combines the information present in multiple motifs, leading to highly accurate and sensitive sequence homology searches.
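As a rough illustration of the idea in the preceding abstract, the sketch below evaluates the textbook closed form for the distribution of a product of n independent p-values, assuming each p-value is uniform on (0, 1] under the null hypothesis: P(product ≤ p) = p · Σ_{i=0}^{n-1} (−ln p)^i / i!. This is not claimed to be the QFAST algorithm as the authors implemented it; the function name `product_pvalue` is hypothetical and chosen only for this example.

```python
import math


def product_pvalue(pvalues):
    """Combined p-value for the product of independent p-values.

    Assumes each input is an independent p-value that is uniform on
    (0, 1] under the null hypothesis. Uses the closed form
        P(product <= p) = p * sum_{i=0}^{n-1} (-ln p)^i / i!
    where p is the observed product and n is the number of p-values.
    This is an illustrative sketch, not the published QFAST code.
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("need at least one p-value")
    if any(p < 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in [0, 1]")
    if any(p == 0.0 for p in pvalues):
        return 0.0  # a zero p-value makes the product zero

    # Work in log space so that very small products do not underflow early.
    log_prod = sum(math.log(p) for p in pvalues)
    x = -log_prod  # x = -ln(product), non-negative

    # Accumulate sum_{i=0}^{n-1} x^i / i! term by term.
    term = 1.0
    total = 1.0
    for i in range(1, n):
        term *= x / i
        total += term

    return math.exp(log_prod) * total


if __name__ == "__main__":
    # Three hypothetical motif p-values for one sequence.
    print(product_pvalue([0.01, 0.2, 0.05]))
```

With a single p-value the function returns that p-value (up to floating-point rounding), and the combined value is never smaller than the raw product, which reflects the fact that a small product is easier to obtain by chance when more p-values are multiplied together.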
The CDPK-SnRK superfamily consists of seven types of serine-threonine protein kinases: calcium-dependent protein kinases (CDPKs), CDPK-related kinases (CRKs), phosphoenolpyruvate carboxylase kinases (PPCKs), PEP carboxylase kinase-related kinases (PEPRKs), calmodulin-dependent protein kinases (CaMKs), calcium and calmodulin-dependent protein kinases (CCaMKs), and SnRKs. Within this superfamily, individual isoforms and subfamilies contain distinct regulatory domains, subcellular targeting information, and substrate specificities. Our analysis of the Arabidopsis genome identified 34 CDPKs, eight CRKs, two PPCKs, two PEPRKs, and 38 SnRKs. No definitive examples were found for a CCaMK similar to those previously identified in lily (Lilium longiflorum) and tobacco (Nicotiana tabacum) or for a CaMK similar to those in animals or yeast. CDPKs are present in plants and a specific subgroup of protists, but CRKs, PPCKs, PEPRKs, and two of the SnRK subgroups have been found only in plants. CDPKs and at least one SnRK have been implicated in decoding calcium signals in Arabidopsis. Analysis of intron placements supports the hypothesis that CDPKs, CRKs, PPCKs, and PEPRKs have a common evolutionary origin; however, there are no conserved intron positions between these kinases and the SnRK subgroup. CDPKs and SnRKs are found on all five Arabidopsis chromosomes. The presence of closely related kinases in regions of the genome known to have arisen by genome duplication indicates that these kinases probably arose by divergence from common ancestors. The PlantsP database provides a resource of continuously updated information on protein kinases from Arabidopsis and other plants. In eukaryotes, protein kinases are involved in regulating key aspects of cellular function, including cell division, metabolism, and responses to external signals. The completed sequence of the Arabidopsis genome provides the first opportunity to identify all of the protein kinases present in a model plant.
The Arabidopsis genome encodes 1,085 typical protein kinases (M. Gribskov, unpublished data), which is about 4% of the predicted 25,500 genes (Arabidopsis Article, publication date, and citation information can be found at www.plantphysiol.org/cgi
Profile analysis is a method for detecting distantly related proteins by sequence comparison. The basis for comparison is not only the customary Dayhoff mutational-distance matrix but also the results of structural studies and information implicit in the alignments of the sequences of families of similar proteins. This information is expressed in a position-specific scoring table (profile), which is created from a group of sequences previously aligned by structural or sequence similarity. The similarity of any other sequence (target) to the group of aligned sequences (probe) can be tested by comparing the target to the profile using dynamic programming algorithms. The profile method differs in two major respects from methods of sequence comparison in common use:
Vascular plants appeared ~410 million years ago, then diverged into several lineages of which only two survive: the euphyllophytes (ferns and seed plants) and the lycophytes (1). We report here the genome sequence of the lycophyte Selaginella moellendorffii (Selaginella), the first non-seed vascular plant genome reported. By comparing gene content in evolutionarily diverse taxa, we found that the transition from a gametophyte- to sporophyte-dominated life cycle required far fewer new genes than the transition from a non-seed vascular to a flowering plant, while secondary metabolic genes expanded extensively and in parallel in the lycophyte and angiosperm lineages. Selaginella differs in post-transcriptional gene regulation, including small RNA regulation of repetitive elements, an absence of the tasiRNA pathway, and extensive RNA editing of organellar genes.
We performed a systematic BLAST analysis of 929 human disease gene entries associated with at least one mutant allele in the Online Mendelian Inheritance in Man (OMIM) database against the recently completed genome sequence of Drosophila melanogaster. The results of this search have been formatted as an updateable and searchable on-line database called Homophila. Our analysis identified 714 distinct human disease genes (77% of disease genes searched) matching 548 unique Drosophila sequences, which we have summarized by disease category. This breakdown into disease classes creates a picture of disease genes that are amenable to study using Drosophila as the model organism. Of the 548 Drosophila genes related to human disease genes, 153 are associated with known mutant alleles and 56 more are tagged by P-element insertions in or near the gene. Examples of how to use the database to identify Drosophila genes related to human disease genes are presented. We anticipate that cross-genomic analysis of human disease genes using the power of Drosophila second-site modifier screens will promote interaction between human and Drosophila research groups, accelerating the understanding of the pathogenesis of human genetic disease. The Homophila database is available at http://homophila.sdsc.edu.