Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70% more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78% of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75% of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.
In flowering plants, the accumulation of small deletions through unequal homologous recombination (UR) and illegitimate recombination (IR) is proposed to be the major process counteracting genome expansion, which is caused primarily by the periodic amplification of long terminal repeat retrotransposons (LTR-RTs). However, the full suite of evolutionary forces that govern the gain or loss of transposable elements (TEs) and their distribution within a genome remains unclear. Here, we investigated the distribution and structural variation of LTR-RTs in relation to the rates of local genetic recombination (GR) and gene densities in the rice (Oryza sativa) genome. Our data revealed a positive correlation between GR rates and gene densities and negative correlations between LTR-RT densities and both GR and gene densities. The data also indicate a tendency for LTR-RT elements and fragments to be shorter in regions with higher GR rates; the size reduction of LTR-RTs appears to be achieved primarily through solo LTR formation by UR. Comparison of indica and japonica rice revealed patterns and frequencies of LTR-RT gain and loss within different evolutionary timeframes. Different LTR-RT families exhibited variable distribution patterns and structural changes, but overall LTR-RT compositions and genes were organized according to the GR gradients of the genome. Further investigation of non-LTR-RTs and DNA transposons revealed a negative correlation between gene densities and the abundance of DNA transposons and a weak correlation between GR rates and the abundance of long interspersed nuclear elements (LINEs)/short interspersed nuclear elements (SINEs). Together, these observations suggest that GR and gene density play important roles in shaping the dynamic structure of the rice genome.
Background: The origin and importance of exon-intron architecture remain among the unresolved questions of gene evolution. Several studies have investigated variation in intron length, GC content, ordinal position within a gene, and divergence. However, few studies have examined the structural variation of exons and introns.
Background: Transposable elements are the most abundant components of all characterized genomes of higher eukaryotes. It has been documented that these elements not only contribute to the shaping and reshaping of their host genomes, but also play significant roles in regulating gene expression, altering gene function, and creating new genes. Thus, complete identification of transposable elements in sequenced genomes and construction of comprehensive transposable element databases are essential for accurate annotation of genes and other genomic components, for investigation of potential functional interaction between transposable elements and genes, and for study of genome evolution. The recent availability of the soybean genome sequence has provided an unprecedented opportunity for the discovery and the structural and functional characterization of transposable elements in this economically important legume crop.
Description: Using a combination of structure-based and homology-based approaches, a total of 32,552 retrotransposons (Class I) and 6,029 DNA transposons (Class II) with clear boundaries and insertion sites were structurally annotated and clearly categorized, and a soybean transposable element database, SoyTEdb, was established. These transposable elements have been anchored in and integrated with the soybean physical map and genetic map, and are browsable and visualizable at any scale along the 20 soybean chromosomes, along with predicted genes and other sequence annotations. BLAST search and other infrastructure tools were implemented to facilitate annotation of transposable elements or fragments from soybean and other related legume species. The majority (> 95%) of these elements (particularly a few hundred low-copy-number families) are first described in this study.
Conclusion: SoyTEdb provides resources and information related to transposable elements in the soybean genome, representing the most comprehensive and the largest manually curated transposable element database for any individual plant genome completely sequenced to date. Transposable elements previously identified in legumes, the third largest family of flowering plants, are relatively scarce. Thus this database will facilitate structural, evolutionary, functional, and epigenetic analyses of transposable elements in soybean and other legume species.
† These two authors contributed equally to this work.
Summary: Sorghum (Sorghum bicolor) plants damaged by insects emit a blend of volatiles, predominantly sesquiterpenes, that are implicated in attracting natural enemies of the attacking insects. To characterize sesquiterpene biosynthesis in sorghum, seven terpene synthase (TPS) genes, SbTPS1 through SbTPS7, were identified based on their evolutionary relatedness to known sesquiterpene synthase genes from maize and rice. While SbTPS6 and SbTPS7 encode truncated proteins, all other TPS genes were determined to encode functional sesquiterpene synthases. Both SbTPS1 and SbTPS2 produced the major products zingiberene, β-bisabolene and β-sesquiphellandrene, but with opposite ratios of zingiberene to β-sesquiphellandrene. SbTPS3 produced (E)-α-bergamotene and (E)-β-farnesene. SbTPS4 formed (E)-β-caryophyllene as the major product. SbTPS5 produced mostly (E)-α-bergamotene and (Z)-γ-bisabolene. Based on the genome sequences of sorghum, maize and rice and the sesquiterpene synthase genes they contain, collinearity analysis identified the orthologs of sorghum sesquiterpene synthase genes, except for SbTPS4, in maize and rice. Phylogenetic analysis implied that SbTPS1, SbTPS2 and SbTPS3, which exist as tandem repeats, evolved as a consequence of local gene duplication in a lineage-specific manner. Structural modeling and site-directed mutagenesis experiments revealed that three amino acids in the active site play critical roles in defining product specificity of SbTPS1, SbTPS2, SbTPS3 and their orthologs in maize and rice. The naturally occurring functional variations of sesquiterpene synthases within and between species suggest that multiple mechanisms, including lineage-specific gene duplication, subfunctionalization, neofunctionalization and pseudogenization of duplicated genes, have all played a role in the dynamic evolution of insect-induced sesquiterpene biosynthesis in grasses.
Long noncoding RNAs (lncRNAs) play a crucial role in tumorigenesis. The aim of this study was to identify an lncRNA signature that can predict breast cancer patient survival. RNA expression data from 1064 patients were downloaded from The Cancer Genome Atlas project. Cox regression, Kaplan–Meier, and receiver operating characteristic (ROC) analyses were performed to construct and evaluate a model for predicting the overall survival (OS) of patients. A model consisting of three lncRNA genes (CAT104, LINC01234, and STXBP5-AS1) was identified. The Kaplan–Meier analysis and ROC curves showed that the model could predict survival with good sensitivity and specificity in both the validation set (AUC = 0.752, 95% confidence interval (CI): 0.651–0.854) and the microarray dataset (AUC = 0.714, 95% CI: 0.615–0.814). Further study showed that the three-lncRNA signature was not only applicable across different breast cancer stages, subtypes and age groups, but also provided more accurate prognostic information than some widely known biomarkers. The results, based on RNA-seq transcriptome profiling, suggested that the three-lncRNA signature is an independent prognostic biomarker and has clinical significance. In addition, the lncRNA, miRNA, and mRNA interaction network indicated that lncRNAs may intervene in breast cancer pathogenesis by binding to miR-190b, acting as competing endogenous RNAs.
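The AUC values quoted above summarize how well a risk score separates patients with and without the event. The abstract does not say how the AUC was computed, so the following is only a minimal sketch of the standard rank-based definition for a binary outcome: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The function name `roc_auc` and the toy scores are illustrative assumptions, not the study's model or data, and censored survival data would normally call for a time-dependent ROC method instead.

```python
def roc_auc(scores, labels):
    """Rank-based AUC for a binary outcome (1 = event, 0 = no event).

    Illustrative helper only: counts the share of positive/negative
    pairs where the positive case has the higher score (ties = 0.5).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy risk scores, not taken from the study.
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    toy_labels = [0, 0, 1, 1]
    print(roc_auc(toy_scores, toy_labels))
```

An AUC of 0.5 corresponds to a score that ranks cases no better than chance, and 1.0 to perfect separation.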
Spontaneous mutations are not randomly distributed throughout a genome. Although mutation hotspots are found on genomes of a variety of species, mechanisms that generate the hotspots are not well understood. In eukaryotes, strong association between a regional nucleotide substitution rate and insertions/deletions (indels) was reported in a previous study, and the "indel-induced mutation" hypothesis was proposed. However, it is unknown whether the association exists even in prokaryote genomes. In this study, we conducted a systematic survey for the association in 262 complete genomes from 73 bacterial species. In these bacteria, the level of nucleotide diversity was negatively correlated with the distance from the closest indel, which is consistent with the eukaryote data. The same pattern was observed even after excluding noncoding sequences, indicating that the difference in functional constraints among genomic regions is not a primary cause of the correlation. In addition, the increase of nucleotide substitution rate was detected disproportionally on a lineage carrying a derived indel mutation, confirming the indel-nucleotide diversity association in the bacterial genomes. In some cases, the level of nucleotide diversity was more than 100 times higher in regions close to indels than in distant regions. Although further understanding of the molecular mechanism is required to test the hypothesis, these results suggest that the same mechanism for the indel-nucleotide diversity associations might exist in eukaryotes and prokaryotes and play an important role in molecular evolution.
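Nucleotide diversity is commonly defined as the average number of pairwise differences per site among a set of aligned sequences. The abstract does not describe how diversity was measured around each indel, so the sketch below only illustrates that general definition under simplifying assumptions: the sequences are already aligned, have equal length, and every column is compared as is (gaps and ambiguous bases get no special handling). The function name `nucleotide_diversity` is hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site for equal-length aligned sequences.

    Illustrative helper only; assumes a clean alignment with no gap handling.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")

    pairs = list(combinations(seqs, 2))
    # Count mismatched positions over every pair of sequences.
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


if __name__ == "__main__":
    # Toy alignment window, not real data.
    window = ["AAAA", "AAAT", "AATT"]
    print(nucleotide_diversity(window))
```

Applying such a function to windows at increasing distance from an indel, and then correlating the values with distance, is one way the reported negative relationship could be examined.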
Breast cancer is a common and threatening malignant disease with multiple biological and clinical subtypes. It can be categorized into the luminal A, luminal B, HER2-positive, and basal-like subtypes. Copy number variants (CNVs) have been reported to be potential, and even better, biomarkers for cancer diagnosis than mRNA biomarkers, because they are considerably more stable and robust than gene expression. Thus, it is meaningful to detect the CNVs of different cancers. To identify CNV biomarkers for breast cancer subtypes, we integrated the CNV data of more than 2000 samples from two large breast cancer databases, METABRIC and The Cancer Genome Atlas (TCGA). A computational method based on Monte Carlo feature selection and incremental feature selection was proposed and tested to identify the distinctive core CNVs in different breast cancer subtypes. We identified the CNV genes that may contribute to breast cancer tumorigenesis and built a set of quantitative distinctive rules for recognition of the breast cancer subtypes. The Matthews correlation coefficient (MCC) was 0.515 in tenfold cross-validation on the METABRIC training set and 0.492 in the independent test on the TCGA dataset. The CNVs of PGAP3, GRB7, MIR4728, PNMT, STARD3, TCAP and ERBB2 were important for the accurate diagnosis of breast cancer subtypes. The findings reported in this study may further uncover the differences between breast cancer subtypes and improve diagnostic accuracy.
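Because four subtypes are being classified, the MCC reported above is presumably a multiclass value, although the abstract does not state which formulation was used. The sketch below assumes the common multiclass generalization, MCC = (c·s − Σ p_k·t_k) / sqrt((s² − Σ p_k²)(s² − Σ t_k²)), where s is the number of samples, c the number of correct predictions, and t_k and p_k the true and predicted counts for class k. The function name `multiclass_mcc` and the toy labels are illustrative only.

```python
import math
from collections import Counter


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient from two label lists.

    Illustrative helper only; returns 0.0 when the denominator is zero
    (for example, when every prediction is the same class).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    s = len(y_true)                                      # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))      # correct predictions
    t_counts = Counter(y_true)                           # true count per class
    p_counts = Counter(y_pred)                           # predicted count per class

    classes = set(t_counts) | set(p_counts)
    sum_pt = sum(p_counts[k] * t_counts[k] for k in classes)
    sum_pp = sum(v * v for v in p_counts.values())
    sum_tt = sum(v * v for v in t_counts.values())

    denom = math.sqrt((s * s - sum_pp) * (s * s - sum_tt))
    if denom == 0:
        return 0.0
    return (c * s - sum_pt) / denom


if __name__ == "__main__":
    # Toy labels using the subtype names from the text, not study data.
    truth = ["luminal A", "luminal A", "basal-like", "basal-like"]
    guess = ["luminal A", "luminal A", "basal-like", "basal-like"]
    print(multiclass_mcc(truth, guess))
```

The coefficient ranges up to 1 for perfect agreement and sits near 0 for predictions unrelated to the true labels, which is why it is often preferred over plain accuracy when class sizes are unbalanced.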