Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70% more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78% of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75% of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.
Ab initio protein folding is one of the major unsolved problems in computational biology, owing to the difficulties of force field design and conformational search. We developed a novel program, QUARK, for template-free protein structure prediction. Query sequences are first broken into fragments of 1–20 residues, and multiple fragment structures are retrieved at each position from unrelated experimental structures. Full-length structure models are then assembled from the fragments using replica-exchange Monte Carlo simulations, which are guided by a composite knowledge-based force field. A number of novel energy terms and Monte Carlo movements are introduced, and their particular contributions to enhancing the efficiency of both the force field and the search engine are analyzed in detail. The QUARK prediction procedure is described and tested on the structure modeling of 145 non-homologous proteins. Although no global templates are used and all fragments from experimental structures with a template modeling score (TM-score) >0.5 are excluded, QUARK can successfully construct 3D models of correct folds in one-third of the cases for short proteins of up to 100 residues. In the ninth community-wide Critical Assessment of protein Structure Prediction (CASP9) experiment, the QUARK server outperformed the second and third best servers by 18% and 47%, respectively, based on the cumulative Z-score of global distance test-total (GDT-TS) scores in the free modeling (FM) category. Although ab initio protein folding remains a significant challenge, these data demonstrate new progress towards the solution of the most important problem in the field.
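The replica-exchange Monte Carlo search mentioned in the abstract above can be illustrated with a minimal sketch of the textbook swap step. This is not QUARK's implementation: the function names, the unit Boltzmann constant, and the toy energies are assumptions made for illustration. Each replica runs at its own temperature, and neighboring replicas exchange conformations with the standard Metropolis probability min(1, exp[(1/kT_i - 1/kT_j)(E_i - E_j)]).

```python
import math
import random


def swap_probability(energy_i, energy_j, temp_i, temp_j, k_b=1.0):
    """Metropolis probability of exchanging the conformations held at
    temperatures temp_i and temp_j (k_b = 1.0 is an assumed unit choice)."""
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # A non-negative delta means the swap is always accepted.
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_neighbor_swaps(energies, temps, rng):
    """Try one sweep of swaps between adjacent temperature levels.

    energies[c] is the energy of conformation c; conformation c starts at
    temperature level c. Returns a list `order` where order[k] is the index
    of the conformation held at temperature level k after the sweep.
    """
    order = list(range(len(temps)))
    for k in range(len(temps) - 1):
        a, b = order[k], order[k + 1]
        p = swap_probability(energies[a], energies[b], temps[k], temps[k + 1])
        if rng.random() < p:
            order[k], order[k + 1] = b, a
    return order


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical energies for three conformations on a temperature ladder.
    print(attempt_neighbor_swaps([10.0, 5.0, 7.0], [1.0, 2.0, 4.0], rng))
```

In a full simulation this sweep would alternate with ordinary Monte Carlo moves within each replica; the swaps let low-energy conformations found at high temperature migrate to the low-temperature replicas.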
Most protein structure prediction algorithms assemble structures as reduced models that represent amino acids by a reduced number of atoms to speed up the conformational search. Building accurate full-atom models from these reduced models is a necessary step toward a detailed function analysis. However, it is difficult to ensure that the atomic models retain the desired global topology while maintaining a sound local atomic geometry, because the reduced models often have unphysical local distortions. To address this issue, we developed a new program, called ModRefiner, to construct and refine protein structures from Cα traces based on a two-step, atomic-level energy minimization. The main-chain structures are first constructed from initial Cα traces, and the side-chain rotamers are then refined together with the backbone atoms using a composite physics- and knowledge-based force field. We tested the method by performing an atomic structure refinement of 261 proteins with the initial models constructed from both ab initio and template-based structure assemblies. Compared with other state-of-the-art programs, ModRefiner shows improvements in both global and local structures, with more accurate side-chain positions, better hydrogen-bonding networks, and fewer atomic overlaps. ModRefiner is freely available at http://zhanglab.ccmb.med.umich.edu/ModRefiner.
SUMMARY: Soybean (Glycine max L.) is a major crop providing an important source of protein and oil, which can also be converted into biodiesel. A major milestone in soybean research was the recent sequencing of its genome. The sequence predicts 69 145 putative soybean genes, with 46 430 predicted with high confidence. In order to examine the expression of these genes, we utilized the Illumina Solexa platform to sequence cDNA derived from 14 conditions (tissues). The result is a searchable soybean gene expression atlas accessible through a browser (http://digbio.missouri.edu/soybean_atlas). The data provide experimental support for the transcription of 55 616 annotated genes and also demonstrate that 13 529 annotated soybean genes are putative pseudogenes, and 1736 currently unannotated sequences are transcribed. An analysis of this atlas reveals strong differences in gene expression patterns between different tissues, especially between root and aerial organs, but also reveals similarities between gene expression in other tissues, such as flower and leaf organs. In order to demonstrate the full utility of the atlas, we investigated the expression patterns of genes implicated in nodulation, and also transcription factors, using both the Solexa sequence data and large-scale qRT-PCR. The availability of the soybean gene expression atlas allowed a comparison with gene expression documented in the two model legume species, Medicago truncatula and Lotus japonicus, as well as data available for Arabidopsis thaliana, facilitating both basic and applied aspects of soybean research.
Deinococcus radiodurans R1 (DEIRA) is a bacterium best known for its extreme resistance to the lethal effects of ionizing radiation, but the molecular mechanisms underlying this phenotype remain poorly understood. To define the repertoire of DEIRA genes responding to acute irradiation (15 kGy), transcriptome dynamics were examined in cells representing early, middle, and late phases of recovery by using DNA microarrays covering ≈94% of its predicted genes. At least at one time point during DEIRA recovery, 832 genes (28% of the genome) were induced and 451 genes (15%) were repressed 2-fold or more. The expression patterns of the majority of the induced genes resemble the previously characterized expression profile of recA after irradiation. DEIRA recA, which is central to genomic restoration after irradiation, is substantially up-regulated on DNA damage (early phase) and down-regulated before the onset of exponential growth (late phase). Many other genes were expressed later in recovery, displaying a growth-related pattern of induction. Genes induced in the early phase of recovery included those involved in DNA replication, repair, and recombination, cell wall metabolism, cellular transport, and many encoding uncharacterized proteins. Collectively, the microarray data suggest that DEIRA cells efficiently coordinate their recovery by a complex network, within which both DNA repair and metabolic functions play critical roles. Components of this network include a predicted distinct ATP-dependent DNA ligase and metabolic pathway switching that could prevent additional genomic damage elicited by metabolism-induced free radicals.
The Gram-positive aerobic bacterium Deinococcus radiodurans R1 (DEIRA) has an extraordinary resistance to γ-radiation and a wide range of other DNA-damaging conditions, including desiccation and oxidizing agents (1, 2). Ionizing radiation induces DNA double-stranded breaks (DSBs) that are the most lethal form of DNA damage (3). After acute exposures to 10 kGy, early stationary phase (ESP) DEIRA can reassemble its 3.285-Mbp genome, which consists of four haploid genomic copies per cell (4), from hundreds of DNA DSB fragments without lethality or induced mutagenesis (5, 6). Also remarkable is DEIRA's ability to grow at 60 Gy/h without any discernable effect on its growth rate (7). Because most organisms, generally, can tolerate so few DSBs (8), radiation-induced DSBs and their repair have been difficult to study. In DEIRA, however, there are so many DSBs in fully viable irradiated cells after high-dose irradiation that the steps in DSB repair can be monitored directly in mass culture (5, 9-11). This characteristic has been exploited and used to examine the timing of DNA recombination (5, 10, 12) after high-dose irradiation and has revealed the sequential action of RecA-independent and -dependent pathways during repair (11).
Comparative genomic and experimental analyses support the view that DEIRA's extreme radiation resistance phenotype is complex, likely determined collectively by an assortment o...
Peña-Castillo et al. Genome Biology 2008, Volume 9, Suppl 1, Article S2. The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2008/9/S1/S2
Background: Several years after sequencing the human genome and the mouse genome, much remains to be discovered about the functions of most human and mouse genes. Computational prediction of gene function promises to help focus limited experimental resources on the most likely hypotheses. Several algorithms using diverse genomic data have been applied to this task in model organisms; however, the performance of such approaches in mammals has not yet been evaluated.
We have sequenced five distinct mitochondrial genomes in maize: two fertile cytotypes (NA and the previously reported NB) and three cytoplasmic-male-sterile cytotypes (CMS-C, CMS-S, and CMS-T). Their genome sizes range from 535,825 bp in CMS-T to 739,719 bp in CMS-C. Large duplications (0.5-120 kb) account for most of the size increases. Plastid DNA accounts for 2.3-4.6% of each mitochondrial genome. The genomes share a minimum set of 51 genes for 33 conserved proteins, three ribosomal RNAs, and 15 transfer RNAs. Numbers of duplicate genes and plastid-derived tRNAs vary among cytotypes. A high level of sequence conservation exists both within and outside of genes (1.65-7.04 substitutions/10 kb in pairwise comparisons). However, sequence losses and gains are common: integrated plastid and plasmid sequences, as well as noncoding "native" mitochondrial sequences, can be lost with no phenotypic consequence. The organization of the different maize mitochondrial genomes varies dramatically; even between the two fertile cytotypes, there are 16 rearrangements. Comparing the finished shotgun sequences of multiple mitochondrial genomes from the same species suggests which genes and open reading frames are potentially functional, including which chimeric ORFs are candidate genes for cytoplasmic male sterility. This method identified the known CMS-associated ORFs in CMS-S and CMS-T, but not in CMS-C.