PrDOS is a server that predicts the disordered regions of a protein from its amino acid sequence (http://prdos.hgc.jp). The server accepts a single protein amino acid sequence, in either plain text or FASTA format. The prediction system is composed of two predictors: a predictor based on local amino acid sequence information and one based on template proteins. The server combines the results of the two predictors and returns a two-state prediction (order/disorder) and a disorder probability for each residue. The prediction results are sent by e-mail, and the server also provides a web-interface to check the results.
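As a rough illustration of how two per-residue predictors could be merged into a disorder probability and a two-state call, here is a minimal sketch. The abstract does not say how PrDOS actually combines its predictors, so the weighted average, the `weight` and `threshold` parameters, and the function name `combine_predictions` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def combine_predictions(
    local_probs: Sequence[float],
    template_probs: Sequence[float],
    weight: float = 0.5,      # hypothetical weight given to the local-sequence predictor
    threshold: float = 0.5,   # hypothetical cutoff for calling a residue disordered
) -> Tuple[List[float], List[str]]:
    """Merge two per-residue disorder probabilities (illustrative only).

    Returns the combined probability for each residue and a two-state
    label: 'D' (disorder) or 'O' (order).
    """
    if len(local_probs) != len(template_probs):
        raise ValueError("both predictors must score the same residues")
    combined = [
        weight * p_local + (1.0 - weight) * p_template
        for p_local, p_template in zip(local_probs, template_probs)
    ]
    states = ["D" if p >= threshold else "O" for p in combined]
    return combined, states


if __name__ == "__main__":
    probs, states = combine_predictions([0.9, 0.2, 0.6], [0.7, 0.1, 0.3])
    for index, (p, s) in enumerate(zip(probs, states), start=1):
        print(f"residue {index}: P(disorder)={p:.2f} state={s}")
```

A real combination scheme may be nonlinear or trained; the point here is only the shape of the output, one probability and one order/disorder label per residue.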
The Tohoku Medical Megabank Organization reports the whole-genome sequences of 1,070 healthy Japanese individuals and construction of a Japanese population reference panel (1KJPN). Here we identify, through this high-coverage sequencing (32.4× on average), 21.2 million single-nucleotide variants (SNVs), including 12 million novel ones, at an estimated false discovery rate of <1.0%. This detailed analysis detected signatures for purifying selection on regulatory elements as well as coding regions. We also catalogue structural variants, including 3.4 million insertions and deletions, and 25,923 genic copy-number variants. The 1KJPN was effective for imputing genotypes of the Japanese population genome wide. These data demonstrate the value of high-coverage sequencing for constructing a population-specific variant panel, which covers 99.0% of SNVs with minor allele frequency ≥0.1%, and its value for identifying causal rare variants of complex human disease phenotypes in genetic association studies.
ATTED-II (http://atted.jp) is a database of gene coexpression in Arabidopsis that can be used to design a wide variety of experiments, including the prioritization of genes for functional identification or for studies of regulatory relationships. Here, we report updates of ATTED-II that focus especially on functionalities for constructing gene networks with regard to the following points: (i) introducing a new measure of gene coexpression to retrieve functionally related genes more accurately, (ii) implementing clickable maps for all gene networks for step-by-step navigation, (iii) applying Google Maps API to create a single map for a large network, (iv) including information about protein–protein interactions, (v) identifying conserved patterns of coexpression and (vi) showing and connecting KEGG pathway information to identify functional modules. With these enhanced functions for gene network representation, ATTED-II can help researchers to clarify the functional and regulatory networks of genes in Arabidopsis.
A publicly available database of co-expressed gene sets would be a valuable tool for a wide variety of experimental designs, including targeting of genes for functional identification or for regulatory investigation. Here, we report the construction of an Arabidopsis thaliana trans-factor and cis-element prediction database (ATTED-II) that provides co-regulated gene relationships based on co-expressed genes deduced from microarray data and the predicted cis elements. ATTED-II includes the following features: (i) lists and networks of co-expressed genes calculated from 58 publicly available experimental series, which are composed of 1388 GeneChip data in A. thaliana; (ii) prediction of cis-regulatory elements in the 200 bp region upstream of the transcription start site to predict co-regulated genes amongst the co-expressed genes; and (iii) visual representation of expression patterns for individual genes. ATTED-II can thus help researchers to clarify the function and regulation of particular genes and gene networks.
The Great East Japan Earthquake (GEJE) and resulting tsunami of March 11, 2011 gave rise to devastating damage on the Pacific coast of the Tohoku region. The Tohoku Medical Megabank Project (TMM), which is being conducted by Tohoku University Tohoku Medical Megabank Organization (ToMMo) and Iwate Medical University Iwate Tohoku Medical Megabank Organization (IMM), has been launched to realize creative reconstruction and to solve medical problems in the aftermath of this disaster. We started two prospective cohort studies in Miyagi and Iwate Prefectures: a population-based adult cohort study, the TMM Community-Based Cohort Study (TMM CommCohort Study), which will recruit 80 000 participants, and a birth and three-generation cohort study, the TMM Birth and Three-Generation Cohort Study (TMM BirThree Cohort Study), which will recruit 70 000 participants, including fetuses and their parents, siblings, grandparents, and extended family members. The TMM CommCohort Study will recruit participants from 2013 to 2016 and follow them for at least 5 years. The TMM BirThree Cohort Study will recruit participants from 2013 to 2017 and follow them for at least 4 years. For children, the ToMMo Child Health Study, which adopted a cross-sectional design, was also started in November 2012 in Miyagi Prefecture. An integrated biobank will be constructed based on the two prospective cohort studies, and ToMMo and IMM will investigate the chronic medical impacts of the GEJE. The integrated biobank of TMM consists of health and clinical information, biospecimens, and genome and omics data. The biobank aims to establish a firm basis for personalized healthcare and medicine, mainly for diseases aggravated by the GEJE in the two prefectures. Biospecimens and related information in the biobank will be distributed to the research community. TMM itself will also undertake genomic and omics research. 
The aims of the genomic studies are: 1) to construct an integrated biobank; 2) to return genomic research results to the participants of the cohort studies, which will lead to the implementation of personalized healthcare and medicine in the affected areas in the near future; and 3) to contribute to the development of personalized healthcare and medicine worldwide. Through the activities of TMM, we will clarify how to approach prolonged healthcare problems in areas damaged by large-scale disasters and how useful genomic information is for disease prevention.
Information regarding gene coexpression is useful to predict gene function. Several databases have been constructed for gene coexpression in model organisms based on a large amount of publicly available gene expression data measured by GeneChip platforms. In these databases, Pearson's correlation coefficients (PCCs) of gene expression patterns are widely used as a measure of gene coexpression. Although the coexpression measure or GeneChip summarization method affects the performance of the gene coexpression database, previous studies for these calculation procedures were tested with only a small number of samples and a particular species. To evaluate the effectiveness of coexpression measures, assessments with large-scale microarray data are required. We first examined characteristics of PCC and found that the optimal PCC threshold to retrieve functionally related genes was affected by the method of gene expression database construction and the target gene function. In addition, we found that this problem could be overcome when we used correlation ranks instead of correlation values. This observation was evaluated by large-scale gene expression data for four species: Arabidopsis, human, mouse and rat.
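The contrast between correlation values and correlation ranks can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the toy expression profiles, the gene names, and the helper names `pearson` and `correlation_ranks` are hypothetical.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient (PCC) of two expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def correlation_ranks(expr: Dict[str, Sequence[float]], query: str) -> Dict[str, int]:
    """Rank every other gene by its PCC with the query gene (rank 1 = highest PCC)."""
    scores = {
        gene: pearson(expr[query], profile)
        for gene, profile in expr.items()
        if gene != query
    }
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {gene: position for position, gene in enumerate(ordered, start=1)}


if __name__ == "__main__":
    # Hypothetical toy data: four genes measured in four samples.
    expr = {
        "g1": [1, 2, 3, 4],
        "g2": [2, 4, 6, 8.5],
        "g3": [4, 3, 2, 1],
        "g4": [1, 3, 2, 4],
    }
    print(correlation_ranks(expr, "g1"))
```

A fixed PCC cutoff such as 0.8 selects different numbers of genes depending on how the database was built, whereas a rank cutoff such as "top 2" always selects the same number of partners per query gene, which is one way to read the robustness reported above.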
ATTED-II (http://atted.jp) is a coexpression database for plant species to aid in the discovery of relationships of unknown genes within a species. As an advanced coexpression analysis method, multispecies comparisons have the potential to detect alterations in gene relationships within an evolutionary context. However, determining the validity of comparative coexpression studies is difficult without quantitative assessments of the quality of coexpression data. ATTED-II (version 9) provides 16 coexpression platforms for nine plant species, including seven species supported by both microarray- and RNA sequencing (RNAseq)-based coexpression data. Two independent sources of coexpression data enable the assessment of the reproducibility of coexpression. The latest coexpression data for Arabidopsis (Ath-m.c7-1 and Ath-r.c3-0) showed the highest reproducibility (Jaccard coefficient = 0.13) among previous coexpression data in ATTED-II. We also investigated the statistical basis of the mutual rank (MR) index as a coexpression measure by bootstrap sampling of experimental units. We found that the error distribution of the logit-transformed MR index showed normality with equal variances for each coexpression platform. Because the MR error was strongly correlated with the number of samples for the coexpression data, typical confidence intervals for the MR index can be estimated for any coexpression platform. These new, high-quality coexpression data can be analyzed with any tool in ATTED-II and combined with external resources to obtain insight into plant biology.
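Two quantities named in this abstract can be sketched briefly. The mutual rank is commonly described as the geometric mean of the two directional correlation ranks between a gene pair, and the Jaccard coefficient is the overlap between two sets divided by the size of their union. How ATTED-II selects the gene lists it compares is not stated here, so the example sets below are hypothetical, as are the function names.

```python
import math
from typing import Iterable


def mutual_rank(rank_a_to_b: int, rank_b_to_a: int) -> float:
    """Geometric mean of the rank of B in A's list and the rank of A in B's list."""
    return math.sqrt(rank_a_to_b * rank_b_to_a)


def jaccard(first: Iterable[str], second: Iterable[str]) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two gene sets."""
    set_a, set_b = set(first), set(second)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty lists share nothing to compare
    return len(set_a & set_b) / len(union)


if __name__ == "__main__":
    # Hypothetical coexpressed partners of one query gene on two platforms.
    microarray_partners = ["geneA", "geneB", "geneC"]
    rnaseq_partners = ["geneB", "geneC", "geneD"]
    print("Jaccard:", jaccard(microarray_partners, rnaseq_partners))
    print("MR:", mutual_rank(2, 8))
```

A higher Jaccard coefficient between the microarray-based and RNAseq-based partner lists indicates that the two independent data sources agree more often, which is the sense of reproducibility used above.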
The identification of protein biochemical functions based on their three-dimensional structures is strongly required in the post-genome-sequencing era. We have developed a new method to identify and predict protein biochemical functions using the similarity information of molecular surface geometries and electrostatic potentials on the surfaces. Our prediction system consists of a similarity search method based on a clique search algorithm and the molecular surface database eF-site (electrostatic surface of functional-site in proteins). Using this system, functional sites similar to those of phosphoenolpyruvate carboxykinase were detected in several mononucleotide-binding proteins, which have different folds. We also applied our method to a hypothetical protein, MJ0226 from Methanococcus jannaschii, and detected the mononucleotide binding site from the similarity to other proteins having different folds.
Keywords: Protein function prediction; three-dimensional structure; molecular surface; electrostatic potential; clique search algorithm
With the progress of genome projects, more than 60 genomic sequences have already been provided. However, large fractions of the gene products have not been annotated, and even when annotations were assumed, not all of them might be reliable due to the inherent ambiguity in functional inference using conventional methods based mainly on sequence homology. The protein function is considered to have two different aspects: a biological function and a biochemical function. From the viewpoint of molecular biology, the former function could be described in terms of protein-protein interactions, whereas the latter should be directly related to each protein structure. Our goal in the present study was to develop a reliable method to predict the biochemical functions of proteins from their three-dimensional (3D) structures. Our approach is to search for similar substructures of known proteins against hypothetical proteins.
Currently, it is well known that fold-level similarity can give us only a limited amount of information regarding the hypothetical proteins' function (Thornton et al. 2000; Todd et al. 2002). Thus, our method should be able to detect the similarities among proteins that have different folds. For that purpose, we focused our attention on the local structures of the functional sites in proteins. In particular, we describe the protein structure based on the molecular surfaces of the proteins and the electrostatic potential on the surface, in order to discriminate the physicochemical differences in the local surface.
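The idea of finding similar local surfaces with a clique search can be sketched in a generic form. The text does not give the details of the authors' algorithm or of the eF-site surface representation, so everything below is an assumption for illustration: each surface is reduced to a few labelled 3D points (the label standing in for an electrostatic class), a correspondence graph pairs points with equal labels, two pairs are connected when their inter-point distances agree within a tolerance `tol`, and a plain Bron-Kerbosch search returns the largest mutually consistent set of pairs.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[str, Tuple[float, float, float]]  # (label, xyz); hypothetical format
Pair = Tuple[int, int]                          # (index in site A, index in site B)


def correspondence_graph(site_a: Sequence[Point], site_b: Sequence[Point],
                         tol: float = 0.5) -> Dict[Pair, Set[Pair]]:
    """Nodes pair up equally labelled points; edges join distance-compatible pairs."""
    nodes = [(i, j)
             for i, (label_a, _) in enumerate(site_a)
             for j, (label_b, _) in enumerate(site_b)
             if label_a == label_b]
    adjacency: Dict[Pair, Set[Pair]] = {node: set() for node in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # a point may be matched at most once
        dist_a = math.dist(site_a[i][1], site_a[k][1])
        dist_b = math.dist(site_b[j][1], site_b[l][1])
        if abs(dist_a - dist_b) <= tol:
            adjacency[(i, j)].add((k, l))
            adjacency[(k, l)].add((i, j))
    return adjacency


def max_clique(adjacency: Dict[Pair, Set[Pair]]) -> List[Pair]:
    """Largest clique via basic Bron-Kerbosch (no pivoting; fine for small graphs)."""
    best: List[Pair] = []

    def expand(current: List[Pair], candidates: Set[Pair], excluded: Set[Pair]) -> None:
        nonlocal best
        if not candidates and not excluded:
            if len(current) > len(best):
                best = list(current)
            return
        for node in list(candidates):
            expand(current + [node],
                   candidates & adjacency[node],
                   excluded & adjacency[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand([], set(adjacency), set())
    return best


if __name__ == "__main__":
    # Hypothetical sites: site_b contains a translated copy of site_a plus one extra point.
    site_a = [("+", (0, 0, 0)), ("-", (3, 0, 0)), ("+", (0, 4, 0))]
    site_b = [("+", (10, 10, 10)), ("-", (13, 10, 10)),
              ("+", (10, 14, 10)), ("-", (50, 50, 50))]
    print(sorted(max_clique(correspondence_graph(site_a, site_b))))
```

Because the comparison uses only local inter-point distances and labels, it does not depend on the overall fold, which is why this style of search can relate functional sites in proteins with different folds.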