Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO)

Dwight, Selina S.; Harris, Midori A.; Dolinski, Kara; Ball, Catherine A.; Binkley, Gail; Christie, Karen R.; Fisk, Dianna G.; Issel‐Tarver, Laurie; Schroeder, Mark; Sherlock, Gavin; Sethuraman, Anand; Weng, Shuai; Botstein, David; Cherry, J. Michael

doi:10.1093/nar/30.1.69

Cited by 330 publications

(243 citation statements)

References 12 publications

Supporting

Mentioning

241

Contrasting

Unclassified


“…The genelets $\langle\gamma_{14}|$, $\langle\gamma_{15}|$, and $\langle\gamma_{16}|$, which are also almost equally significant in both data sets (slightly more in the human data), with $-\pi/6 < \theta_{14}, \theta_{15}, \theta_{16} < 0$, fit normalized cosines of two and a half periods and initial phases of $-\pi/3$, $\pi/3$, and $0$, respectively. Coherent themes of yeast and human cell-cycle programs emerge from the annotations of the 100 yeast and 100 human genes (13,14), with largest parallel and separately also antiparallel contributions from each one of these six genelets as listed in the corresponding yeast and human arraylets (see Data Sets 9 and 10, which are published as supporting information on the PNAS web site). We associate all these six genelets with the cell-cycle gene-expression oscillations common to both the yeast and human genomes and manifested in both data sets.…”

Section: Mathematical Methods: GSVD

mentioning

confidence: 99%

Generalized singular value decomposition for comparative analysis of genome-scale expression data sets of two different organisms

Alter, Brown, Botstein. 2003. Proc. Natl. Acad. Sci. U.S.A.


We describe a comparative mathematical framework for two genome-scale expression data sets. This framework formulates expression as superposition of the effects of regulatory programs, biological processes, and experimental artifacts common to both data sets, as well as those that are exclusive to one data set or the other, by using generalized singular value decomposition. This framework enables comparative reconstruction and classification of the genes and arrays of both data sets. We illustrate this framework with a comparison of yeast and human cell-cycle expression data sets.

DNA microarrays | cell cycle | yeast Saccharomyces cerevisiae | human HeLa cell line

Recent advances in high-throughput genomic technologies enable acquisition of different types of molecular biological data, e.g., DNA-sequence and mRNA-expression data, on a genomic scale. Comparative analysis of these data among two or more model organisms promises to enhance fundamental understanding of the universality as well as the specialization of molecular biological mechanisms. It also may prove useful in medical diagnosis, treatment, and drug design. Comparisons of the DNA sequence of entire genomes already give insights into evolutionary, biochemical, and genetic pathways.

Comparative analysis of mRNA-expression data requires mathematical tools that are able to distinguish the similar from the dissimilar among two or more large-scale data sets. These tools should provide mathematical frameworks for the description of the data, where the variables and operations may represent some biological reality. Recently we showed that singular value decomposition (SVD) provides such a framework for genome-wide expression data (refs. 1-3; see also refs. 4-7). Now we show that generalized SVD (GSVD) (8) provides a comparative mathematical framework for two genome-scale expression data sets.

GSVD is a linear transformation of the two data sets from the two genes × arrays spaces to two reduced and diagonalized "genelets" × "arraylets" spaces. The genelets are shared by both data sets. Each genelet is expressed only in the two corresponding arraylets, with a corresponding "angular distance" indicating the relative significance of this genelet, i.e., its significance, in one data set relative to that in the other.

We show that a genelet of equal significance in both data sets may represent a process common to both data sets. The two corresponding arraylets may represent the cellular states in each data set that correspond to this common process. A genelet of no significance in one data set relative to the other may represent a process exclusive to the latter data set. The corresponding arraylet of this data set may represent the cellular state that corresponds to this exclusive process.

We also show that mathematical reconstruction of gene expression in a subset of genelets may simulate experimental observation of only the process that these genelets are inferred to represent. Similarly, reconstruction of array expression in the subset of corresponding arr...
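The decomposition described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it uses a textbook QR-plus-SVD construction of the GSVD, and the function name `gsvd_angular_distance`, the variable names, and the sign convention for the angular distance (arctangent of the ratio of the paired generalized singular values, shifted by π/4 so that 0 means equal significance) are assumptions made for illustration. It also assumes each data set has at least as many genes (rows) as arrays (columns) and that no genelet is entirely absent from the second data set.

```python
import numpy as np

def gsvd_angular_distance(d1, d2):
    """Sketch of a GSVD of two genes x arrays matrices with the same number of arrays.

    Returns arraylets u1, u2 (columns), paired values c, s with c**2 + s**2 == 1,
    the shared genelets (rows), and an angular distance per genelet.
    Assumes d1.shape[0] >= d1.shape[1] and all entries of s are nonzero.
    """
    n1 = d1.shape[0]
    # Thin QR of the stacked data: [d1; d2] = [q1; q2] @ r
    q, r = np.linalg.qr(np.vstack([d1, d2]))
    q1, q2 = q[:n1], q[n1:]
    # SVD of the top block gives the first set of arraylets and the cosines c
    u1, c, wt = np.linalg.svd(q1, full_matrices=False)
    # The same rotation makes the columns of the bottom block orthogonal
    b = q2 @ wt.T
    s = np.linalg.norm(b, axis=0)
    u2 = b / s
    # Shared patterns across arrays: d1 = u1 diag(c) genelets, d2 = u2 diag(s) genelets
    genelets = wt @ r
    # Assumed convention: 0 = equally significant, +pi/4 = only in d1, -pi/4 = only in d2
    theta = np.arctan2(c, s) - np.pi / 4
    return u1, u2, c, s, genelets, theta
```

Under these assumptions, a genelet with an angular distance near zero would be read as common to both data sets, while values near ±π/4 would point to a pattern that is nearly exclusive to one of them.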


“…The yeast data set was generated by retrieving all proteins annotated with subcellular localization information from the Saccharomyces Genome Database (SGD; Dwight et al 2002; www.yeastgenome.org/).…”

Section: Methods Data Sets

mentioning

confidence: 99%

Predicting Subcellular Localization via Protein Motif Co-Occurrence

Scott, Thomas, Hallett. 2004. Genome Res.


The prediction of subcellular localization of proteins from their primary sequence is a challenging problem in bioinformatics. We have created a Bayesian network localization predictor called PSLT that is based on the combinatorial presence of InterPro motifs and specific membrane domains in human proteins. This probabilistic framework generates a likelihood of localization to all organelles and allows prediction of multicompartmental proteins. When used to predict on nine compartments, PSLT achieves an accuracy of 78% as estimated by using a 10-fold cross-validation test and a coverage of 74%. When used to predict the localization of proteins from other closely related species, it achieves a prediction accuracy and a coverage >80%. We compared the localization predictions of PSLT to those determined through GFP-tagging and microscopy for a group of human proteins. We found two general classes of proteins that are mislocalized by the GFP-tagging strategy but are correctly localized by PSLT. This suggests that PSLT can be used in combination with experimental approaches for localization to identify proteins for which additional experimental validation is required. We used our predictor to annotate all 9793 human proteins from SWISS-PROT release 41.25, 16% of which are predicted by PSLT to be present in more than one compartment.


“…Genes were annotated by using the biological process ontology of Gene Ontology (GO) (7) provided by the Saccharomyces Genome Database (SGD) (8). To verify that genes on the same SP are likely to be involved in the same biological process, we applied our method to the Rosetta dataset and checked the results against GO-annotated biological processes in the three major cellular compartments: mitochondria, cytoplasm, and nucleus.…”

mentioning

confidence: 99%

Transitive functional annotation by shortest-path analysis of gene expression data

Zhou, Kao, Wong. 2002. Proc. Natl. Acad. Sci. U.S.A.


Current methods for the functional analysis of microarray gene expression data make the implicit assumption that genes with similar expression profiles have similar functions in cells. However, among genes involved in the same biological pathway, not all gene pairs show high expression similarity. Here, we propose that transitive expression similarity among genes can be used as an important attribute to link genes of the same biological pathway. Based on large-scale yeast microarray expression data, we use the shortest-path analysis to identify transitive genes between two given genes from the same biological process. We find that not only functionally related genes with correlated expression profiles are identified but also those without. In the latter case, we compare our method to hierarchical clustering, and show that our method can reveal functional relationships among genes in a more precise manner. Finally, we show that our method can be used to reliably predict the function of unknown genes from known genes lying on the same shortest path. We assigned functions for 146 yeast genes that are considered as unknown by the Saccharomyces Genome Database and by the Yeast Proteome Database. These genes constitute around 5% of the unknown yeast ORFome.
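The idea of linking two genes through transitive expression similarity can be sketched as a shortest-path search on a correlation graph. The following is a minimal illustration under stated assumptions, not the method as published: the function name `transitive_path`, the edge rule (connect two genes when the absolute Pearson correlation of their profiles reaches a threshold), the edge weight (one minus the absolute correlation), and the default threshold of 0.6 are all hypothetical choices made for this example.

```python
import heapq
import numpy as np

def transitive_path(expr, source, target, min_abs_corr=0.6):
    """Return gene indices on a shortest path from source to target, or None.

    expr is a genes x conditions array. Edges and weights are illustrative
    assumptions: an edge exists when |Pearson r| >= min_abs_corr, and its
    length is 1 - |r|, so strongly correlated genes are close together.
    """
    corr = np.abs(np.corrcoef(expr))
    n = corr.shape[0]
    dist = [float("inf")] * n
    prev = {}
    dist[source] = 0.0
    heap = [(0.0, source)]
    # Dijkstra's algorithm over the thresholded correlation graph
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        if u == target:
            break
        for v in range(n):
            if v == u or corr[u, v] < min_abs_corr:
                continue
            nd = d + (1.0 - corr[u, v])
            if nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dist[target] == float("inf"):
        return None  # the two genes are not connected at this threshold
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]
```

In this sketch, any genes strictly between the two endpoints of the returned path play the role of the transitive genes: they can connect two genes whose own profiles are only weakly correlated.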
