Our analysis indicates that information content-based measures outperform graph structure-based measures for stratifying protein interactions. Measures based on GO biological process and molecular function annotations can be used alone or together to validate protein interactions involved in pathways. However, GO cellular component-derived measures may not be able to separate true positives from noise. Furthermore, we demonstrate that the functional similarity of proteins within known regulatory pathways decays rapidly as the path length between two proteins increases. Several logistic regression models are built to estimate the confidence of both direct and indirect interactions within a pathway; these models may be used to score putative pathways inferred from a scaffold of molecular interactions.
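As an illustration of what an information content-based measure looks like, the sketch below computes Resnik similarity, one widely used measure of this family, on a toy ontology. The abstract does not say which specific measures were compared, so the choice of Resnik similarity, the term names, and the annotation counts are all assumptions made for the example.

```python
import math

# Toy ontology: term -> set of parent terms.
# Hypothetical term names, not real GO identifiers.
PARENTS = {
    "root": set(),
    "signaling": {"root"},
    "metabolism": {"root"},
    "kinase_cascade": {"signaling"},
    "receptor_binding": {"signaling"},
}

# Hypothetical number of proteins annotated directly to each term.
DIRECT_COUNTS = {
    "root": 0,
    "signaling": 2,
    "metabolism": 10,
    "kinase_cascade": 4,
    "receptor_binding": 4,
}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t), where p(t) is the fraction of annotations
    attached to t or to any of its descendants."""
    total = sum(DIRECT_COUNTS.values())
    cumulative = {term: 0 for term in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            cumulative[anc] += count
    # Assumes every term has at least one cumulative annotation.
    return {term: -math.log(cumulative[term] / total) for term in PARENTS}


def resnik(term_a, term_b, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(ic[term] for term in common)


def protein_similarity(terms_a, terms_b, ic):
    """Score a protein pair by its best-matching pair of annotated terms."""
    return max(resnik(a, b, ic) for a in terms_a for b in terms_b)


ic = information_content()
same_branch = resnik("kinase_cascade", "receptor_binding", ic)
different_branch = resnik("kinase_cascade", "metabolism", ic)
```

Two terms that share a specific, rarely used ancestor score higher than two terms whose only common ancestor is the root, which is the property that lets such a score rank candidate interactions.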
Deciphering the human genome includes locating the promoters that initiate transcription and identifying the exons of genes. Many promoter prediction programs have been proposed, but when they are applied to extended regions of the genome, most of their predictions are false positives. The extensive collection of gene transcript sequences is an important new source of information that has not previously been used in promoter prediction. Our approach is to enhance the specificity of predictions by restricting the genomic regions that are searched, using gene transcript alignments as anchors in the genome for gene modeling. We developed a consensus promoter prediction method combining previously developed algorithms with the GENSCAN gene modeling program. Our method, CONPRO (CONsensus PROmoter), identifies promoters with very high confidence, and the predicted promoters are guaranteed to be associated with genes. On our test data set, the method correctly detects promoters for approximately half of all human genes (37%–71%), and most predictions are true promoters (85%–90%). Applying our method to the human genome and human genes from the Unigene data set, we find the promoters for 13,744 genes. Of these, 6440 are genes with a functionally cloned mRNA, and 7304 are novel genes for which only expressed sequence tags (ESTs) are available. Candidate promoters for many novel genes will be a useful resource in elucidating complex biological response mechanisms. CONPRO is available for searching promoters in the human genome (http://stl.bioinformatics.med.umich.edu/conpro).
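The general idea of restricting the search to an anchored region and then requiring agreement between programs can be sketched as follows. This is not the CONPRO implementation: the function name, the window size, the clustering tolerance, and the voting threshold are hypothetical choices made only to show how anchoring plus consensus can cut down false positives.

```python
def consensus_promoters(predictions, anchor, window=1000, tolerance=100, min_votes=2):
    """Return consensus promoter positions near a transcript anchor.

    predictions: dict mapping a program name to its predicted positions
                 (plus-strand coordinates; hypothetical input format).
    anchor:      coordinate of the 5' end of the transcript alignment.
    window:      how far upstream of the anchor to search.
    tolerance:   calls within this distance of the previous call are
                 merged into the same cluster.
    min_votes:   number of distinct programs that must support a cluster.
    """
    # Step 1: discard every call outside the anchored search region.
    calls = sorted(
        (pos, name)
        for name, positions in predictions.items()
        for pos in positions
        if anchor - window <= pos <= anchor
    )

    # Step 2: group nearby calls into clusters.
    clusters = []
    for pos, name in calls:
        if clusters and pos - clusters[-1][-1][0] <= tolerance:
            clusters[-1].append((pos, name))
        else:
            clusters.append([(pos, name)])

    # Step 3: keep clusters supported by enough distinct programs
    # and report the mean position of each.
    consensus = []
    for cluster in clusters:
        programs = {name for _, name in cluster}
        if len(programs) >= min_votes:
            consensus.append(sum(pos for pos, _ in cluster) // len(cluster))
    return consensus


example = {
    "program_a": [9500, 12000],  # 12000 lies downstream of the anchor
    "program_b": [9540],
    "program_c": [2000],         # far outside the search window
}
hits = consensus_promoters(example, anchor=10000)
```

In this toy input only the two calls near position 9500 fall inside the window and agree with each other; the isolated calls from `program_a` and `program_c` are dropped before any voting takes place.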
Identifying novel NF-κB-regulated immune genes in the human genome is important to our understanding of immune mechanisms and immune diseases. We fit logistic regression models to the promoters of 62 known NF-κB-regulated immune genes to find patterns of transcription factor binding in the promoters of genes with known immune function. Using these patterns, we scanned the promoters of additional genes for matches, selected those with NF-κB binding sites conserved in the mouse or fly, and then confirmed them as NF-κB-regulated immune genes based on expression data. Among 6440 previously identified promoters in the human genome, we found 28 predicted immune gene promoters, 19 of which regulate genes with known function, allowing us to calculate a specificity of 93%–100% for the method. We calculated a sensitivity of 42% when searching the 62 known immune gene promoters. We found nine novel NF-κB-regulated immune genes that are consistent with available SAGE data. Our method of predicting gene function, based on characteristic patterns of transcription factor binding, evolutionary conservation, and expression studies, would be applicable to finding genes with other functions.
Programs implementing this algorithm are available upon request. The list of crystal structures used for compiling the mean base step parameters of DNA is available by anonymous ftp at http://stateslab.wustl.edu/pub/helix/StructureList.