Annotating the functional properties of gene products, i.e., RNAs and proteins, is a fundamental task in biology. The Gene Ontology database (GO) was developed to systematically describe the functional properties of gene products across species, and to facilitate the computational prediction of gene function. As GO is routinely updated, it serves as the gold standard and main knowledge source in functional genomics. Many gene function prediction methods that make use of GO have been proposed. However, no literature review has summarized these methods, or the opportunities for future work, from the perspective of GO. To bridge this gap, we review the existing methods with an emphasis on recent solutions. First, we introduce the conventions of GO and the widely adopted evaluation metrics for gene function prediction. Next, we summarize current methods of gene function prediction that apply GO in different ways, such as using hierarchical or flat inter-relationships between GO terms, compressing the massive set of GO terms, and quantifying semantic similarities. Although many efforts have improved performance by harnessing GO, we conclude that many important topics remain largely overlooked and merit future research.
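The abstract does not name the evaluation metrics it reviews. As one illustration, the protein-centric Fmax used in the CAFA assessments can be sketched as below. The function name, the threshold grid, and the toy gene and term identifiers are assumptions made for this example, not material from the review.

```python
def fmax(scores, truth, thresholds=None):
    """Protein-centric Fmax: the best F-measure over score thresholds.

    scores: dict gene -> dict term -> predicted score in [0, 1]
    truth:  dict gene -> set of experimentally annotated terms
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 100)]  # assumed grid
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for gene, true_terms in truth.items():
            predicted = {term for term, s in scores.get(gene, {}).items() if s >= t}
            hits = len(predicted & true_terms)
            # Precision is averaged only over genes with at least one prediction.
            if predicted:
                precisions.append(hits / len(predicted))
            # Recall is averaged over all genes in the benchmark.
            recalls.append(hits / len(true_terms) if true_terms else 0.0)
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


# Toy data with hypothetical identifiers.
toy_scores = {"g1": {"GO:a": 0.9, "GO:b": 0.4}, "g2": {"GO:a": 0.8}}
toy_truth = {"g1": {"GO:a"}, "g2": {"GO:a", "GO:c"}}
print(round(fmax(toy_scores, toy_truth), 4))
```

In practice the predicted and true term sets are usually propagated to their GO ancestors before scoring; that step is omitted here for brevity.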
Gene Ontology (GO) uses structured vocabularies (or terms) to describe the molecular functions, biological roles, and cellular locations of gene products in a hierarchical ontology. GO annotations associate genes with GO terms and indicate that the given gene products carry out the biological functions described by the relevant terms. However, predicting correct GO annotations for genes from the massive set of terms defined by GO is difficult. To address this challenge, we introduce a Gene Ontology Hierarchy Preserving Hashing (HPHash) based semantic method for gene function prediction. HPHash first measures the taxonomic similarity between GO terms. It then uses a hierarchy preserving hashing technique to keep the hierarchical order between GO terms, and optimizes a series of hashing functions to encode the massive set of GO terms as compact binary codes. After that, HPHash uses these hashing functions to project the gene-term association matrix into a low-dimensional one and performs semantic similarity based gene function prediction in the low-dimensional space. Experimental results on three model species (Homo sapiens, Mus musculus and Rattus norvegicus) for interspecies gene function prediction show that HPHash performs better than other related approaches and is robust to the number of hash functions. In addition, we use HPHash as a plugin for BLAST based gene function prediction, where it again significantly improves the prediction performance. The code of HPHash is available at: http://mlda.swu.edu.cn/codes.php?name=HPHash.
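The first HPHash step, measuring taxonomic similarity between GO terms, can be illustrated with a simple stand-in. The abstract does not state which measure HPHash uses, so the sketch below assumes a Jaccard overlap of ancestor sets on a toy hierarchy; the term names and the `parents` mapping are hypothetical.

```python
def ancestors(term, parents):
    """Return the term together with all of its ancestors in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def taxonomic_similarity(term_a, term_b, parents):
    """Jaccard overlap of ancestor sets (an assumed, illustrative measure)."""
    anc_a = ancestors(term_a, parents)
    anc_b = ancestors(term_b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


# Toy hierarchy: child -> list of direct parents. D has two parents,
# which a GO-style directed acyclic graph allows.
toy_parents = {"A": ["root"], "B": ["root"], "C": ["A"], "D": ["A", "B"]}

print(taxonomic_similarity("C", "D", toy_parents))  # share A and root
print(taxonomic_similarity("C", "B", toy_parents))  # share only root
```

Terms that share more of their path to the root score higher, which is the kind of ordering a hierarchy preserving code would then need to retain.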
A key remaining challenge of modern biology is annotating the functional roles of proteins. Various computational models have been proposed for this challenge. Most of them assume that the annotations of annotated proteins are complete, but in fact many are incomplete. We propose a method called NewGOA to predict new Gene Ontology (GO) annotations for incompletely annotated proteins and for completely un-annotated ones. NewGOA employs a hybrid graph, composed of two types of nodes (proteins and GO terms), to encode interactions between proteins, hierarchical relationships between terms, and available annotations of proteins. To account for the structural difference between the GO term subgraph and the protein subgraph, NewGOA applies a bi-random walk algorithm, which executes asynchronous random walks on the hybrid graph, to predict new GO annotations of proteins. An experimental study on archived GO annotations of two model species (H. sapiens and S. cerevisiae) shows that NewGOA predicts new annotations of proteins more accurately and efficiently than other related methods. The results also indicate that the bi-random walks can explore and exploit the structural difference between the GO term subgraph and the protein subgraph. The supplementary files and code of NewGOA are available at: http://mlda.swu.edu.cn/codes.php?name=NewGOA.
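The idea of walking on the protein subgraph and the GO term subgraph for different numbers of steps can be sketched generically. This is a minimal bi-random walk with restart under stated assumptions, not NewGOA's actual update rule: the averaging of the two walks, the parameters `alpha`, `left_steps` and `right_steps`, and the toy matrices are all choices made for this example.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def row_normalize(matrix):
    """Scale each row to sum to 1; all-zero rows are left as zeros."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def bi_random_walk(protein_net, term_net, annotations,
                   alpha=0.5, left_steps=3, right_steps=2):
    """Walk on the protein side and the term side for different step counts."""
    p = row_normalize(protein_net)
    g = row_normalize(term_net)
    y = [[float(v) for v in row] for row in annotations]
    r = [row[:] for row in y]
    for step in range(1, max(left_steps, right_steps) + 1):
        updates = []
        if step <= left_steps:    # spread scores between neighbouring proteins
            updates.append(matmul(p, r))
        if step <= right_steps:   # spread scores between related GO terms
            updates.append(matmul(r, g))
        # Average the walks that ran at this step, then restart from y.
        r = [
            [alpha * sum(u[i][j] for u in updates) / len(updates)
             + (1 - alpha) * y[i][j]
             for j in range(len(y[i]))]
            for i in range(len(y))
        ]
    return r


# Toy data: proteins 0-1-2 form a chain, two related terms,
# and only protein 0 is annotated (with term 0).
scores = bi_random_walk(
    protein_net=[[0, 1, 0], [1, 0, 1], [0, 1, 0]],
    term_net=[[0, 1], [1, 0]],
    annotations=[[1, 0], [0, 0], [0, 0]],
)
print(scores)
```

Letting the two sides stop after different numbers of steps is what makes the walk asynchronous: a dense protein network and a sparse term hierarchy need not be explored to the same depth.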
To preserve life and promote health, it is critical to have access to appropriate quantities of safe food. Foodborne illness is an infectious illness or poisoning caused by viruses, bacteria, or chemicals that contaminate food or water. Food safety and health are the responsibility of everyone, from the agricultural chain to the consumer who comes into contact with food, in order to limit the number of food poisonings. Nutritionists consider the home, or wherever the consumer prepares food, to be one of the least supervised steps on the way from the farm to the table. Dr. Lotfi Zadeh invented fuzzy logic as a superset of classical logic. This article covers the mathematical foundations of fuzzy logic, as well as membership functions, fuzzy sets, and reasoning rules. Fuzzy expert systems convert input numbers into linguistic values, which are then processed by if-then rules provided by a human expert. The notion of a fuzzy expert system is explored in depth, along with its rule base and set membership functions.
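The pipeline described here (numeric inputs, linguistic values, expert if-then rules) can be sketched in a few lines. Everything specific in this example is an assumption for illustration: the two inputs, the breakpoints of the membership functions, the three rules, and the weighted-average (zero-order Sugeno style) combination are not taken from the article and carry no food-safety authority.

```python
def ramp_up(x, start, end):
    """Membership that rises linearly from 0 at `start` to 1 at `end`."""
    if x <= start:
        return 0.0
    if x >= end:
        return 1.0
    return (x - start) / (end - start)


def fuzzify(temperature_c, hours):
    """Convert numeric inputs into degrees of linguistic values."""
    warm = ramp_up(temperature_c, 4.0, 10.0)   # hypothetical breakpoints
    long_time = ramp_up(hours, 1.0, 4.0)       # hypothetical breakpoints
    return {
        "cold": 1.0 - warm,
        "warm": warm,
        "short": 1.0 - long_time,
        "long": long_time,
    }


def spoilage_risk(temperature_c, hours):
    """Apply hypothetical expert rules and return a risk score in [0, 1]."""
    m = fuzzify(temperature_c, hours)
    # Each rule: (firing strength, crisp output). AND is modelled with min.
    rules = [
        (min(m["warm"], m["long"]), 1.0),   # IF warm AND long  THEN high risk
        (min(m["warm"], m["short"]), 0.5),  # IF warm AND short THEN medium risk
        (m["cold"], 0.0),                   # IF cold           THEN low risk
    ]
    total = sum(strength for strength, _ in rules)
    return sum(strength * output for strength, output in rules) / total


print(spoilage_risk(2.0, 10.0))   # stored cold
print(spoilage_risk(25.0, 6.0))   # stored warm for a long time
print(spoilage_risk(7.0, 2.5))    # borderline on both inputs
```

Because "cold" and "warm" are complementary here, at least one rule always fires, so the weighted average is well defined for any input.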
Protein function prediction is a fundamental task in the postgenomic era. Available functional annotations of proteins are incomplete, and the annotations of two homologous species are complementary to each other. However, how to effectively leverage the mutually complementary annotations of different species to further boost prediction performance is still not well studied. In this paper, we propose a cross-species protein function prediction approach that performs an Asynchronous Random Walk on a heterogeneous network (AsyRW). AsyRW first constructs a heterogeneous network to integrate multiple functional association networks derived from different biological data, established homology relationships between proteins from different species, known annotations of proteins, and Gene Ontology (GO). To account for the intrinsic intra- and inter-species structures of proteins and that of GO, AsyRW quantifies the individual walk length of each network node using the gravity-like theory and performs an asynchronous random walk with these individual lengths to predict associations between proteins and GO terms. Experiments on annotations archived in different years show that individual walk lengths and the asynchronous random walk can effectively leverage the complementary annotations of different species, and that AsyRW significantly outperforms other related and competitive methods. The code of AsyRW is available at: http://mlda.swu.edu.cn/codes.php?name=AsyRW.
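The notion of an individual walk length per node can be illustrated with a small sketch. The abstract does not describe how the gravity-like theory assigns these lengths, so `walk_lengths` is simply supplied by hand here; the update rule, the `restart` parameter, and the toy network are likewise assumptions rather than AsyRW's actual formulation.

```python
def asynchronous_walk(adjacency, seeds, walk_lengths, restart=0.5):
    """Random walk with restart in which node i stops after walk_lengths[i] steps.

    adjacency:    square list-of-lists network over all nodes
    seeds:        initial node-by-term scores (known annotations)
    walk_lengths: per-node step budget (hand-picked in this sketch)
    """
    n = len(adjacency)
    transition = []
    for row in adjacency:
        total = sum(row)
        transition.append([v / total if total else 0.0 for v in row])

    scores = [[float(v) for v in row] for row in seeds]
    for step in range(1, max(walk_lengths) + 1):
        previous = [row[:] for row in scores]
        for i in range(n):
            if step > walk_lengths[i]:
                continue  # node i has used up its budget; its scores stay frozen
            for j in range(len(seeds[i])):
                spread = sum(transition[i][k] * previous[k][j] for k in range(n))
                scores[i][j] = (1 - restart) * spread + restart * seeds[i][j]
    return scores


# Toy chain 0-1-2 with one term; only node 0 is annotated.
# Node 0 walks one step, node 1 two steps, node 2 not at all.
result = asynchronous_walk(
    adjacency=[[0, 1, 0], [1, 0, 1], [0, 1, 0]],
    seeds=[[1], [0], [0]],
    walk_lengths=[1, 2, 0],
)
print(result)
```

A node with a larger budget keeps absorbing information from its neighbours after the others have stopped, which is one simple way to let differently structured parts of a heterogeneous network be explored to different depths.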