Reducing the amino acid alphabet can substantially simplify protein complexity, which can improve computational efficiency, decrease information redundancy and reduce the chance of overfitting. Although a number of reduced alphabets have been proposed, different classification rules can produce distinct results in protein sequence analysis. A systematic framework for reduced alphabets is therefore needed. In this work, we constructed RAACBook, a comprehensive web server for protein sequence analysis and machine learning applications that integrates reduced alphabets. The web server contains three parts. (i) 74 types of reduced amino acid alphabet were manually extracted to generate 673 reduced amino acid clusters (RAACs) for dealing with specific protein problems. Users can easily select desired RAACs with a multilayer browser tool. (ii) An online tool was developed to analyze the primary sequence of proteins. The tool produces K-tuple reduced amino acid composition from three user-defined correlation parameters (K-tuple, g-gap, λ-correlation). The results are visualized as a sequence alignment, merged RAA composition, feature distribution and a logo of the reduced sequence. (iii) A machine learning server is provided to train protein classification models based on K-tuple RAAC. The optimal model can be selected according to evaluation indexes (ROC, AUC, MCC, etc.). In conclusion, RAACBook offers a powerful and user-friendly service for protein sequence analysis and computational proteomics. RAACBook is freely available at http://bioinfor.imu.edu.cn/raacbook. Database URL: http://bioinfor.imu.edu.cn/raacbook
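To make the K-tuple reduced amino acid composition in part (ii) concrete, the following is a minimal Python sketch. It is not the RAACBook implementation. The six-cluster alphabet, the function names and the g-gap convention (gap=0 meaning adjacent residues) are assumptions made for illustration, and the λ-correlation parameter is left out because the abstract does not define it.

```python
from collections import Counter
from itertools import product

# Hypothetical 6-cluster reduction, chosen for illustration only;
# it is NOT one of the 673 RAACs shipped with RAACBook.
CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]

# Each residue is represented by the first letter of its cluster.
MAPPING = {aa: cluster[0] for cluster in CLUSTERS for aa in cluster}
ALPHABET = [cluster[0] for cluster in CLUSTERS]


def reduce_sequence(seq, mapping=MAPPING):
    """Rewrite a protein sequence in the reduced alphabet ('X' for unknown residues)."""
    return "".join(mapping.get(aa, "X") for aa in seq.upper())


def ktuple_composition(reduced, alphabet=ALPHABET, k=2, gap=0):
    """Frequency of every K-tuple whose residues sit (gap + 1) positions apart.

    The g-gap convention used here is an assumption: gap=0 means adjacent residues.
    """
    step = gap + 1
    span = (k - 1) * step + 1  # sequence length covered by one tuple
    counts = Counter(
        reduced[i:i + span:step] for i in range(len(reduced) - span + 1)
    )
    features = ["".join(t) for t in product(alphabet, repeat=k)]
    total = sum(counts[f] for f in features)  # tuples containing 'X' are ignored
    return {f: (counts[f] / total if total else 0.0) for f in features}


reduced = reduce_sequence("AKDEGP")
composition = ktuple_composition(reduced, k=2, gap=0)
print(reduced)
print(len(composition), composition["AK"])
```

With this toy alphabet the sequence AKDEGP becomes AKDDGG, and a 2-tuple feature vector has 36 entries instead of the 400 needed for the full 20-letter alphabet, which is the kind of dimensionality reduction the abstract refers to.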
Sequence logos give a fast and concise visualization of consensus sequences. Proteins exhibit greater complexity and diversity than DNA, which usually complicates the graphical representation of the logo. Reduced amino acid alphabets are a powerful way to simplify the complexity of sequence alignments, which motivated us to establish RaacLogo. As a new sequence logo generator that uses reduced amino acid alphabets, RaacLogo can easily generate many different simplified logos tailored to users' needs by selecting among various reduced amino acid alphabets drawn from more than 40 clustering algorithms. The current web server provides 74 types of reduced amino acid alphabet, which were manually extracted to generate 673 reduced amino acid clusters (RAACs) for dealing with protein alignment. A two-dimensional selector is provided for easily selecting desired RAACs based on underlying biological knowledge. We anticipate that the RaacLogo web server will play a valuable role in protein sequence alignment, topological estimation and protein design experiments. RaacLogo is freely available at http://bioinfor.imu.edu.cn/raaclogo.
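The effect of a reduced alphabet on a logo column can be sketched with the standard information-content measure for sequence logos, log2(N) minus the Shannon entropy of the column, without small-sample correction. Whether RaacLogo computes letter heights exactly this way is an assumption, and the six-cluster alphabet and function names below are hypothetical.

```python
import math
from collections import Counter

# Hypothetical 6-cluster reduction for illustration; not an actual RaacLogo alphabet.
CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]
MAPPING = {aa: cluster[0] for cluster in CLUSTERS for aa in cluster}


def column_information(column, alphabet_size):
    """Information content (bits) of one alignment column: log2(N) - H."""
    counts = Counter(column)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return math.log2(alphabet_size) - entropy


def letter_heights(column, alphabet_size):
    """Height of each letter in the logo stack: frequency times column information."""
    info = column_information(column, alphabet_size)
    total = len(column)
    return {aa: (n / total) * info for aa, n in Counter(column).items()}


# One column of a toy alignment: aspartate and glutamate in equal proportion.
full_column = "DDEE"
reduced_column = "".join(MAPPING[aa] for aa in full_column)

print(column_information(full_column, 20))
print(column_information(reduced_column, len(CLUSTERS)))
print(letter_heights(reduced_column, len(CLUSTERS)))
```

In the full alphabet the column is split between D and E and loses one bit to entropy. After reduction both residues fall in the same acidic cluster, so the column has zero entropy and is drawn as a single letter at the maximum height for the reduced alphabet.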
Defensins, one of the major classes of host defense peptides, play a significant role in innate immunity and have evolved in almost all living organisms. Accurate high-throughput computational methods can help in designing drugs or other medical means of defense against pathogens. To take up this challenge, we designed iDEF-PseRAAC, an up-to-date server built on a rigorous benchmark dataset, for predicting the defensin family. Using primary sequence compositions extracted with different types of reduced amino acid alphabet, the selected feature subset achieved a best overall accuracy of 92.38%. We therefore conclude that the information provided by the many types of amino acid reduction offers an efficient and rational methodology for defensin identification. A free online server is available to academic users at http://bioinfor.imu.edu.cn/idpf. We expect iDEF-PseRAAC to be a promising tool for the functional annotation of defensin proteins.
Understanding early development offers a striking opportunity to investigate genetic disease, stem cells and assisted reproductive technology. Recent advances in high-throughput sequencing technology have led to a rising influx of omics data, which has rapidly boosted our understanding of mammalian developmental mechanisms. Here, we review EmExplorer (a database for exploring time activation of gene expression in mammalian embryos), which systematically organizes genes from development-related pathways and which we have established and continue to update. The current version of EmExplorer incorporates over 26 000 genes obtained from 306 functional pathways in five species. The function annotations of development-related genes are also integrated into EmExplorer. To facilitate data extraction, the database also provides the following. (i) The dynamic expression values for each developmental stage are matched to the corresponding genes. (ii) A two-layer search tool supports multi-option searching, such as by official symbol, pathway name and function annotation. The returned entries link directly to the analysis results for the corresponding gene or pathway in the analysis module. (iii) The analysis module provides gene comparisons at the multi-species level and the functional pathway level, showing species specificity and stage specificity at the gene or pathway level. (iv) An analysis based on the hypergeometric distribution test reveals the enrichment of gene functions at a particular stage of one organism's pathway. (v) The browser is designed for users without a specific search goal and helps new users get a general idea of the contents of the database. (vi) The experimentally validated pathways are manually curated and shown on the home page. EmExplorer will be helpful for elucidating early developmental mechanisms and exploring time activation genes.
EmExplorer is freely available at http://bioinfor.imu.edu.cn/emexplorer.
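The hypergeometric enrichment test mentioned in item (iv) can be sketched as a one-sided tail probability. This is a generic illustration, not EmExplorer's code. The function name, argument names and the numbers in the usage lines are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """P(X >= overlap) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the function of interest."""
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(population, sample)


# Hypothetical numbers: 1000 background genes, 50 annotated with a function,
# 40 genes active at one stage, 8 of which carry the annotation.
p_value = hypergeom_enrichment_p(population=1000, annotated=50, sample=40, overlap=8)
print(f"{p_value:.3g}")
```

A small p-value indicates that the function is over-represented among the stage-specific genes relative to the background. When many functions are tested at once, a multiple-testing correction would normally be applied on top of this.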