Use of Chou’s 5-steps rule to predict the subcellular localization of gram-negative and gram-positive bacterial proteins by multi-label learning based on gene ontology annotation and profile alignment
Abstract: To date, many proteins generated by large-scale genome sequencing projects are still uncharacterized and subject to intensive investigation by both experimental and computational means. Knowledge of protein subcellular localization (SCL) is of key importance for protein function elucidation. However, it remains a challenging task, especially for multi-site proteins known to shuttle between cell compartments to perform their proper biological functions and proteins …
“…One of the essential elements of cell survival is proteins, whose presence in specific cell sites determines their function and biological nature. After synthesizing proteins in the cytosol, they are directed to particular parts of the cell, including organelles, based on their biological function [ 25 ]. Prediction of subcellular protein localization is based on sorting signals, amino acids composition, and homology [ 25 ].…”
Section: Results (mentioning)
confidence: 99%
“…After synthesizing proteins in the cytosol, they are directed to particular parts of the cell, including organelles, based on their biological function [ 25 ]. Prediction of subcellular protein localization is based on sorting signals, amino acids composition, and homology [ 25 ]. Protein sorting is a complex biological mechanism often driven by specific signal sequences in nascent proteins.…”
The glycoside hydrolase family contains enzymes that break the glycosidic bonds of carbohydrates by hydrolysis. Inulinase is one of the most important industrial enzymes in glycoside hydrolase family 32 (GH32). In this study, to identify and classify bacterial inulinases, 16,002 protein sequences belonging to the GH32 family were first obtained from various databases. The inulin-degrading enzymes (endoinulinase and exoinulinase) were then identified: eight endoinulinases (EC 3.2.1.7) and 4318 exoinulinases (EC 3.2.1.80) were found. Next, the subcellular localization of the endoinulinases and exoinulinases was predicted. Among them, two extracellular endoinulinases and 1232 extracellular exoinulinases were found. The biochemical properties of 363 enzymes from the genera Arthrobacter, Bacillus, and Streptomyces (the most abundant) showed that exoinulinases have isoelectric points ranging from acidic up to the neutral range, depending on their amino acid length. That is, the smaller the protein (336 aa), the more acidic the pI (4.39), and the larger the protein (1207 aa), the more the pI is in the neutral range (8.84). Also, a negative grand average of hydropathicity (GRAVY) index indicates the hydrophilicity of the exoinulinases. Finally, considering the biochemical properties that affect protein stability, together with studies of post-translational modifications, one endoinulinase and 40 enzymes with desirable characteristics were selected to identify their enzyme production sources. With the expansion of databases and the development of bioinformatics tools, it is now possible to classify, review, and analyze large amounts of data on different enzyme-producing strains when screening and isolating them. In laboratory studies, by contrast, a maximum of 20 to 30 strains can be examined. Therefore, by examining more strains computationally, strains with more stable and efficient enzymes were selected and proposed for laboratory work.
The findings of this study can help researchers to select the …
Abbreviation: GH32, Glycoside Hydrolase 32.
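The hydrophilicity statement in the abstract above refers to a hydropathy index that is negative for hydrophilic proteins. A minimal sketch of how such an index is commonly computed follows, assuming the standard Kyte-Doolittle hydropathy scale and the usual GRAVY definition (mean hydropathy over all residues). The abstract does not say which tool the authors used, so the function name `gravy` and the toy sequences are illustrative assumptions, not the study's method or data.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Return the mean Kyte-Doolittle hydropathy of a protein sequence.

    Negative values indicate an overall hydrophilic protein,
    positive values an overall hydrophobic one.
    """
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = sorted(set(seq) - set(KYTE_DOOLITTLE))
    if unknown:
        # Ambiguity codes (X, B, Z, ...) have no hydropathy value here.
        raise ValueError(f"unsupported residues: {unknown}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


if __name__ == "__main__":
    # Hypothetical toy peptides, not sequences from the study.
    for name, seq in [("charged", "DEKR"), ("aliphatic", "AILV")]:
        label = "hydrophilic" if gravy(seq) < 0 else "hydrophobic"
        print(f"{name}: GRAVY = {gravy(seq):.3f} ({label})")
```

Applied to a full enzyme sequence, a negative return value would correspond to the hydrophilic character the abstract reports; sequences with ambiguous residues would need to be cleaned first.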
“…At present, research pertaining to subcellular localization is divided mainly into experimental evidence and software predictions. The experimental methods include immunofluorescence (Stadler et al, 2013 ) and expression of green fluorescent protein fusion proteins (Cui et al, 2016 ), and software performed predictions are based mainly on bioinformatics (Bouziane & Chouarfia, 2020 ; Chou, 2019 ; Chou et al, 2019a ). The application of these experimental methods can provide more accurate information on protein subcellular localization, and the present research is based mainly on this type of method.…”
Protein-protein interaction (PPI) plays a crucial role in most biological processes, including signal transduction and cell apoptosis. Importantly, knowledge of PPIs can be useful for identifying multimeric protein complexes and elucidating uncharacterized protein functions. For Arabidopsis thaliana, the best-characterized dicotyledonous plant, the steadily increasing amount of information on its proteome and signaling pathways is progressively enabling more researchers to construct models of the plant's cellular processes, which in turn encourages the generation of more experimental data. In this study, we performed an overview analysis of the 10 major organelles and their associated proteins in the dicotyledonous model plant Arabidopsis thaliana via a PPI network, and found that PPIs may play an important role in organelle communication. Further, multilocation proteins, especially phosphorylation-related multilocation proteins, can function as a "needle and thread" via PPIs and play an important role in organelle communication. Similar results were obtained in a monocotyledonous model crop, rice. Furthermore, we provide a research strategy for multilocation proteins based on the LOPIT technique, proteomics, and bioinformatics analysis, and also describe their potential role in the field of plant science. The results provide a new view that phosphorylation-related multilocation proteins play an important role in organelle communication, and they offer new insight into PPIs and novel directions for proteomic research. Research on phosphorylation-related multilocation proteins may advance the study of organelle communication and provide an important theoretical basis for understanding plant responses to external stress.
“…The literature [19] analyzes the label dependency and partial multi-label dependency problems based on extracting sample relations from input features based on positive and negative labels respectively and obtaining label information from the output space, which provides a broad idea for the introduction of multi-label association relations. There are also many approaches to introduce sample relations and label correlations in multi-label learning.…”
With the increasing amount of textual information on the Internet, smart semantic comprehension is a practical demand. Automatic annotation of semantic roles remains a fundamental part of effective semantic comprehension. Although machine learning-based methods have received much attention in recent years, they mostly divide each sentence into separable parts for calculation. To address this challenge, this paper introduces multi-label learning to propose a novel automatic annotation method for semantic roles in English text. First, in the semantic representation of words, the method uses convolutional neural networks to extract local feature information of words at the character level. This design can alleviate the problem of inconspicuous semantic features caused by random initialization of unregistered words. Second, in the process of implication recognition, an interactive attention mechanism is combined to construct a capsule for each implication relation separately, and the final implication relation is recognized by categorical learning. Finally, experiments are conducted on real-world data to compare the proposed method with several typical related methods. The results show that the proposal achieves better Macro-F1 results on eight datasets compared with seven algorithms. In addition, the proposal performs better than the others in sensitivity testing, as its performance remains stable as noise input increases. In summary, the proposal achieves good results and shows strong capability in semantic role labeling tasks.