Arginine methylation is one of the most essential protein post-translational
modifications. Identifying the site of arginine methylation is a critical
problem in biology research. Unfortunately, biological experiments
such as mass spectrometry are expensive and time-consuming. Hence,
predicting arginine methylation by machine learning is a fast and
efficient alternative. In this paper, we focus on the systematic
characterization of arginine methylation with composition–transition–distribution
(CTD) features. The presented framework consists of three stages.
In the first stage, we extract CTD features from 1750 samples and
use a decision tree to generate accurate predictions. The prediction
accuracy can reach 96%. In the second stage, a support vector
machine predicts the number of arginine methylation sites with
an R-squared of 0.36. In the third stage, experiments carried
out with the updated arginine methylation site data set show that
utilizing CTD features and adopting random forest as the classifier
outperforms previous methods. The identification accuracy can reach
82.1% and 82.5% on the single methylarginine and double methylarginine
data sets, respectively. The findings presented in this paper can
be helpful for future research on arginine methylation.
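To make the CTD descriptor concrete, the following is a minimal sketch of how composition, transition, and distribution values could be computed for one physicochemical property. The three-class hydrophobicity grouping, the function names, and the example peptide are assumptions for illustration; the abstract does not state which properties or groupings the authors used, and the resulting vectors would then be passed to a classifier such as a decision tree or random forest.

```python
from itertools import combinations

# Assumed three-class hydrophobicity grouping commonly paired with CTD
# descriptors; the paper's actual property set is not given in the abstract.
GROUPS = {
    "1": set("RKEDQN"),   # polar
    "2": set("GASTPHY"),  # neutral
    "3": set("CLVIMFW"),  # hydrophobic
}
LOOKUP = {aa: label for label, members in GROUPS.items() for aa in members}


def encode(seq):
    """Map each residue to its group label, skipping unknown residues."""
    return "".join(LOOKUP[aa] for aa in seq.upper() if aa in LOOKUP)


def composition(enc):
    """Fraction of residues falling in each group (3 values)."""
    return [enc.count(g) / len(enc) for g in "123"]


def transition(enc):
    """Fraction of adjacent pairs that switch between two groups (3 values)."""
    pairs = list(zip(enc, enc[1:]))
    return [
        sum(1 for p in pairs if set(p) == {a, b}) / len(pairs)
        for a, b in combinations("123", 2)
    ]


def distribution(enc):
    """Relative positions of the first, 25%, 50%, 75%, and last occurrence
    of each group (15 values); zeros if a group is absent."""
    feats = []
    for g in "123":
        pos = [i + 1 for i, c in enumerate(enc) if c == g]
        if not pos:
            feats.extend([0.0] * 5)
            continue
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            k = max(1, int(frac * len(pos)))  # 1-based rank of the occurrence
            feats.append(pos[k - 1] / len(enc))
    return feats


def ctd_features(seq):
    """Concatenate C, T, and D into one 21-dimensional vector."""
    enc = encode(seq)
    if len(enc) < 2:
        raise ValueError("need at least two recognised residues")
    return composition(enc) + transition(enc) + distribution(enc)


if __name__ == "__main__":
    print(ctd_features("RKGACL"))  # hypothetical toy peptide
```

A full CTD encoding repeats this over several properties (for example polarity or charge), giving 21 values per property.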
ATP-binding cassette (ABC) proteins play important roles in a wide variety of species. These proteins are involved in absorbing nutrients, exporting toxic substances, and regulating potassium channels, and they contribute to drug resistance in cancer cells. Therefore, the identification of ABC transporters is an urgent task. The present study used 188D, which is based on sequence information and physicochemical properties, as the feature extraction method. We also visualized the extracted features using t-Distributed Stochastic Neighbor Embedding (t-SNE); the visualization indicates that samples represented by the 188D features may be separable. Further, random forest (RF) is an efficient classifier for identifying proteins. Under 10-fold cross-validation of the proposed model on the training set, the average accuracy across the 10 folds was 89.54%. We obtained values of 0.87 for specificity, 0.92 for sensitivity, and 0.79 for the Matthews correlation coefficient (MCC). On the testing set, the accuracy was 89%. These results suggest that the model combining 188D with RF is an optimal tool to identify ABC transporters.
Five-prime single-cell RNA-seq (scRNA-seq) has been widely employed to profile cellular transcriptomes; however, its power for analysing transcription start sites (TSS) has not been fully utilised. Here, we present a computational method suite, CamoTSS, to precisely identify TSS and quantify their expression by leveraging the cDNA on read 1, which enables effective detection of alternative TSS usage. With various experimental data sets, we have demonstrated that CamoTSS can accurately identify TSS and that the detected alternative TSS usage shows strong specificity in different biological processes, including cell types across human organs, the development of human thymus, and cancer conditions. As evidenced in nasopharyngeal cancer, alternative TSS usage can also reveal regulatory patterns including systematic TSS dysregulations.
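As a rough illustration of what "alternative TSS usage" means in practice, the sketch below groups 5' read-start coordinates for one gene into candidate TSS clusters and reports the share of reads in each. This is not the CamoTSS algorithm, which the abstract does not describe in detail; the gap-based clustering, the `max_gap` threshold, the function names, and the coordinates are all assumptions made for illustration.

```python
from collections import Counter


def cluster_tss(starts, max_gap=50):
    """Group 5' read-start coordinates into clusters, starting a new cluster
    whenever consecutive sorted starts are more than max_gap bases apart.
    (max_gap is a hypothetical threshold, not a CamoTSS parameter.)"""
    clusters = []
    for pos in sorted(starts):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


def tss_usage(clusters):
    """For each cluster, return (most frequent start position, read share)."""
    total = sum(len(c) for c in clusters)
    return [
        (Counter(c).most_common(1)[0][0], len(c) / total)
        for c in clusters
    ]


if __name__ == "__main__":
    # Hypothetical read-start coordinates for one gene in one cell group.
    starts = [100, 102, 102, 105, 400, 401, 401, 401]
    for peak, share in tss_usage(cluster_tss(starts)):
        print(f"candidate TSS near {peak}: {share:.0%} of reads")
```

Comparing these per-cluster shares between cell types or conditions is one simple way to see a shift in which TSS a gene prefers.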