Justin Bo Kai Hsu scite author profile

An enhanced computational platform for investigating the roles of regulatory RNA and for identifying functional RNA motifs

Background: Functional RNA molecules participate in numerous biological processes, ranging from gene regulation to protein synthesis. Analysis of functional RNA motifs and elements in RNA sequences can yield useful information for deciphering RNA regulatory mechanisms. Our previous work, RegRNA, is widely used to identify regulatory motifs; this work extends it by incorporating more comprehensive, updated data sources and analytical approaches into a new platform. Methods and results: An integrated web-based system, RegRNA 2.0, has been developed for comprehensively identifying functional RNA motifs and sites in an input RNA sequence. Numerous data sources and analytical approaches are integrated, and RegRNA 2.0 can identify several types of functional RNA motifs and sites: (i) splicing donor/acceptor sites; (ii) splicing regulatory motifs; (iii) polyadenylation sites; (iv) ribosome binding sites; (v) rho-independent terminators; (vi) motifs in the mRNA 5'-untranslated region (5'UTR) and 3'UTR; (vii) AU-rich elements; (viii) C-to-U editing sites; (ix) riboswitches; (x) RNA cis-regulatory elements; (xi) transcriptional regulatory motifs; (xii) user-defined motifs; (xiii) similar functional RNA sequences; (xiv) microRNA target sites; (xv) non-coding RNA hybridization sites; (xvi) long stems; (xvii) open reading frames; and (xviii) related information about an RNA sequence. Users can submit an RNA sequence and obtain the prediction results through the RegRNA 2.0 web page. Conclusions: RegRNA 2.0 is an easy-to-use web server for identifying regulatory RNA motifs and functional sites. Through its integrated, user-friendly interface, users can conveniently apply various analytical approaches and view the results with graphical visualization. RegRNA 2.0 is now available at http://regrna2.mbc.nctu.edu.tw.
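To illustrate one of the simpler analyses in the list above, open reading frame (ORF) detection, here is a minimal sketch under stated assumptions. It is not RegRNA 2.0's implementation: it assumes an ORF starts at AUG and ends at the first in-frame stop codon (UAA, UAG, UGA), scans only the given strand in its three frames, skips AUGs nested inside an ORF already found, and uses a hypothetical `min_codons` threshold to filter very short hits.

```python
from typing import List, Tuple

START = "AUG"
STOPS = {"UAA", "UAG", "UGA"}


def find_orfs(rna: str, min_codons: int = 2) -> List[Tuple[int, int, str]]:
    """Return (start, end, sequence) for each ORF on the given strand.

    `end` is exclusive and the returned sequence includes the stop codon.
    `min_codons` (hypothetical) counts codons before the stop codon.
    """
    rna = rna.upper().replace("T", "U")  # also accept DNA-style input
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(rna):
            if rna[i:i + 3] == START:
                # walk codon by codon from the start codon to an in-frame stop
                for j in range(i, len(rna) - 2, 3):
                    if rna[j:j + 3] in STOPS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, rna[i:j + 3]))
                        i = j  # the i += 3 below resumes after the stop codon
                        break
                else:
                    break  # no in-frame stop left: nothing more to find in this frame
            i += 3
    return sorted(orfs)
```

For example, `find_orfs("GGAUGAAAUAGCC")` reports a single three-codon ORF, `AUGAAAUAG`, starting at index 2. A production tool would typically also scan the reverse complement and report ORFs lacking a stop codon; both are omitted here for brevity.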


Machine Learning–Based Radiomics for Molecular Subtyping of Gliomas

Hsu, Hsieh, et al. 2018


The new classification announced by the World Health Organization in 2016 recognized five molecular subtypes of diffuse gliomas based on isocitrate dehydrogenase (IDH) and 1p/19q genotypes in addition to histologic phenotypes. We aimed to determine whether clinical MRI can stratify these molecular subtypes to benefit the diagnosis and monitoring of gliomas. Data from 456 subjects with gliomas were obtained from The Cancer Imaging Archive. Overall, 214 subjects with preoperative MRI, survival data, histology, IDH, and 1p/19q status were included: 106 cases of glioblastoma and 108 cases of lower-grade glioma. We proposed a three-level machine-learning model based on multimodal MR radiomics to classify glioma subtypes. An independent dataset of 70 glioma subjects was further collected to verify model performance. The IDH and 1p/19q status of gliomas could be classified by radiomics and machine-learning approaches, with areas under the ROC curve between 0.922 and 0.975 and accuracies between 87.7% and 96.1% estimated on the training dataset. Performance on the validation dataset was comparable to that on the training dataset, suggesting the efficacy of the trained classifiers. Classification of the five molecular subtypes based solely on MR phenotypes achieved 81.8% accuracy, and a higher accuracy of 89.2% could be achieved when the histologic diagnosis is available. The MR radiomics-based method provides a reliable alternative for determining the histology and molecular subtypes of gliomas.
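The abstract reports classifier performance as area under the ROC curve (AUC) and accuracy. As a rough illustration of what those two numbers measure, the sketch below computes AUC with the pairwise (Mann–Whitney) formulation and accuracy at a fixed threshold. It is a generic sketch, not the authors' evaluation code; the binary labels (for example, 1 = IDH-mutant) and the 0.5 threshold are assumptions.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of cases whose thresholded score matches the true label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)
```

With labels `[1, 1, 0, 0]` and scores `[0.9, 0.4, 0.6, 0.1]`, three of the four positive/negative pairs are ranked correctly (AUC 0.75), while only half the cases are correct at a 0.5 threshold. This is why the two metrics can diverge: AUC is threshold-free, whereas accuracy depends on where the cut-off is placed.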


Incorporating structural characteristics for identification of protein methylation sites

et al. 2009


Studies over the last few years have identified methylation on histones and other proteins involved in the regulation of gene transcription. Several works have developed approaches to computationally identify potential methylation sites on lysine and arginine. Studies of protein tertiary structure have demonstrated that methylation sites lie preferentially in easily accessible regions. However, previous studies have not taken into account the solvent-accessible surface area (ASA) surrounding the methylation sites. This work presents a method named MASA that combines a support vector machine with the sequence and structural characteristics of proteins to identify methylation sites on lysine, arginine, glutamate, and asparagine. Because most experimentally verified methylation sites lack corresponding protein tertiary structures in the Protein Data Bank, effective solvent accessibility prediction tools were adopted to estimate the ASA values of amino acids in these proteins. Evaluation of predictive performance by cross-validation indicates that the ASA values around the methylation sites can improve prediction accuracy. Additionally, an independent test shows prediction accuracies of 80.8% and 85.0% for methylated lysine and arginine, respectively. Finally, the proposed method is implemented as a system for identifying protein methylation sites. The web server is freely available at http://MASA.mbc.nctu.edu.tw/.
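MASA combines sequence information with (predicted) ASA values around a candidate residue before passing them to an SVM. The sketch below shows one plausible way to build such a feature vector: a one-hot amino-acid encoding plus one ASA value per position in a window centred on the candidate site. The window size, the zero padding beyond the protein termini, and the function name `site_features` are all hypothetical; the abstract does not specify MASA's exact encoding, and the ASA values are assumed to come from an external predictor.

```python
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues


def site_features(seq: str, asa: Sequence[float], center: int, half_window: int = 3) -> List[float]:
    """Encode a window around `center` as 21 numbers per position (20 one-hot + 1 ASA).

    Positions falling outside the protein are zero-padded. `asa` holds one
    (assumed, externally predicted) relative ASA value per residue in `seq`.
    """
    if len(seq) != len(asa):
        raise ValueError("seq and asa must have the same length")
    features: List[float] = []
    for pos in range(center - half_window, center + half_window + 1):
        in_range = 0 <= pos < len(seq)
        residue = seq[pos] if in_range else None
        # one-hot block: all zeros for padding or non-standard residues
        features.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
        # structural block: the residue's ASA value, or 0.0 for padding
        features.append(float(asa[pos]) if in_range else 0.0)
    return features
```

Each candidate site (for example, every lysine in a protein) yields a fixed-length vector of `(2 * half_window + 1) * 21` numbers, which could then be passed to any SVM implementation. Dropping the ASA entries gives a sequence-only baseline, which is the kind of comparison the cross-validation result above describes.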


Identification of potential biomarkers related to glioma survival by gene expression profile analysis

et al. 2019


Background: Recent studies have proposed several gene signatures as biomarkers for different grades of glioma from various perspectives. However, most of these genes are appropriate only for patients with specific grades of glioma. Methods: In this study, we aimed to identify survival-relevant genes shared between glioblastoma multiforme (GBM) and lower-grade glioma (LGG) that could serve as potential biomarkers for classifying patients into different risk groups. A Cox proportional hazards regression model (Cox model) was used to extract relevant genes, and the effectiveness of these genes was evaluated with random forest regression. Finally, risk models were constructed with logistic regression. Results: Using next-generation sequencing data from The Cancer Genome Atlas for gene expression analysis, we identified 104 key genes shared between GBM and LGG that were significantly correlated with patients' survival. The effectiveness of these genes in survival prediction for GBM and LGG was evaluated, and the average area under the receiver operating characteristic (ROC) curve ranged from 0.7 to 0.8. Gene set enrichment analysis revealed that these genes were involved in eight significant pathways and 23 molecular functions. Moreover, ten of these genes (CTSZ, EFEMP2, ITGA5, KDELR2, MDK, MICALL2, MAP2K3, PLAUR, SERPINE1, and SOCS3) were expressed at significantly higher levels in GBM than in LGG, and comparing their expression levels with those of the proposed control genes (TBP, IPO8, and SDHA) could potentially classify patients into high- and low-risk groups that differ significantly in overall survival. The candidate gene signatures were validated with multiple microarray datasets from the Gene Expression Omnibus to increase the robustness of these potential prognostic factors.
In both the GBM and LGG cohorts, most patients in the high-risk group had wild-type IDH1, and most of those in the low-risk group had IDH1 mutations. Moreover, most high-risk patients with LGG had tumors without 1p/19q codeletion. Conclusion: In this study, we identified survival-relevant genes shared between GBM and LGG that enabled patients to be classified into high- and low-risk groups based on expression-level analysis. Both risk groups correlated with well-known genetic variants, suggesting their potential prognostic value in clinical application. Electronic supplementary material: The online version of this article (10.1186/s1...
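The abstract describes classifying patients by comparing the expression of ten genes with that of the control genes TBP, IPO8, and SDHA. The sketch below is a hypothetical illustration of that idea, not the authors' risk model (which used logistic regression). It assumes log2-scale expression values, scores each patient as the mean expression of the ten genes minus the mean of the control genes (a delta-style normalization), and splits the cohort at the median score.

```python
from statistics import mean, median
from typing import Dict

# Gene symbols taken from the abstract above
RISK_GENES = ["CTSZ", "EFEMP2", "ITGA5", "KDELR2", "MDK",
              "MICALL2", "MAP2K3", "PLAUR", "SERPINE1", "SOCS3"]
CONTROL_GENES = ["TBP", "IPO8", "SDHA"]


def relative_score(expr: Dict[str, float]) -> float:
    """Mean log2 expression of the ten genes minus the mean of the control genes (hypothetical score)."""
    return mean(expr[g] for g in RISK_GENES) - mean(expr[g] for g in CONTROL_GENES)


def assign_risk_groups(cohort: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Label each patient 'high' or 'low' risk by splitting at the cohort median score (assumed cut-off)."""
    scores = {pid: relative_score(expr) for pid, expr in cohort.items()}
    cutoff = median(scores.values())
    return {pid: "high" if s > cutoff else "low" for pid, s in scores.items()}
```

Normalizing against stably expressed control genes makes the score less sensitive to sample-wide shifts in measured expression. The median split is only a convenient placeholder; in practice the cut-off would be fitted or chosen against survival outcomes, for instance with the logistic regression risk models mentioned in the Methods.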


Incorporating support vector machine for identifying protein tyrosine sulfation sites

et al. 2009


Tyrosine sulfation is a post-translational modification of many secreted and membrane-bound proteins. It governs protein-protein interactions involved in leukocyte adhesion, hemostasis, and chemokine signaling. However, the intrinsic features of sulfated proteins remain elusive. This investigation presents SulfoSite, a computational method based on a support vector machine (SVM) for predicting protein sulfotyrosine sites. The approach incorporates structural information, such as the secondary structure and solvent accessibility of the amino acids surrounding the sulfotyrosine sites. One hundred sixty-two experimentally verified tyrosine sulfation sites were obtained from UniProtKB/Swiss-Prot release 53.0. The results of five-fold cross-validation suggest that solvent accessibility around the sulfotyrosine sites contributes substantially to predictive accuracy. The SVM classifier achieves an accuracy of 94.2% in five-fold cross-validation when the sequence positional weighted matrix (PWM) is combined with accessible surface area (ASA) values. The proposed method significantly outperforms previous methods in predicting the locations of tyrosine sulfation sites.
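SulfoSite pairs a positional weighted matrix (PWM) with ASA values. The abstract does not give the exact PWM formulation, so the sketch below uses a common log-odds position weight matrix as a stand-in. It is built from aligned, equal-length windows around known sites, with a hypothetical pseudocount of 1 and a uniform background over the 20 standard amino acids, and it scores new windows by summing per-position log-odds.

```python
import math
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(windows: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Log2-odds matrix (one dict per position) from aligned windows around known sites."""
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background frequency
    pwm = []
    for i in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}  # pseudocounts avoid log(0)
        for w in windows:
            if w[i] in counts:  # ignore non-standard residues and padding symbols
                counts[w[i]] += 1
        total = sum(counts.values())
        pwm.append({aa: math.log2((c / total) / background) for aa, c in counts.items()})
    return pwm


def score_window(pwm: List[Dict[str, float]], window: str) -> float:
    """Sum of per-position log-odds; unknown residues contribute 0."""
    return sum(col.get(aa, 0.0) for col, aa in zip(pwm, window))
```

Windows resembling the training sites score above zero and dissimilar ones below it. In a SulfoSite-like setup, per-position PWM values for a candidate window could be concatenated with the ASA values of the same positions to form the SVM input, with five-fold cross-validation used to compare feature sets.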


