Jiabao Sheng scite author profile

Modern medicine is reliant on various medical imaging technologies for non-invasively observing patients’ anatomy. However, the interpretation of medical images can be highly subjective and dependent on the expertise of clinicians. Moreover, some potentially useful quantitative information in medical images, especially that which is not visible to the naked eye, is often ignored during clinical practice. In contrast, radiomics performs high-throughput feature extraction from medical images, which enables quantitative analysis of medical images and prediction of various clinical endpoints. Studies have reported that radiomics exhibits promising performance in diagnosis and predicting treatment responses and prognosis, demonstrating its potential to be a non-invasive auxiliary tool for personalized medicine. However, radiomics remains in a developmental phase as numerous technical challenges have yet to be solved, especially in feature engineering and statistical modeling. In this review, we introduce the current utility of radiomics by summarizing research on its application in the diagnosis, prognosis, and prediction of treatment responses in patients with cancer. We focus on machine learning approaches, for feature extraction and selection during feature engineering and for imbalanced datasets and multi-modality fusion during statistical modeling. Furthermore, we introduce the stability, reproducibility, and interpretability of features, and the generalizability and interpretability of models. Finally, we offer possible solutions to current challenges in radiomics research.

AgglutiFiT: Efficient Low-Resource Agglutinative Language Model Fine-Tuning

Sheng et al. 2020, IEEE Access

Text classification tends to be difficult when manually labeled text corpora are scarce. In low-resource agglutinative languages such as Uyghur, Kazakh, and Kyrgyz (UKK languages), words are formed by concatenating a stem with several suffixes, and stems are used to represent text content. This morphology permits an effectively unbounded vocabulary of derived forms, which leads to high uncertainty in written forms and many redundant features. The major challenges of low-resource agglutinative text classification are the lack of labeled data in the target domain and the morphological diversity of derivations in the language structure. Fine-tuning a pre-trained language model is an effective solution, as it provides meaningful and easy-to-use feature extractors for downstream text classification tasks. To this end, we propose AgglutiFiT, a fine-tuning approach for low-resource agglutinative language models. Specifically, we build a low-noise fine-tuning dataset through morphological analysis and stem extraction, then fine-tune a cross-lingual pre-trained model on this dataset. Moreover, we propose an attention-based fine-tuning strategy that better selects relevant semantic and syntactic information from the pre-trained language model and uses those features in downstream text classification tasks. We evaluate our methods on nine Uyghur, Kazakh, and Kyrgyz classification datasets, where they perform significantly better than several strong baselines.

Knowledge-Guided Sentiment Analysis Via Learning From Natural Language Explanations

Sheng et al. 2021, IEEE Access

Rumor Detection on Social Media via Fused Semantic Information and a Propagation Heterogeneous Graph

Zhou et al. 2020, Symmetry

Social media has had a revolutionary impact because it provides an ideal platform for sharing information; however, it also enables the publication and spread of rumors. Existing rumor detection methods rely on cues from only user-generated content, user profiles, or the structures of wide propagation. Previous work has not combined wide-dispersion propagation structures with text semantics in rumor detection. To this end, we propose KZWANG, a framework for rumor detection that provides sufficient domain knowledge to classify rumors accurately and in which semantic information and a propagation heterogeneous graph are symmetrically fused. We use an attention mechanism to learn a semantic representation of text and introduce a GCN to capture the global and local relationships among all the source microblogs, reposts, and users. The combination of text semantics and the propagation heterogeneous graph is then used to train a rumor detection classifier. Experiments on the Sina Weibo, Twitter15, and Twitter16 rumor detection datasets demonstrate the proposed model’s superiority over baseline methods. We also conduct an ablation study to understand the relative contributions of the various components of the proposed method.

Low-Resource Text Classification via Cross-Lingual Language Model Fine-Tuning

Sheng et al. 2020

Text classification tends to be difficult when manually labeled text corpora are scarce. In low-resource agglutinative languages such as Uyghur, Kazakh, and Kyrgyz (UKK languages), words are formed by concatenating a stem with several suffixes, and stems are used to represent text content. This morphology permits an effectively unbounded vocabulary of derived forms, which leads to high uncertainty in written forms and many redundant features. The major challenges of low-resource agglutinative text classification are the lack of labeled data in the target domain and the morphological diversity of derivations in the language structure. Fine-tuning a pre-trained language model is an effective solution, as it provides meaningful and easy-to-use feature extractors for downstream text classification tasks. To this end, we propose AgglutiFiT, a fine-tuning approach for low-resource agglutinative language models. Specifically, we build a low-noise fine-tuning dataset through morphological analysis and stem extraction, then fine-tune a cross-lingual pre-trained model on this dataset. Moreover, we propose an attention-based fine-tuning strategy that better selects relevant semantic and syntactic information from the pre-trained language model and uses those features in downstream text classification tasks. We evaluate our methods on nine Uyghur, Kazakh, and Kyrgyz classification datasets, where they perform significantly better than several strong baselines.

Jiabao Sheng

Artificial intelligence-driven radiomics study in cancer: the role of feature engineering and modeling
