As user-generated content thrives, so does the spread of toxic comments. Detecting toxic comments has therefore become an active research area, and it is often framed as a text classification task. Pre-trained language model-based methods are at the forefront of natural language processing, achieving state-of-the-art performance on a wide range of NLP tasks; however, there is a paucity of studies applying such methods to toxic comment classification. In this work, we study how to best make use of pre-trained language model-based methods for toxic comment classification and compare the performance of different pre-trained language models on these tasks. Our results show that, of the three most popular language models, i.e. BERT, RoBERTa, and XLM, BERT and RoBERTa generally outperform XLM on toxic comment classification. We also show that a basic linear downstream structure outperforms more complex ones such as CNN and BiLSTM. Furthermore, we find that further fine-tuning a pre-trained language model with light hyper-parameter settings improves the downstream toxic comment classification task, especially when the task has a relatively small dataset.
CCS CONCEPTS: • Computing methodologies → Neural networks; • Social and professional topics → User characteristics.
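The "basic linear downstream structure" the abstract refers to can be sketched as a single affine map from the encoder's pooled embedding to one logit per toxicity label, trained with binary cross-entropy. The sketch below is a minimal illustration under assumed dimensions (768-dimensional pooled output, six toxicity labels) and uses a random stand-in for the pretrained encoder's output; it is not the paper's actual implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LinearToxicityHead:
    """Minimal linear classification head: one affine map from the
    encoder's pooled embedding to a logit per toxicity label."""

    def __init__(self, hidden_dim=768, num_labels=6, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.02, size=(hidden_dim, num_labels))
        self.b = np.zeros(num_labels)

    def predict_proba(self, pooled):
        # pooled: (batch, hidden_dim) embeddings from a pretrained encoder
        return sigmoid(pooled @ self.W + self.b)

def bce_loss(probs, targets, eps=1e-9):
    # Multi-label binary cross-entropy, averaged over batch and labels.
    return -np.mean(targets * np.log(probs + eps)
                    + (1.0 - targets) * np.log(1.0 - probs + eps))

# Random stand-in for BERT/RoBERTa pooled [CLS] outputs (illustration only).
pooled = np.random.default_rng(1).normal(size=(4, 768))
head = LinearToxicityHead()
probs = head.predict_proba(pooled)       # (4, 6) per-label probabilities
targets = np.zeros((4, 6))               # dummy multi-label targets
loss = bce_loss(probs, targets)
```

In a real setup, `pooled` would come from the fine-tuned encoder's pooled output, and the head's weights would be learned jointly with (or on top of) the encoder by minimizing `bce_loss` with gradient descent.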
The spread of toxic content online has attracted a wealth of research into methods of automatic detection and classification in recent years. However, two limitations remain: 1) the lack of support for multi-label classification; and 2) the lack of understanding of how the typically unbalanced datasets affect such tasks. In this work, we build three state-of-the-art methods for multi-label classification of toxic content online and compare the effect of training data size on their performance. The three methods are based on Support Vector Machines (SVM), Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM) networks, respectively. We conduct a learning curve analysis and show that CNN is the most robust method, as it outperforms the other two regardless of dataset size, even on very small amounts of data. This challenges the conventional belief that neural networks require significant amounts of data to train accurate models. We also empirically derive indicative training data size thresholds to help determine a reliable estimate of classifier performance, or maximise potential classifier performance, in such tasks.
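The learning curve analysis described above amounts to retraining a classifier on progressively larger subsets of the training data and tracking held-out performance at each size. A minimal sketch with scikit-learn follows; the synthetic feature matrix, subset sizes, and the linear SVM are illustrative assumptions standing in for the paper's text datasets and tuned models.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

# Synthetic stand-in for a featurized toxic-comment dataset.
X, y = make_classification(n_samples=2000, n_features=50,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Learning-curve analysis: train on growing subsets, record test F1.
sizes = [100, 250, 500, 1000, len(X_train)]
curve = []
for n in sizes:
    clf = LinearSVC(random_state=0).fit(X_train[:n], y_train[:n])
    curve.append((n, f1_score(y_test, clf.predict(X_test))))
```

Plotting `curve` (training size vs. F1) gives the learning curve; the "indicative thresholds" in the abstract correspond to the sizes at which the curve begins to plateau.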