Perplexed by Quality: A Perplexity-based Method for Adult and Harmful Content Detection in Multilingual Heterogeneous Web Data

Jansen, Tim; Tong, Yangling; Zevallos, Victoria; Suarez, Pedro Ortiz

doi:10.48550/arxiv.2212.10440

Cited by 3 publications

(3 citation statements)

References 0 publications


“…Guided by the aforementioned capabilities, we propose a pragmatic third-party detection method called LLMDet. Our approach is inspired by the observation that perplexity serves as a reliable signal for distinguishing the source of generated text, a finding that has been validated in previous work (Solaiman et al., 2019; Jansen et al., 2022; Mitchell et al., 2023). However, directly calculating perplexity requires access to LLMs, which compromises both safety and efficiency.…”

Section: Introduction (mentioning)

confidence: 94%

LLMDet: A Third Party Large Language Models Generated Text Detection Tool

Wu, Pang, Shen et al. 2023

Findings of the Association for Computational Linguistics: EMNLP 2023


Generated texts from large language models (LLMs) are remarkably close to high-quality human-authored text, raising concerns about their potential misuse in spreading false information and academic misconduct. Consequently, there is an urgent need for a highly practical detection tool capable of accurately identifying the source of a given text. However, existing detection tools typically rely on access to LLMs and can only differentiate between machine-generated and human-authored text, failing to meet the requirements of fine-grained tracing, intermediary judgment, and rapid detection. Therefore, we propose LLMDet, a model-specific, secure, efficient, and extendable detection tool that can source text from specific LLMs, such as GPT-2, OPT, LLaMA, and others. In LLMDet, we record the next-token probabilities of salient n-grams as features to calculate a proxy perplexity for each LLM. By jointly analyzing the proxy perplexities of LLMs, we can determine the source of the generated text. Experimental results show that LLMDet yields impressive detection performance while ensuring speed and security, achieving 98.54% precision and running about 5.0× faster when recognizing human-authored text. Additionally, LLMDet can effortlessly extend its detection capabilities to a new open-source model. We will provide an open-source tool at https://github.com/TrustedLLM/LLMDet.
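The proxy-perplexity idea in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the LLMDet implementation: the function names, the bigram default, the probability floor for unseen n-grams, and the lowest-perplexity decision rule are all assumptions made here for clarity. The abstract only says the proxy perplexities are analyzed jointly, so picking the minimum is a simplification.

```python
import math

def proxy_perplexity(tokens, ngram_probs, n=2, floor=1e-6):
    """Approximate perplexity from a stored table of next-token probabilities.

    ngram_probs maps a context tuple of n-1 tokens to {next_token: probability}.
    Unseen contexts or tokens fall back to `floor` (an assumption of this sketch).
    """
    if len(tokens) < n:
        raise ValueError("need at least n tokens")
    log_sum = 0.0
    count = 0
    for i in range(n - 1, len(tokens)):
        context = tuple(tokens[i - n + 1:i])
        p = ngram_probs.get(context, {}).get(tokens[i], floor)
        log_sum += math.log(p)
        count += 1
    # Perplexity is exp of the negative mean log-probability per scored token.
    return math.exp(-log_sum / count)

def attribute_source(tokens, tables, n=2):
    """Score the text against each model's table and return the lowest-perplexity name.

    `tables` maps a hypothetical model name to its n-gram probability table.
    """
    scores = {name: proxy_perplexity(tokens, table, n) for name, table in tables.items()}
    return min(scores, key=scores.get), scores
```

Because the tables are precomputed, scoring a text needs only dictionary lookups and no call to the underlying model, which is the property the abstract ties to speed and security.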


“…We use perplexity (Jansen et al., 2022) as a proxy to measure the linguistic quality of the generated CN. We use the XLMR model to calculate the perplexity of generated CNs.…”

Section: Metrics (mentioning)

confidence: 99%

IndicCONAN: A Multilingual Dataset for Combating Hate Speech in Indian Context

Sahoo, Beria, Bhattacharyya 2024

AAAI


Hate speech (HS) is a growing concern in many parts of the world, including India, where it has led to numerous instances of violence and discrimination. The development of effective counter-narratives (CNs) is a critical step in combating hate speech, but there is a lack of research in this area, especially in non-English languages. In this paper, we introduce a new dataset, IndicCONAN, of counter-narratives against hate speech in Hindi and Indian English. We propose a scalable human-in-the-loop approach for generating counter-narratives by an auto-regressive language model through a machine generation and human correction cycle, where the model uses augmented data from previous cycles to generate new training samples. These newly generated samples are then reviewed and edited by annotators, leading to further model refinement. The dataset consists of over 2,500 examples of counter-narratives each in both English and Hindi corresponding to various hate speeches in the Indian context. We also present a framework for generating CNs conditioned on a specific CN type with a mean perplexity of 3.85 for English and 3.70 for Hindi, a mean toxicity score of 0.04 for English and 0.06 for Hindi, and a mean diversity of 0.08 for English and 0.14 for Hindi. Our dataset and framework provide valuable resources for researchers and practitioners working to combat hate speech in the Indian context.


“…The aforementioned properties allow for perplexity to be used for automatically distinguishing between high- and low-quality data [20], with one of the motives being the selection of data used to train new language models [21]. Perplexity can also be used for text classification based on language [22], the detection of harmful content [23], and fact checking [24].…”
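As a rough illustration of how perplexity can separate documents, the sketch below computes perplexity from per-token natural-log probabilities and flags documents whose score exceeds a cutoff. The function names, the input format, and the threshold are hypothetical choices made for this example. None of the cited works specifies them here, and whether high or low perplexity marks the unwanted class depends on what data the scoring model was trained on.

```python
import math

def perplexity(token_logprobs):
    """Return exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("empty sequence")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def flag_by_perplexity(docs, threshold):
    """Return ids of documents whose perplexity exceeds `threshold`.

    `docs` maps a document id to the list of log-probabilities that some
    language model assigned to its tokens (how these are obtained is outside
    the scope of this sketch).
    """
    return [doc_id for doc_id, logprobs in docs.items()
            if perplexity(logprobs) > threshold]
```

In practice the cutoff would be tuned on labeled data rather than fixed by hand.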

Section: Definition (mentioning)

confidence: 99%

Transformer-Based Composite Language Models for Text Evaluation and Classification

Škorić, Utvić, Stanković 2023

Mathematics


Parallel natural language processing systems were previously successfully tested on the tasks of part-of-speech tagging and authorship attribution through mini-language modeling, for which they achieved significantly better results than independent methods in the cases of seven European languages. The aim of this paper is to present the advantages of using composite language models in the processing and evaluation of texts written in arbitrary highly inflective and morphology-rich natural language, particularly Serbian. A perplexity-based dataset, the main asset for the methodology assessment, was created using a series of generative pre-trained transformers trained on different representations of the Serbian language corpus and a set of sentences classified into three groups (expert translations, corrupted translations, and machine translations). The paper describes a comparative analysis of calculated perplexities in order to measure the classification capability of different models on two binary classification tasks. In the course of the experiment, we tested three standalone language models (baseline) and two composite language models (which are based on perplexities outputted by all three standalone models). The presented results single out a complex stacked classifier using a multitude of features extracted from perplexity vectors as the optimal architecture of composite language models for both tasks.
