2022
DOI: 10.48550/arxiv.2202.04173
Preprint
Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models

Abstract: Pre-trained language models (LMs) are shown to easily generate toxic language. In this work, we systematically explore domain-adaptive training to reduce the toxicity of language models. We conduct this study on three dimensions: training corpus, model size, and parameter efficiency. For the training corpus, we propose to leverage the generative power of LMs and generate nontoxic datasets for domain-adaptive training, which mitigates the exposure bias and is shown to be more data-efficient than using a curated…
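As a rough illustration of the pipeline the abstract describes (self-generating nontoxic data, then domain-adaptive fine-tuning), here is a minimal sketch assuming Hugging Face transformers. The model names ("gpt2", "unitary/toxic-bert"), the prompt, and the filtering threshold are illustrative assumptions, not the paper's actual setup.

```python
# Sketch of the approach in the abstract: (1) sample text from the LM itself,
# (2) keep only generations a toxicity classifier deems nontoxic, (3) continue
# training the LM on that self-generated nontoxic corpus.
# "gpt2" stands in for the large LM and "unitary/toxic-bert" for the toxicity
# filter; both are assumptions for the sake of a runnable example.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")
toxicity = pipeline("text-classification", model="unitary/toxic-bert")

# (1) Self-generation: let the LM produce candidate training text.
inputs = tok("The weather today", return_tensors="pt")
outputs = lm.generate(**inputs, max_new_tokens=50, do_sample=True,
                      num_return_sequences=8, pad_token_id=tok.eos_token_id)
candidates = tok.batch_decode(outputs, skip_special_tokens=True)

# (2) Filtering: drop candidates the classifier confidently flags as toxic.
def is_nontoxic(text, threshold=0.5):
    result = toxicity(text, truncation=True)[0]
    return result["label"] != "toxic" or result["score"] < threshold

nontoxic = [t for t in candidates if is_nontoxic(t)]

# (3) Domain-adaptive training: fine-tune the LM on `nontoxic` with any
# standard causal-LM training loop (Trainer wiring elided for brevity).
```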

Cited by 1 publication (1 citation statement)
References 20 publications
“…Perplexity: a measure of how well a language model predicts the next word in a sequence [82][83][84][85][86][87][88]. A lower perplexity value indicates that the language model is better at predicting the subsequent word (9).…”
Section: Evaluation Criteria in Language Modeling
confidence: 99%
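For reference, the standard definition of perplexity that the statement above refers to, written for a token sequence $w_1, \dots, w_N$ under a language model $p$ (supplied here for context; this formulation is not quoted from the citing paper):

$$\mathrm{PPL}(w_1, \dots, w_N) = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N} \log p(w_i \mid w_{<i})\right)$$

A model that assigns higher probability to the observed next tokens yields a smaller negative log-likelihood and hence a lower perplexity, which is why lower perplexity indicates better next-word prediction.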