2020
DOI: 10.48550/arxiv.2005.10200
Preprint

BERTweet: A pre-trained language model for English Tweets

Abstract: We present BERTweet, the first public large-scale pre-trained language model for English Tweets. Our BERTweet is trained using the RoBERTa pre-training procedure (Liu et al., 2019), with the same model configuration as BERT base (Devlin et al., 2019). Experiments show that BERTweet outperforms strong baselines RoBERTa base and XLM-R base (Conneau et al., 2020), producing better performance results than the previous state-of-the-art models on three Tweet NLP tasks: Part-of-speech tagging, Named-entity recognition and text classification.
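
The abstract states that BERTweet is publicly released. As a hedged illustration (not code from the paper), the sketch below shows how such a checkpoint could be loaded through the Hugging Face Transformers library; the hub id "vinai/bertweet-base" and the use of the first-token hidden state as a sentence representation are assumptions of this sketch.

```python
# Hedged sketch: loading a public BERTweet checkpoint with Hugging Face Transformers.
# Assumes `transformers` and `torch` are installed and the hub id "vinai/bertweet-base" is available.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")
model = AutoModel.from_pretrained("vinai/bertweet-base")

# Encode one tweet and take the first-token (<s>) hidden state as a sentence representation.
tweet = "BERTweet is a pre-trained language model for English Tweets :smiley:"
inputs = tokenizer(tweet, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

sentence_repr = outputs.last_hidden_state[:, 0, :]   # shape (1, 768) for a BERT-base-sized model
print(sentence_repr.shape)
```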

Cited by 40 publications (47 citation statements)
References 7 publications

“…These experiments are purely academic, and TwHIN is not currently being applied to detecting offensive content at Twitter. For our experimental purposes, we construct a baseline approach that fine-tunes a large-scale language model for offensive content detection using linear probing and binary categorical loss; we compare the performance of RoBERTa [24] and BERTweet [28] language model, the latter of which has been pretrained on Twitter-domain data. We evaluate on two collections of tweets where some tweets have been labeled "offensive" or violating guidelines.…”
Section: Recommendation and Prediction (mentioning)
confidence: 99%
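
The statement above describes a baseline that fine-tunes a language model for offensive-content detection "using linear probing and binary categorical loss", comparing RoBERTa and BERTweet. The sketch below illustrates that general recipe only (it is not the cited paper's code): the encoder is frozen and a single linear head is trained with binary cross-entropy. The hub id, learning rate, and toy batch are assumptions.

```python
# Hedged sketch (not the cited paper's code): a linear-probing baseline for binary
# offensive-content detection on top of a frozen BERTweet encoder.
# Assumptions: `transformers`/`torch` installed, hub id "vinai/bertweet-base", toy labels.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")
encoder = AutoModel.from_pretrained("vinai/bertweet-base")

# Linear probing: keep the pre-trained encoder frozen and train only a linear head.
for param in encoder.parameters():
    param.requires_grad = False

head = nn.Linear(encoder.config.hidden_size, 1)      # one logit for the binary decision
criterion = nn.BCEWithLogitsLoss()                   # binary loss over offensive / not offensive
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

def training_step(tweets, labels):
    """Run one optimization step on a batch of tweets with 0/1 'offensive' labels."""
    batch = tokenizer(tweets, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():                            # encoder is frozen, no gradients needed
        features = encoder(**batch).last_hidden_state[:, 0, :]
    logits = head(features).squeeze(-1)
    loss = criterion(logits, labels.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with hypothetical labels (0 = not offensive, 1 = offensive).
print(training_step(["have a nice day", "you are the worst"], torch.tensor([0, 1])))
```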
“…Several works followed BERT, proposing variations using more targeted data. One example is BERTweet [25], in which the authors propose an extension to deal with tweets (short messages from Twitter).…”
Section: Unsupervised Text Analysis (mentioning)
confidence: 99%
“…For person ReID, the backbones are well-known Deep Convolutional Neural Network (DCNN) architectures: ResNet50 [36], OSNet [37], and DenseNet121 [38], all of them previously trained over the ImageNet dataset [5]. For authorship verification, we consider BERT [24], BERTweet [25], and T5 [26] architectures.…”
Section: B. Implementation Details (mentioning)
confidence: 99%
“…Besides, some other language-specific BERTs models developed over time for monolingual outperformed multilingual model mBERT: AraBERT (Arabic) [18], AlBERTo (Italian) [115], FinBERT (Finnish) [19], CamemBERT(French) [83], Flaubert (French [76]), BERT-CRF (Portuguese) [137], BERTje (Dutch) [141], RuBERT (Russian) [74] and BERTtweet (A pre-trained language model for English Tweets) [97]. However, to best of our knowledge, not every model has yet been tested for HS domain except AraBERT [12] [38] and AlBERTo [116] which shown better performance for HS detection.…”
Section: Overview Of Deep-learning Records (mentioning)
confidence: 99%