2020
DOI: 10.15439/2020f20
Overview of the Transformer-based Models for NLP Tasks

Abstract: In 2017, Vaswani et al. proposed a new neural network architecture named the Transformer. That architecture quickly revolutionized the natural language processing world. Models like GPT and BERT, which rely on this Transformer architecture, have fully outperformed the previous state-of-the-art networks. It surpassed the earlier approaches by such a wide margin that all the recent cutting-edge models seem to rely on these Transformer-based architectures. In this paper, we provide an overview and explanations of th…

Cited by 174 publications (84 citation statements)
References 24 publications
“…During masked language modelling, input tokens are randomly masked and subsequently predicted in order to obtain a "deep bidirectional representation" [15]. This allows BERT to counter the "unidirectional constraint" [19] of other language models such as GPT [46] by not allowing the model to "see itself" and thus "trivially predict the next token" when learning both right to left and left to right [19]. The next stage of pretraining takes the form of binarised next sentence prediction where sentence A precedes sentence B 50% of the time, allowing the model to learn the "relationship between two sentences" [19].…”
Section: B. BERT (mentioning)
confidence: 99%
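The masking scheme described in the statement above can be illustrated with a short, self-contained sketch. The 15% selection rate and the 80/10/10 corruption split follow the BERT paper; the toy vocabulary and the helper function below are purely illustrative assumptions, not part of the cited work.

```python
# Minimal sketch of BERT-style masked language modelling input corruption.
# Real implementations operate on WordPiece token ids, not word strings.
import random

VOCAB = ["the", "cat", "sat", "on", "mat", "[MASK]"]  # toy vocabulary (assumption)

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly select ~15% of tokens as prediction targets.
    Of the selected tokens: 80% become [MASK], 10% become a random token,
    10% are left unchanged (the 80/10/10 scheme from Devlin et al.)."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)                    # model must predict the original
            r = rng.random()
            if r < 0.8:
                corrupted.append("[MASK]")
            elif r < 0.9:
                corrupted.append(rng.choice(VOCAB[:-1]))
            else:
                corrupted.append(tok)
        else:
            corrupted.append(tok)
            labels.append(None)                   # not a prediction target
    return corrupted, labels

print(mask_tokens("the cat sat on the mat".split()))
```

Because the model must recover the original token from both left and right context, it learns the "deep bidirectional representation" the statement refers to, without being able to "see itself" and trivially copy the answer.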
“…This allows BERT to counter the "unidirectional constraint" [19] of other language models such as GPT [46] by not allowing the model to "see itself" and thus "trivially predict the next token" when learning both right to left and left to right [19]. The next stage of pretraining takes the form of binarised next sentence prediction where sentence A precedes sentence B 50% of the time, allowing the model to learn the "relationship between two sentences" [19]. BERT models are then fine-tuned by adding a classification layer and updating all parameters based on a downstream task, in this case, fake news classification.…”
Section: B. BERT (mentioning)
confidence: 99%
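As a rough illustration of the fine-tuning step mentioned above (adding a classification layer and updating all parameters on a downstream task), the following hedged sketch uses the Hugging Face transformers API. The model name, toy texts, labels, and hyperparameters are assumptions for illustration, not the cited authors' setup.

```python
# Hedged sketch: fine-tuning BERT for binary (real/fake) news classification
# by adding a classification head on top of the pretrained encoder.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)      # adds a linear classification layer

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

texts = ["Breaking: central bank raises rates", "Celebrity spotted on the moon"]
labels = torch.tensor([0, 1])               # 0 = real, 1 = fake (toy labels)

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch, labels=labels)     # all parameters receive gradients
outputs.loss.backward()
optimizer.step()
```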
“…[26] Specifically, in their work Devlin et al. reported that the BERT transformer model significantly outperformed a bidirectional LSTM (state-of-the-art at that time) on the General Language Understanding Evaluation (GLUE) [61] benchmark, with average GLUE scores of 71 and 82. Transformer models leverage large text corpora, akin to BookCorpus [62] or the English Wikipedia data set, and high expressive capacity to define the new state-of-the-art performance on a plethora of NLP tasks. These tasks include text classification, named entity recognition (NER), semantic text similarity (STS), text summarization, question answering (QA), reading comprehension, and knowledge discovery (KD) and mapping, among others (reviewed in [63]). A further boost in performance in the novel transformer architectures is achieved through the multi-headed attention mechanism.…”
Section: Host-Pathogen Interactions Analysis From the Language Data in the Scientific Publications (mentioning)
confidence: 99%
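The multi-headed attention mechanism credited above with the performance boost can be sketched in a few lines of PyTorch. This is a minimal, assumption-laden illustration (default dimensions, no masking or dropout), not the exact formulation used by any of the cited models.

```python
# Minimal sketch of multi-headed scaled dot-product attention.
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model=512, num_heads=8):
        super().__init__()
        assert d_model % num_heads == 0
        self.d_head = d_model // num_heads
        self.num_heads = num_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        batch, seq, d_model = x.shape
        # Project and split into heads: (batch, heads, seq, d_head)
        def split(t):
            return t.view(batch, seq, self.num_heads, self.d_head).transpose(1, 2)
        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))
        # Scaled dot-product attention, computed independently per head
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        weights = scores.softmax(dim=-1)
        context = weights @ v
        # Concatenate the heads and project back to d_model
        context = context.transpose(1, 2).reshape(batch, seq, d_model)
        return self.out_proj(context)

attn = MultiHeadAttention()
print(attn(torch.randn(2, 10, 512)).shape)   # torch.Size([2, 10, 512])
```

Each head attends to the sequence in its own learned subspace, which is what lets the model capture several kinds of token-to-token relationships in parallel.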
“…Multitask learning [10] aims to learn individual sub-tasks separately and use those learnings inductively to solve a main task by identifying the dependence between the tasks. Separate multitask models are built to predict the rate of change of stock prices and to predict the actual stock price itself.…”
Section: I.D. Multitask Learning (mentioning)
confidence: 99%
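For intuition, the idea of learning related sub-tasks jointly can be illustrated with a generic hard-parameter-sharing sketch: a shared encoder with one head for the rate of change and one for the price. The cited work builds separate multitask models, so this single-network layout, along with all layer sizes and names, is an assumption for illustration only.

```python
# Generic hard-parameter-sharing multitask sketch (illustrative architecture).
import torch
import torch.nn as nn

class MultiTaskStockModel(nn.Module):
    def __init__(self, n_features=16, hidden=64):
        super().__init__()
        self.shared = nn.Sequential(            # layers shared across both tasks
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.rate_head = nn.Linear(hidden, 1)   # task 1: rate of change of price
        self.price_head = nn.Linear(hidden, 1)  # task 2: actual stock price

    def forward(self, x):
        h = self.shared(x)
        return self.rate_head(h), self.price_head(h)

model = MultiTaskStockModel()
x = torch.randn(8, 16)                          # toy batch of feature vectors
rate_pred, price_pred = model(x)
# A joint loss couples the tasks so the shared layers learn from both signals.
loss = nn.functional.mse_loss(rate_pred, torch.zeros(8, 1)) \
     + nn.functional.mse_loss(price_pred, torch.zeros(8, 1))
loss.backward()
```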