2021
DOI: 10.1016/j.neucom.2020.09.078

TWilBert: Pre-trained deep bidirectional transformers for Spanish Twitter

Cited by 38 publications (31 citation statements)
References 19 publications (43 reference statements)
“…The attention mechanism is the crux behind many state-of-the-art sequence-to-sequence models used in machine translation and language processing 40 and it has recently shown good results on multi-label classification. 41 While the attention mechanism has also been recently adopted to perform learning of relationships among elements in material property prediction, 34,35 our model additionally uses the attention mechanism to perform learning of relationships among multiple material properties by acting on the output of the multivariate Gaussian model as opposed to the composition itself.…”
Section: Discussion
confidence: 99%
“…Higher-order property correlation learning proceeds via an attention graph neural network, whose description can be found in prior literature. 34,35,40,41 We use five attention layers, namely, the message-passing operations are executed five times. Each attention layer also includes an element-wise feed-forward MLP which has two layers of 128 neurons each.…”
Section: H-CLMP Model
confidence: 99%
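
The architectural details in the preceding statement (five attention layers, each paired with an element-wise feed-forward MLP of two 128-neuron layers) can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the cited authors' implementation: the attention head count, residual connections, activation function, and input shapes are hypothetical choices; only the layer count and MLP width come from the quote.

import torch
import torch.nn as nn

class AttentionMessagePassingLayer(nn.Module):
    """One attention (message-passing) layer followed by an element-wise
    feed-forward MLP with two layers of 128 neurons, as described in the
    quote above. Head count, residuals and activation are assumptions."""
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        msg, _ = self.attn(x, x, x)   # message passing among node embeddings
        x = x + msg                   # residual connection (assumed)
        return x + self.ffn(x)        # element-wise MLP applied to every node

# "Five attention layers" means the message-passing step is executed five times.
model = nn.Sequential(*[AttentionMessagePassingLayer() for _ in range(5)])
out = model(torch.randn(8, 16, 128))  # (batch, nodes, features); shapes assumed
print(out.shape)                      # torch.Size([8, 16, 128])

In the cited work the node embeddings would come from the output of the multivariate Gaussian model rather than the raw composition; here they are random tensors purely to make the sketch runnable.
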
“…The most recent works proposed language models specifically pre-trained on tweet corpora: Thakkar and Pinnis [16] achieved encouraging performance leveraging a time-balanced evaluation set for sentiment analysis on Latvian tweets, comparing several BERT-based architectures, and Nguyen et al [12] presented BERTweet, the first public large-scale pre-trained language model for English tweets; Ángel González et al [15] proposed TWiLBERT, a specialization of the BERT architecture both for the Spanish language and the Twitter domain. For languages other than English, such as Persian [53] and Arabic [54], recent studies have also focused on deep neural networks such as CNN and LSTM.…”
Section: Background and Related Work
confidence: 99%
“…In the field of sentiment analysis of tweets, most of the scientific literature has obtained state-of-the-art results adopting the approach of training language models directly from scratch starting from corpora made up exclusively of tweets, so that the models could better handle the specific tweet jargon, characterized by a particular syntax and grammar not containing punctuation, with contracted or elongated words, keywords, hashtags, emoticons, emojis and so on. These approaches, working not only in English [11, 12], but also in other languages such as Italian [13], Spanish [14, 15], and Latvian [16], necessarily impose two constraints: the first requires the building of large corpora of tweets to be used for training the language models in the specific language considered, and the second is the need for substantial resources, of both hardware and time, to train the models from scratch starting from these corpora.…”
Section: Introduction
confidence: 99%