Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.240

Augmenting BERT-style Models with Predictive Coding to Improve Discourse-level Representations

Abstract: Current language models are usually trained using a self-supervised scheme, where the main focus is learning representations at the word or sentence level. However, there has been limited progress in generating useful discourse-level representations. In this work, we propose to use ideas from predictive coding theory to augment BERT-style language models with a mechanism that allows them to learn suitable discourse-level representations. As a result, our proposed approach is able to predict future sentences us…
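
As a rough illustration of the idea sketched in the abstract, the snippet below adds a contrastive predictive-coding (InfoNCE-style, in the spirit of Oord et al., 2018) auxiliary objective on top of BERT's [CLS] sentence vectors: a recurrent context summarizer tries to predict the representations of the next few sentences. The class name PredictiveCodingHead, the GRU summarizer, the prediction horizon, and all hyperparameters are assumptions made for illustration; this is not the authors' released implementation.

```python
# Sketch: a CPC/InfoNCE-style auxiliary objective over BERT sentence vectors.
# Illustrative approximation of the idea in the abstract, not the authors'
# released code; `PredictiveCodingHead`, the GRU summarizer, and the horizon
# of 3 future sentences are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import BertModel, BertTokenizer


class PredictiveCodingHead(nn.Module):
    """Predicts representations of future sentences from a running context."""

    def __init__(self, hidden_size: int = 768, horizon: int = 3):
        super().__init__()
        self.context_rnn = nn.GRU(hidden_size, hidden_size, batch_first=True)
        # One linear predictor per future step, as in CPC (Oord et al., 2018).
        self.predictors = nn.ModuleList(
            [nn.Linear(hidden_size, hidden_size) for _ in range(horizon)]
        )

    def forward(self, sent_reprs: torch.Tensor) -> torch.Tensor:
        """sent_reprs: (num_sentences, hidden) [CLS] vectors of one document."""
        context, _ = self.context_rnn(sent_reprs.unsqueeze(0))
        context = context.squeeze(0)                  # (num_sentences, hidden)
        loss, num_terms = 0.0, 0
        for k, predictor in enumerate(self.predictors, start=1):
            if k >= sent_reprs.size(0):
                break
            pred = predictor(context[:-k])            # predictions for step t+k
            target = sent_reprs[k:]                   # actual future sentences
            logits = pred @ target.t()                # similarity matrix
            labels = torch.arange(logits.size(0))     # positives on the diagonal
            loss = loss + F.cross_entropy(logits, labels)  # InfoNCE term
            num_terms += 1
        return loss / max(num_terms, 1)


# Encode each sentence of a document with BERT and compute the auxiliary loss.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")
head = PredictiveCodingHead()

sentences = [
    "The committee met on Monday.",
    "It reviewed the new proposal.",
    "A vote is scheduled for next week.",
    "Supporters expect it to pass.",
]
batch = tokenizer(sentences, padding=True, return_tensors="pt")
cls_vectors = encoder(**batch).last_hidden_state[:, 0]  # (num_sentences, 768)
aux_loss = head(cls_vectors)
print(f"predictive-coding auxiliary loss: {aux_loss.item():.4f}")
```

During pre-training, such an auxiliary term would simply be added to the usual masked-language-modeling loss; the forward pass above only shows how the sentence-level objective is computed.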

Cited by 4 publications (5 citation statements) · References 30 publications (37 reference statements)

“…Pragmatic Coherence To include wider sentence context, this paradigm focuses on the incorporation of coherence in terms of capturing the transition of meaning in longer contexts. Typically, such models are trained by predicting the correct subsequent input (i.e., word sequences or sentences), inspired by the concept of predictive coding [15,4]. In this paradigm, we employ SkipThoughts [23], GPT-2 [36] as well as GPT-3 [7], as these approaches are based on next word or sentence prediction training objectives.…”
Section: Sentence Embedding Models (mentioning)
confidence: 99%
“…Here, we include both GPT-2 and GPT-3 in order to examine the possible effect of the extended input length used during the pre-training procedure in GPT-3 (4096 tokens, compared to 1024 tokens for GPT-2) on the resulting neural fits.…”
Section: Sentence Embedding Models (mentioning)
confidence: 99%
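
The excerpts above treat autoregressive language models as sentence encoders. Below is a minimal sketch of one common way to obtain a sentence embedding from GPT-2 with the Hugging Face transformers library; mean-pooling the final hidden states over non-padding tokens is an assumed choice, since the excerpt does not say how the cited study pools its representations.

```python
# Sketch: deriving a sentence embedding from GPT-2 hidden states.
# Mean-pooling over non-padding tokens is an assumed choice for illustration;
# the cited study may pool its representations differently.
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no padding token by default
model = GPT2Model.from_pretrained("gpt2")

sentences = [
    "The orchestra tuned their instruments.",
    "Then the conductor raised the baton.",
]
batch = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state   # (batch, seq_len, 768)

# Average the hidden states of real tokens only (padding is masked out).
mask = batch["attention_mask"].unsqueeze(-1).float()  # (batch, seq_len, 1)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)                               # torch.Size([2, 768])
```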
“…Figure 4: Our rehearsal and anticipation (r/a) decoder receives a masked input E_{r/a} and the most recently updated memory M_{t+1} to predict the masked tokens and whether the segment belongs to the past or future. (Zhang et al., 2021) and anticipation as the prediction of the future (Oord et al., 2018; Araujo et al., 2021). To use the same machinery, we pose these processes as masked modeling tasks that predict past and future coreference-related tokens.…”
Section: Masked Modeling (mentioning)
confidence: 99%
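
As a rough sketch of the masked-modeling formulation described in this excerpt, the snippet below masks tokens in a segment and combines the reconstruction loss with a binary past-versus-future prediction. It is illustrative only: the cited decoder additionally conditions on the updated memory M_{t+1}, which is omitted here, and the model choice and loss wiring are assumptions rather than the paper's architecture.

```python
# Sketch: posing rehearsal/anticipation as masked modeling plus a binary
# past-vs-future prediction. Illustrative only; the cited decoder also
# conditions on an external memory, which is omitted here for brevity.
import torch
import torch.nn as nn
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
mlm = BertForMaskedLM.from_pretrained("bert-base-uncased")
past_future_head = nn.Linear(mlm.config.hidden_size, 2)  # 0 = past, 1 = future

segment = "The suspect left the building before the alarm sounded."
batch = tokenizer(segment, return_tensors="pt")
labels = batch["input_ids"].clone()

# Randomly mask ~15% of the non-special tokens, standard MLM style.
maskable = torch.rand(labels.shape) < 0.15
maskable &= (labels != tokenizer.cls_token_id) & (labels != tokenizer.sep_token_id)
maskable[0, 1] = True                      # ensure at least one masked position
masked_inputs = batch["input_ids"].clone()
masked_inputs[maskable] = tokenizer.mask_token_id
labels[~maskable] = -100                   # ignore unmasked positions in the loss

outputs = mlm(input_ids=masked_inputs,
              attention_mask=batch["attention_mask"],
              labels=labels,
              output_hidden_states=True)

# Binary prediction: does this masked segment come from the past or the future?
cls_state = outputs.hidden_states[-1][:, 0]        # (1, hidden)
segment_label = torch.tensor([1])                  # assume a "future" segment
clf_loss = nn.functional.cross_entropy(past_future_head(cls_state), segment_label)

total_loss = outputs.loss + clf_loss
print(f"combined rehearsal/anticipation loss: {total_loss.item():.4f}")
```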
“…These BERT-type models produce sentence representations using a special token [CLS]. More recently, some models (Lee et al., 2020; Iter et al., 2020; Araujo et al., 2021b) have been proposed to improve discourse-level representations by incorporating additional components or mechanisms into the vanilla BERT. Furthermore, due to the success of deep learning sentence encoders, some Spanish models were released.…”
Section: Sentence Encoders (mentioning)
confidence: 99%