Answer Sentence Selection Using Local and Global Context in Transformer Models

Lauriola, Ivano; Moschitti, Alessandro

doi:10.1007/978-3-030-72113-8_20

Cited by 12 publications

(14 citation statements)

References 18 publications

“…For the task of AS2, initial efforts embedded the question and candidates using CNNs (Severyn and Moschitti, 2015), weight aligned networks (Shen et al., 2017; Tran et al., 2018; Tay et al., 2018) and compare-aggregate architectures (Wang and Jiang, 2016; Bian et al., 2017; Yoon et al., 2019). Recent progress has stemmed from the application of transformer models for performing AS2 (Garg et al., 2020; Han et al., 2021; Lauriola and Moschitti, 2021). On the data front, small datasets like TrecQA (Wang et al., 2007) and WikiQA (Yang et al., 2015) have been supplemented with datasets such as ASNQ (Garg et al., 2020) having several million QA pairs.…”

Section: Related Work (mentioning)

confidence: 99%

Will this Question be Answered? Question Filtering via Answer Model Distillation for Efficient Question Answering

Garg, Moschitti

2021

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Self Cite

In this paper we propose a novel approach towards improving the efficiency of Question Answering (QA) systems by filtering out questions that will not be answered by them. This is based on an interesting new finding: the answer confidence scores of state-of-the-art QA systems can be approximated well by models solely using the input question text. This enables preemptive filtering of questions that are not answered by the system due to their answer confidence scores being lower than the system threshold. Specifically, we learn Transformer-based question models by distilling Transformer-based answering models. Our experiments on three popular QA datasets and one industrial QA benchmark demonstrate the ability of our question models to approximate the Precision/Recall curves of the target QA system well. These question models, when used as filters, can effectively trade off lower computation cost of QA systems for lower Recall, e.g., reducing computation by ∼60%, while only losing ∼3−4% of Recall.
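The filtering step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `filter_questions` and `toy_question_model` are hypothetical names, and the toy scorer merely stands in for a distilled question-only model that would predict the answering system's confidence.

```python
from typing import Callable, List, Tuple


def filter_questions(
    questions: List[str],
    question_model: Callable[[str], float],
    threshold: float,
) -> Tuple[List[str], List[str]]:
    """Split questions into (kept, dropped) using a question-only score.

    A question is kept (sent on to the full QA system) only when the
    predicted answer confidence reaches the system threshold.
    """
    kept: List[str] = []
    dropped: List[str] = []
    for question in questions:
        if question_model(question) >= threshold:
            kept.append(question)
        else:
            dropped.append(question)
    return kept, dropped


def toy_question_model(question: str) -> float:
    """Hypothetical stand-in scorer: longer questions get higher scores.

    A real question model would be a Transformer distilled from the
    answering model; this heuristic exists only to make the sketch run.
    """
    return min(1.0, len(question.split()) / 10.0)


if __name__ == "__main__":
    kept, dropped = filter_questions(
        ["who?", "what is the capital city of France today"],
        toy_question_model,
        threshold=0.5,
    )
    print("kept:", kept)
    print("dropped:", dropped)
```

Raising `threshold` drops more questions before the expensive answering model runs, which is the computation-versus-Recall trade-off the abstract reports.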

“…Triplet loss (Hoffer and Ailon, 2015) has been used in few-shot classification methods. Although introduced for images, it has been successfully adapted in natural language processing (Wei et al., 2021; Lauriola and Moschitti, 2021). Triplet loss enables the network to distinguish between positive and negative examples of a class.…”

Section: Triplet Loss (mentioning)

confidence: 99%

Few-shot and Zero-shot Approaches to Legal Text Classification: A Case Study in the Financial Sector

Sarkar, Ojha, Megaro et al.

2021

Proceedings of the Natural Legal Language Processing Workshop 2021

The application of predictive coding techniques to legal texts has the potential to greatly reduce the cost of legal review of documents, however, there is such a wide array of legal tasks and continuously evolving legislation that it is hard to construct sufficient training data to cover all cases. In this paper, we investigate few-shot and zero-shot approaches that require substantially less training data and introduce a triplet architecture, which for promissory statements produces performance close to that of a supervised system. This method allows predictive coding methods to be rapidly developed for new regulations and markets.
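To make the triplet idea concrete, here is a minimal sketch of the widely used margin-based triplet loss over plain vectors. It is an assumption that this formulation is representative: the cited works may use a different variant, and the function names and the default `margin` are illustrative only.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Margin-based triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(
        0.0,
        euclidean(anchor, positive) - euclidean(anchor, negative) + margin,
    )


if __name__ == "__main__":
    # Positive is close, negative is far: the triplet is already satisfied.
    print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))
    # Roles swapped: the positive is farther than the negative, so loss > 0.
    print(triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0]))
```

In training, the three vectors would be embeddings produced by the same encoder, so minimizing the loss pulls same-class examples together and pushes other-class examples apart.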

“…Similarly to previous work (Tan et al., 2017; Lauriola and Moschitti, 2021), we define local context $\mathrm{Loc}_k(C_{i,j})$ for candidate $C_{i,j}$ as the sentences immediately preceding and succeeding each answer candidate within a window of $2k + 1$ sentences, i.e., $\mathrm{Loc}_k(C_{i,j}) = C_{i,j-k}, \ldots$…”

Section: Local Context (mentioning)

confidence: 99%
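The local context window quoted above can be sketched as a simple slice over a document's sentences. The quoted formula is truncated, so the upper end of the window (through index j + k) is inferred here from the stated window size of 2k + 1; clipping at document boundaries and returning the candidate itself inside the window are further assumptions, and `local_context` is a hypothetical name.

```python
from typing import List


def local_context(sentences: List[str], j: int, k: int) -> List[str]:
    """Return the window of up to 2k + 1 sentences centred on index j.

    The window covers indices j - k through j + k and is clipped at the
    start and end of the document (an assumption; the quoted definition
    does not say how boundaries are handled).
    """
    if not 0 <= j < len(sentences):
        raise IndexError("candidate index out of range")
    if k < 0:
        raise ValueError("k must be non-negative")
    start = max(0, j - k)
    end = min(len(sentences), j + k + 1)
    return sentences[start:end]


if __name__ == "__main__":
    doc = ["s0", "s1", "s2", "s3", "s4"]
    print(local_context(doc, j=2, k=1))  # candidate s2 with one neighbour each side
    print(local_context(doc, j=0, k=2))  # window clipped at the document start
```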

“…Their approach, while interesting, is limited to entities-based context, and specific to Wikipedia and MR domain. For AS2, Lauriola and Moschitti (2021) proposed a model that uses local context as defined by the preceding and following sentences of the target answer. They also introduced a simple bag-of-words representation of documents as global context, which did not show significant improvement over non-contextual AS2 models.…”

Section: Introduction (mentioning)

confidence: 99%

Modeling Context in Answer Sentence Selection Systems on a Latency Budget

Han, Soldaini, Moschitti

2021

Preprint

Self Cite

Answer Sentence Selection (AS2) is an efficient approach for the design of open-domain Question Answering (QA) systems. In order to achieve low latency, traditional AS2 models score question-answer pairs individually, ignoring any information from the document each potential answer was extracted from. In contrast, more computationally expensive models designed for machine reading comprehension tasks typically receive one or more passages as input, which often results in better accuracy. In this work, we present an approach to efficiently incorporate contextual information in AS2 models. For each answer candidate, we first use unsupervised similarity techniques to extract relevant sentences from its source document, which we then feed into an efficient transformer architecture fine-tuned for AS2. Our best approach, which leverages a multi-way attention architecture to efficiently encode context, improves 6% to 11% over noncontextual state of the art in AS2 with minimal impact on system latency. All experiments in this work were conducted in English. * Work was conducted while the author was an intern at Amazon Alexa.
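The context-extraction step in this abstract (picking relevant sentences from the candidate's source document with an unsupervised similarity measure) might look like the sketch below. The abstract does not name the measure, so cosine similarity over bag-of-words counts is an assumption made for illustration, and `bow`, `cosine`, and `select_context` are hypothetical helpers rather than the authors' code.

```python
import math
from collections import Counter
from typing import List


def bow(text: str) -> Counter:
    """Lower-cased whitespace tokens as a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_context(
    candidate: str, document_sentences: List[str], top_n: int = 2
) -> List[str]:
    """Return the top_n document sentences most similar to the candidate.

    The candidate itself is excluded; ties keep document order because
    Python's sort is stable.
    """
    cand_vec = bow(candidate)
    others = [s for s in document_sentences if s != candidate]
    ranked = sorted(others, key=lambda s: cosine(cand_vec, bow(s)), reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    doc = [
        "paris is the capital of france",
        "the capital hosts the louvre",
        "bananas are yellow",
    ]
    print(select_context(doc[0], doc, top_n=1))
```

The selected sentences would then be passed, together with the question and the candidate, to the AS2 model; how they are encoded is outside the scope of this sketch.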
