Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.26
Incremental Processing in the Age of Non-Incremental Encoders: An Empirical Assessment of Bidirectional Models for Incremental NLU

Abstract: While humans process language incrementally, the best language encoders currently used in NLP do not. Both bidirectional LSTMs and Transformers assume that the sequence that is to be encoded is available in full, to be processed either forwards and backwards (BiLSTMs) or as a whole (Transformers). We investigate how they behave under incremental interfaces, when partial output must be provided based on partial input seen up to a certain time step, which may happen in interactive systems. We test five models on…

Cited by 10 publications (23 citation statements)
References 45 publications (44 reference statements)
“…Recent live incremental systems fall short of the accuracies achievable on pre-segmented transcripts, so there is a natural interest in taking the best non-incremental sequence models and adapting them for incrementality. Madureira and Schlangen (2020) take up this effort on several other sequence tagging and classification tasks, showing how bidirectional encoders and Transformers can be modified to work incrementally. To reduce the impact of the partiality of the input, the models predict future content and wait for more rightward context.…”
Section: Related Work
“…Prophecy-based decoding: For our other decoding strategies, we use a 'prophecy'-based approach to predicting future word sequences, following the task of open-ended language generation, which, given an input text passage as context, is to produce text that constitutes a cohesive continuation (Holtzman et al., 2019). Inspired by Madureira and Schlangen (2020), we use the GPT-2 language model (Radford et al., 2019) to extend each prefix into a continuation that runs until the end of the utterance, creating a hypothetical complete context that satisfies the requirements of the models' non-incremental structure.…”
Section: Modifying the Decoding Procedures
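The prophecy idea lends itself to a compact illustration. Below is a minimal sketch, assuming the Hugging Face transformers library and the publicly released "gpt2" checkpoint; the function name complete_prefix and the example utterance are illustrative and not taken from either paper.

```python
# Sketch of prophecy-based input completion: a language model extends the
# current prefix so that a non-incremental encoder can be run on a
# hypothetical full input. Assumes the Hugging Face transformers library.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def complete_prefix(prefix_words, max_new_tokens=20):
    """Extend a partial utterance with a GPT-2 'prophecy'."""
    inputs = tokenizer(" ".join(prefix_words), return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,                      # greedy continuation; sampling also possible
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# At each time step the prefix grows by one word and a fresh prophecy is produced;
# the completed string is what gets passed to the bidirectional model.
utterance = ["the", "flight", "leaves", "from", "boston"]
for t in range(1, len(utterance) + 1):
    hypothetical_full_input = complete_prefix(utterance[:t])
```

Because the prophecy is regenerated for every new prefix, the hypothetical right context can change from step to step, which is one source of output instability in this setup.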
“…Zhang et al. (2021) introduced an average embedding layer to avoid recalculation when using an incremental encoder, while exploiting right context through knowledge distillation. An investigation of the use of non-incremental encoders for incremental NLU in interactive systems was conducted by Madureira and Schlangen (2020). The authors employed BERT (Devlin et al., 2019) for sequence tagging and classification using restart-incrementality, a procedure with high computational cost.…”
Section: Related Work
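Restart-incrementality can be summarized in a few lines: whenever a new token arrives, the whole prefix is re-encoded from scratch and all labels are recomputed, which is where the computational cost comes from. A minimal sketch, assuming the Hugging Face transformers library; "my-bert-tagger" stands in for a fine-tuned token-classification checkpoint and is purely illustrative.

```python
# Sketch of restart-incremental sequence tagging with a BERT-style model.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("my-bert-tagger")  # hypothetical checkpoint
model = AutoModelForTokenClassification.from_pretrained("my-bert-tagger")
model.eval()

def tag_prefix(words):
    """Re-encode the whole prefix from scratch and return one label per word."""
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits[0]
    pred_ids = logits.argmax(-1).tolist()
    labels, word_ids = [], enc.word_ids()
    for i, wid in enumerate(word_ids):
        # keep only the prediction for the first subword of each word
        if wid is not None and (i == 0 or word_ids[i - 1] != wid):
            labels.append(model.config.id2label[pred_ids[i]])
    return labels

utterance = ["book", "a", "flight", "to", "boston"]
for t in range(1, len(utterance) + 1):
    # every new word triggers a full re-computation; earlier labels may be revised
    print(tag_prefix(utterance[:t]))
```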
“…LT+R+CM+D: similar to (4), but, during training, the output for the input token x_t is obtained at time t + d, where d ∈ {1, 2} is the delay, following the approach in Turek et al. (2020). There is evidence that additional right context improves the models' incremental performance (Baumann et al., 2011; Ma et al., 2019; Madureira and Schlangen, 2020), which results in a trade-off between providing timely output and waiting for more context to deliver more stable output.…”
Section: Models
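The delay can be made concrete with a short sketch: at time step t, only labels for tokens that already have d tokens of right context are committed. This is a hedged illustration built on the hypothetical tag_prefix function from the sketch above, not the cited implementation.

```python
# Sketch of delayed output: the label for the token seen at time t is only
# committed at time t + d, trading timeliness for stability.
def incremental_tags_with_delay(words, d=1):
    committed = []
    for t in range(1, len(words) + 1):
        labels = tag_prefix(words[:t])          # labels for the current prefix
        # commit only labels whose token already has d tokens of right context
        while len(committed) < t - d:
            committed.append(labels[len(committed)])
    # flush the remaining labels once the utterance is complete
    final = tag_prefix(words)
    committed.extend(final[len(committed):])
    return committed
```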