Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1510

Encode, Tag, Realize: High-Precision Text Editing

Abstract: We propose LASERTAGGER, a sequence tagging approach that casts text generation as a text editing task. Target texts are reconstructed from the inputs using three main edit operations: keeping a token, deleting it, and adding a phrase before the token. To predict the edit operations, we propose a novel model, which combines a BERT encoder with an autoregressive Transformer decoder. This approach is evaluated on English text on four tasks: sentence fusion, sentence splitting, abstractive summarization, and gramma…

Cited by 116 publications (102 citation statements)
References 43 publications
“…Currently, the only sequence editing model to be applied to GEC is LaserTagger (Malmi et al., 2019). Similarly to the two previously cited works, LaserTagger learns to edit sentences by two different edit operations: KEEP and DELETE, along with pairing these operations with a limited phrase vocabulary consisting of tokens that are frequently changed between the source and target sequences.…”
Section: Sequence Editing Models
confidence: 99%
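The KEEP/DELETE operations paired with a restricted added-phrase vocabulary can be made concrete with a small sketch. The function below is illustrative only, not the authors' exact tag-conversion algorithm (their procedure handles alignments this greedy version misses); the example sentences and vocabulary are invented for the demonstration.

```python
# Simplified greedy sketch of deriving KEEP/DELETE tags, each optionally
# paired with a phrase (from a limited vocabulary) to insert before the
# source token. Illustrative only; not the authors' exact algorithm.

def compute_tags(source, target, phrase_vocab):
    """Derive one (operation, added_phrase) tag per source token.

    Returns None when the target cannot be reached with KEEP/DELETE
    plus phrases drawn from `phrase_vocab`.
    """
    tags = []
    t = 0  # position in target
    for token in source:
        if token in target[t:]:
            i = target.index(token, t)      # next occurrence of token
            phrase = " ".join(target[t:i])  # words inserted before it
            t = i + 1
            op = "KEEP"
        else:
            phrase = ""
            op = "DELETE"                   # token absent from target
        if phrase and phrase not in phrase_vocab:
            return None                     # phrase outside vocabulary
        tags.append((op, phrase))
    # Any target words left over cannot be attached to a source token.
    return tags if t == len(target) else None

# Insertion: add "the" before "Nobel".
print(compute_tags(["Dylan", "won", "Nobel", "prize"],
                   ["Dylan", "won", "the", "Nobel", "prize"],
                   {"the"}))

# Deletion: drop the duplicated "very".
print(compute_tags(["He", "is", "very", "very", "tall"],
                   ["He", "is", "tall"],
                   set()))
```

The restriction to a small phrase vocabulary is what keeps the tag set, and hence the prediction problem, small: an insertion whose phrase falls outside the vocabulary simply makes the target unreachable for the tagger.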
“…As described in Malmi et al. (2019), a sequence editing model learns to generate a target sentence by applying a small set of edit operations to the source sentence. It works in three steps: (1) the input sentence is encoded into a hidden representation, (2) each token in the input sentence is assigned an edit tag, and (3) rules are applied to convert the output tags into tokens.…”
Section: Sequence Editing Model
confidence: 99%
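Step (3), converting tags back into tokens, is the "realize" step of the title. A minimal sketch, assuming each tag is an (operation, added_phrase) pair aligned to one source token (the tag encoding and example data here are assumptions for illustration):

```python
# Minimal sketch of the "realize" step: apply per-token edit tags to
# the source tokens to produce the output text. Assumes one
# (operation, added_phrase) tag per source token.

def realize(source, tags):
    """Rule-based conversion of (operation, phrase) tags into tokens."""
    out = []
    for token, (op, phrase) in zip(source, tags):
        if phrase:                 # the phrase is inserted before the token
            out.extend(phrase.split())
        if op == "KEEP":
            out.append(token)
        # DELETE: drop the token itself (its phrase, if any, still lands)
    return out

source = ["Dylan", "won", "Nobel", "prize"]
tags = [("KEEP", ""), ("KEEP", ""), ("KEEP", "the"), ("KEEP", "")]
print(realize(source, tags))  # ['Dylan', 'won', 'the', 'Nobel', 'prize']
```

Because realization is purely rule-based, all of the learning happens in steps (1) and (2); the tagger never has to generate kept tokens from scratch.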
“…Sentence simplification (Nisioi et al., 2017) aims at using techniques such as shortening the sentences to make a text more readable. On the other hand, style transfer is the task of making an utterance conform to a specific style such as formality (Logeswaran et al., 2018; Sennrich et al., 2016).…”
Section: Sentence Editing and Simplification
confidence: 99%
“…Tagging instead solves text editing in two steps: it first employs a seq2seq framework to produce tag sequences, and then edits the input text according to those tag sequences (the "realization" step) (Malmi et al., 2019). Tagging assigns the tag KEEP to words that do not need to be changed, so it does not need to learn a copy mechanism.…”
Section: Related Work
confidence: 99%
“…Upon receiving the encoder's hidden states, which carry the source text information, the decoder of End2end directly decodes those hidden states and generates the fully edited target text sequence. The decoder of Tagging, by contrast, produces a sequence of editing operations, such as deletion and insertion, which is later applied to the source text to yield the edited text via a realization step (Malmi et al., 2019). The mechanisms of End2end and Tagging are illustrated in Figure 1.…”
Section: Introduction
confidence: 99%
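One way to see why the Tagging decoder's job is easier than End2end decoding is to compare output-space sizes: a tag is a base operation (KEEP or DELETE) optionally paired with one of V added phrases, whereas an End2end decoder chooses from the full word or subword vocabulary at every step. The figures below are hypothetical, chosen only to illustrate the contrast:

```python
# Hypothetical output-space comparison; both sizes below are assumed,
# not taken from the paper.
phrase_vocab_size = 500                  # V: added-phrase vocabulary (assumed)
num_tags = 2 * (phrase_vocab_size + 1)   # {KEEP, DELETE} x (one phrase or none)
seq2seq_vocab_size = 30_000              # typical subword vocabulary (assumed)
print(num_tags, seq2seq_vocab_size)      # 1002 30000
```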