Proceedings of the First Workshop on Natural Language Interfaces 2020
DOI: 10.18653/v1/2020.nli-1.5

Neural Multi-task Text Normalization and Sanitization with Pointer-Generator

Abstract: Text normalization and sanitization are intrinsic components of Natural Language Interfaces. In Information Retrieval or Dialogue Generation, normalization of user queries or utterances enhances linguistic understanding by translating non-canonical text to its canonical form, on which many state-of-the-art language models are trained. On the other hand, text sanitization removes sensitive information to guarantee user privacy and anonymity. Existing approaches to normalization and sanitization mainly rely on h…
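For intuition, here is a minimal, purely illustrative Python sketch of the two tasks the abstract describes. The toy lexicon, the phone-number regex, and the <PHONE> placeholder are assumptions for illustration only; the paper itself learns both tasks jointly with a neural pointer-generator, not with rules like these.

import re

# Purely illustrative toy pass over one utterance; LEXICON, PHONE, and the
# <PHONE> placeholder are assumptions, not the paper's method.
LEXICON = {"plz": "please", "2morrow": "tomorrow", "u": "you"}
PHONE = re.compile(r"\b\d{3}-\d{4}\b")

def normalize(text):
    # Normalization: map non-canonical tokens to their canonical forms.
    return " ".join(LEXICON.get(tok, tok) for tok in text.split())

def sanitize(text):
    # Sanitization: mask sensitive spans (here, phone numbers) for privacy.
    return PHONE.sub("<PHONE>", text)

print(sanitize(normalize("plz call u back 2morrow at 555-0123")))
# -> please call you back tomorrow at <PHONE>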

Cited by 3 publications (2 citation statements, 2021–2022) | References 20 publications
“…Lexical normalisation is typically tackled as one of two formulations, either as a sequence-to-sequence (seq2seq) (Muller et al., 2019; Nguyen and Cavallari, 2020) or token classification problem (van der Goot and van Noord, 2017; Stewart et al., 2018, 2019b). Seq2seq structures the learning task similar to neural machine translation (NMT) (Bahdanau et al., 2014) whereby an encoder receives a sequence of noisy text, X = (x_1, …”
Section: Problem Formulation (mentioning, confidence: 99%)
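As a concrete illustration of the two formulations contrasted in this citing passage, the sketch below encodes the same toy normalisation pair both as a seq2seq source/target pair and as per-token classification labels. The example data and the "SELF" label scheme are assumptions for illustration, not drawn from any cited paper.

# Toy illustration of the two problem formulations.
noisy = ["c", "u", "tmrw", "at", "5"]
canonical = ["see", "you", "tomorrow", "at", "5"]

# 1) Sequence-to-sequence (NMT-style): the encoder receives the whole noisy
#    sequence X = (x_1, ..., x_n) and the decoder emits the canonical one;
#    source and target lengths may differ.
seq2seq_example = {"source": " ".join(noisy), "target": " ".join(canonical)}

# 2) Token classification: every noisy token gets a label, here its
#    canonical replacement, or "SELF" when no change is needed.
token_cls_example = [(x, y if x != y else "SELF") for x, y in zip(noisy, canonical)]

print(seq2seq_example)    # {'source': 'c u tmrw at 5', 'target': 'see you tomorrow at 5'}
print(token_cls_example)  # [('c', 'see'), ('u', 'you'), ('tmrw', 'tomorrow'), ('at', 'SELF'), ('5', 'SELF')]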
“…More recently, attention has shifted towards neural techniques that i) contextually normalise tokens based on high-level classifications (Stewart et al., 2019b), ii) modify and fine-tune large pre-trained transformer-based representations (Muller et al., 2019), or iii) perform joint normalisation and sanitisation (e.g. masking sensitive tokens) (Nguyen and Cavallari, 2020).…”
Section: Introduction (mentioning, confidence: 99%)
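The pointer-generator named in the paper's title is a natural fit for this joint task because it can copy rare source tokens verbatim while still generating canonical words or mask symbols from a fixed vocabulary. Below is a toy numerical sketch of the standard pointer-generator output mixture; the attention weights, gate value, vocabulary, and <PHONE> token are all made-up illustrations, not values from the paper.

import numpy as np

# Standard pointer-generator mixture over candidate words w:
#   P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum over {i : x_i = w} of a_i
# All numbers below are invented for illustration.
source = ["my", "number", "is", "555-0123"]
attention = np.array([0.05, 0.10, 0.05, 0.80])  # attention over the source
p_gen = 0.2                                     # generate-vs-copy gate

vocab = ["my", "number", "is", "<PHONE>", "<UNK>"]
p_vocab = np.array([0.05, 0.05, 0.05, 0.80, 0.05])  # decoder's vocab distribution

final = {}
for w in set(vocab) | set(source):
    gen = p_gen * (p_vocab[vocab.index(w)] if w in vocab else 0.0)
    copy = (1.0 - p_gen) * sum(a for x, a in zip(source, attention) if x == w)
    final[w] = gen + copy

# "555-0123" is out-of-vocabulary but copyable; "<PHONE>" can only be generated.
for w, p in sorted(final.items(), key=lambda kv: -kv[1]):
    print(f"{w:>10s}  {p:.3f}")

With these toy numbers the copy path dominates for the out-of-vocabulary phone number (0.64) while the vocabulary path still assigns substantial mass to the <PHONE> mask (0.16), which is exactly the trade-off a joint normalisation-and-sanitisation decoder must arbitrate.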