Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.371

Character-level Representations Improve DRS-based Semantic Parsing Even in the Age of BERT

Abstract: We combine character-level and contextual language model representations to improve performance on Discourse Representation Structure parsing. Character representations can easily be added in a sequence-to-sequence model in either one encoder or as a fully separate encoder, with improvements that are robust to different language models, languages and data sets. For English, these improvements are larger than adding individual sources of linguistic information or adding non-contextual embeddings. A new method o…
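The combination the abstract describes can be sketched concretely. Below is a minimal, hypothetical PyTorch sketch of the "fully separate encoder" variant: a character biLSTM run alongside a contextual language model, with the two representations concatenated and projected. Module names and dimensions are invented, this is not the authors' implementation, and a real system would align character states to subword positions rather than mean-pooling.

```python
import torch
import torch.nn as nn

class CharPlusLMEncoder(nn.Module):
    """Toy two-encoder setup: a character biLSTM next to a contextual LM
    (stand-in for BERT). All sizes are illustrative placeholders."""

    def __init__(self, n_chars, char_dim=64, char_hidden=128, bert_dim=768):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        # A fully separate character encoder; the paper's other option is
        # mixing characters into a single encoder instead.
        self.char_lstm = nn.LSTM(char_dim, char_hidden,
                                 batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * char_hidden + bert_dim, bert_dim)

    def forward(self, char_ids, bert_states):
        # char_ids: (batch, n_chars); bert_states: (batch, n_subwords, bert_dim)
        char_states, _ = self.char_lstm(self.char_emb(char_ids))
        # Crude alignment: pool characters to one sentence vector and
        # broadcast it over the subword sequence.
        pooled = char_states.mean(dim=1, keepdim=True)
        pooled = pooled.expand(-1, bert_states.size(1), -1)
        return self.proj(torch.cat([pooled, bert_states], dim=-1))
```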

Cited by 12 publications (14 citation statements: 3 supporting, 11 mentioning, 0 contrasting). References 75 publications (84 reference statements).

Citation statements, ordered by relevance:
“…CharacterBERT (Boukkouri et al., 2020) ported this technique to BERT (Devlin et al., 2019), augmenting its existing WordPiece-tokenized input. Consistent with previous observations that feeding characters into a transformer stack comes with a huge computational cost while not improving over tokenization-based approaches (Al-Rfou et al., 2019), a BERT model fine-tuned for semantic parsing achieved gains only when characters complemented subwords (van Noord et al., 2020).…”
Section: Character-level Models (supporting)
Confidence: 84%
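The character-CNN technique this statement refers to (ELMo-style word vectors built from characters, as in CharacterBERT) can be illustrated with a short sketch. This is a hypothetical toy module, not the CharacterBERT architecture itself; filter widths and sizes are invented.

```python
import torch
import torch.nn as nn

class CharCNNWordEncoder(nn.Module):
    """Build one vector per word from its characters via 1-D convolutions
    with max-pooling; assumes words are padded to at least the widest
    filter. Sizes are made up for illustration."""

    def __init__(self, n_chars, char_dim=16, n_filters=128, widths=(2, 3, 4)):
        super().__init__()
        self.emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            nn.Conv1d(char_dim, n_filters, w) for w in widths)

    def forward(self, char_ids):
        # char_ids: (batch * n_words, max_word_len)
        x = self.emb(char_ids).transpose(1, 2)     # (B, char_dim, len)
        # Max-pool each convolution over the character axis.
        feats = [conv(x).max(dim=2).values for conv in self.convs]
        return torch.cat(feats, dim=-1)            # (B, n_filters * 3)
```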
“…Neural Architecture: We use a recurrent sequence-to-sequence neural network with two bi-directional LSTM layers (Hochreiter and Schmidhuber, 1997). We also experimented with a Transformer (Vaswani et al., 2017), as implemented in the same framework. However, similar to van Noord et al. (2020), none of our experiments reached the performance of the bi-LSTM model. We will therefore only show results of the bi-LSTM model in this paper.…”
Section: Input Representation Types (mentioning)
Confidence: 44%
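The encoder described in that statement is a standard stacked bidirectional LSTM; in PyTorch terms it amounts to something like the following, where the sizes are placeholders rather than the cited configuration:

```python
import torch.nn as nn

# Two stacked bi-directional LSTM layers over token embeddings, the usual
# recurrent seq2seq encoder; 300 is an illustrative dimension.
encoder = nn.LSTM(input_size=300, hidden_size=300, num_layers=2,
                  bidirectional=True, batch_first=True)
```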
“…Several data-driven methods based on neural networks have been proposed for DRS parsing (van Noord et al., 2018b; Liu et al., 2019a; Evang, 2019; Fancellu et al., 2019; Fu et al., 2020; van Noord et al., 2020). These approaches frame semantic parsing as a sequence transformation problem and map the target meaning representation to string format.…”
Section: Introduction (mentioning)
Confidence: 99%
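Framing DRS parsing as sequence transformation means serializing the box-and-clause meaning representation into a flat target string that a seq2seq model can generate. A toy illustration loosely in the spirit of the PMB clause format follows; the example clauses, variable names, and the "***" separator are simplified placeholders, not the exact scheme of the cited parsers.

```python
# A DRS in clause format: each clause is a (box, operator, *args) tuple.
# Toy analysis of "A man smiles." -- illustrative only.
clauses = [
    ("b1", "REF", "x1"),
    ("b1", "man", '"n.01"', "x1"),
    ("b1", "REF", "e1"),
    ("b1", "smile", '"v.01"', "e1"),
    ("b1", "Agent", "e1", "x1"),
]

# Flatten to a single token sequence for the decoder to produce.
target = " *** ".join(" ".join(clause) for clause in clauses)
print(target)
# b1 REF x1 *** b1 man "n.01" x1 *** b1 REF e1 *** ...
```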
“…We would like to thank Anouck Braggaar, Max Müller-Eberstein and Kristian Nørgaard Jensen for testing development versions. Furthermore, we thank Rik van Noord for his participation in the video, and for providing an early use-case for MACHAMP (van Noord et al., 2020). This research was supported by an Amazon Research Award, an STSM in the Multi3Generation COST action (CA18231), a visit supported by COSBI, grant 9063-00077B (Danmarks Frie Forskningsfond), and Nvidia Corporation for sponsoring Titan GPUs.…”
Section: Acknowledgments (mentioning)
Confidence: 99%