Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017
DOI: 10.18653/v1/p17-1184
From Characters to Words to in Between: Do We Capture Morphology?

Abstract: Words can be represented by composing the representations of subword units such as word segments, characters, and/or character n-grams. While such representations are effective and may capture the morphological regularities of words, they have not been systematically compared, and it is not understood how they interact with different morphological typologies. On a language modeling task, we present experiments that systematically vary (1) the basic unit of representation, (2) the composition of these represent…

Cited by 69 publications (111 citation statements)
References 24 publications (24 reference statements)
“…With a few notable exceptions (Vania and Lopez, 2017; Heigold et al., 2017), there was no systematic investigation of the various modelling architectures. In our work we address the question of what linguistic lexical aspects are best encoded in each type of architecture, and their efficacy as part of a machine translation model when translating from morpho- …”
Section: Related Work
Confidence: 99%
“…Recent studies are exploring representations at the subword level that can provide information even for rare and unseen words. Well-known examples are character and character-n-gram-based embeddings (Sperr et al., 2013; Vania and Lopez, 2017), morphological embeddings (Luong et al., 2013; Botha and Blunsom, 2014; Cotterell and Schütze, 2015; Cao and Rei, 2016), or byte embeddings (Plank et al., 2016; Gillick et al., 2016). Ballesteros et al. (2015) were the first to integrate character-based embeddings into a syntactic parser and compared the effect for different languages with different levels of morphological richness.…”
Section: Introduction
Confidence: 99%