Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, 2017
DOI: 10.18653/v1/e17-1040
Character-Word LSTM Language Models

Abstract: We present a Character-Word Long Short-Term Memory Language Model which both reduces the perplexity with respect to a baseline word-level language model and reduces the number of parameters of the model. Character information can reveal structural (dis)similarities between words and can even be used when a word is out-of-vocabulary, thus improving the modeling of infrequent and unknown words. By concatenating word and character embeddings, we achieve up to 2.77% relative improvement on English compared to a baseline…
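The core idea in the abstract, concatenating a word embedding with embeddings of a few of the word's characters before the LSTM, can be illustrated with a minimal sketch, assuming PyTorch. The class name, layer sizes, and the fixed number of characters taken per word below are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class CharWordLSTM(nn.Module):
    """Sketch of a word-level LSTM LM whose input is a word embedding
    concatenated with embeddings of a fixed number of the word's characters."""

    def __init__(self, word_vocab, char_vocab, word_dim=150, char_dim=25,
                 n_chars=3, hidden_dim=200):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, word_dim)
        self.char_emb = nn.Embedding(char_vocab, char_dim)
        # LSTM input size = word embedding + n_chars character embeddings.
        self.lstm = nn.LSTM(word_dim + n_chars * char_dim, hidden_dim,
                            batch_first=True)
        self.out = nn.Linear(hidden_dim, word_vocab)

    def forward(self, words, chars):
        # words: (batch, seq_len); chars: (batch, seq_len, n_chars)
        w = self.word_emb(words)            # (B, T, word_dim)
        c = self.char_emb(chars)            # (B, T, n_chars, char_dim)
        c = c.flatten(start_dim=2)          # (B, T, n_chars * char_dim)
        x = torch.cat([w, c], dim=-1)       # concatenated word+char input
        h, _ = self.lstm(x)
        return self.out(h)                  # next-word logits per position
```

A training step would compare the logits at each position against the next word with cross-entropy, exactly as in a word-only LSTM baseline; the difference is that an out-of-vocabulary word mapped to an unknown-word index still contributes its character embeddings, which is what the abstract credits for the improved modeling of infrequent and unknown words.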

Cited by 42 publications (36 citation statements); references 22 publications (28 reference statements).
“…Recently there are several papers that propose single stage mechanisms [23,24,25,26,27,28,29,30]. Of the above, perhaps the closest to our work is Slim embedding [24], which is a special case of WEST.…”
Section: Introduction
confidence: 94%
“…This probability is approximated by learning the conditional probability of each token given a fixed number of k-context tokens by using a neural network with parameters Θ. The tokens used for training can be of different granularities such as word [21], character [22], sub-word unit [23], or hybrid word-character [24]. The objective function of the LM is to maximize the sum of the logs of the conditional probabilities over a sequence of tokens:…”
Section: A GPT-2 Model
confidence: 99%
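The objective referred to at the end of the quoted passage is truncated above; a hedged reconstruction of the standard autoregressive log-likelihood it describes, with $k$ the context size and $\Theta$ the network parameters (the citing paper's exact notation may differ):

$$ \mathcal{L}(\Theta) = \sum_{i} \log P\!\left(t_i \mid t_{i-k}, \ldots, t_{i-1}; \Theta\right) $$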
“…Language Models (LMs) have been dominant in literal representation tasks, and they can be divided into two categories which are statistical language models [19,9] and neural network language models [2,46,34].…”
Section: Literal Representation Techniques
confidence: 99%
“…Neural network language models can be further divided into RNN-based LMs [31,30,41,46,34], cache-based LMs [38,13,18], and attention-based LMs [2,43,28]. Inspired by the first RNN-based LM [31,30], the work by Sundermeyer et al [41] leverages LSTM [16] to capture context dependences.…”
Section: Literal Representation Techniques
confidence: 99%