Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP 2016)
DOI: 10.18653/v1/d16-1209
Gated Word-Character Recurrent Language Model

Abstract: We introduce a recurrent neural network language model (RNN-LM) with long short-term memory (LSTM) units that utilizes both character-level and word-level inputs. Our model has a gate that adaptively finds the optimal mixture of the character-level and word-level inputs. The gate creates the final vector representation of a word by combining two distinct representations of the word. The character-level inputs are converted into vector representations of words using a bidirectional LSTM. The word-level inputs are…
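To make the gating mechanism concrete, below is a minimal sketch in PyTorch of a gated word-character input layer, assuming a scalar sigmoid gate computed from the word embedding and a character-composed vector taken from the final states of a bidirectional LSTM; the class name, hyperparameters, and exact gate parameterization are illustrative assumptions, not the authors' released code.

import torch
import torch.nn as nn

class GatedWordCharEmbedding(nn.Module):
    """Illustrative gated mixture of a word embedding and a character-composed
    embedding (a sketch of the idea in the abstract, not the paper's exact model)."""

    def __init__(self, word_vocab_size, char_vocab_size, emb_dim, char_dim):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab_size, emb_dim)
        self.char_emb = nn.Embedding(char_vocab_size, char_dim)
        # Bidirectional LSTM reads the characters of a word; its two final
        # hidden states are projected to the word-embedding dimension.
        self.char_lstm = nn.LSTM(char_dim, char_dim,
                                 bidirectional=True, batch_first=True)
        self.char_proj = nn.Linear(2 * char_dim, emb_dim)
        # Scalar gate in (0, 1) that mixes the two representations.
        self.gate = nn.Linear(emb_dim, 1)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch,)   char_ids: (batch, max_word_length)
        w = self.word_emb(word_ids)                        # word-level vector
        _, (h_n, _) = self.char_lstm(self.char_emb(char_ids))
        # h_n: (2, batch, char_dim); concatenate forward and backward states.
        c = self.char_proj(torch.cat([h_n[0], h_n[1]], dim=-1))
        g = torch.sigmoid(self.gate(w))                    # (batch, 1)
        # Final word representation to feed into the LSTM language model.
        return g * w + (1.0 - g) * c

# Example: a batch of 4 words, each padded to 12 characters.
layer = GatedWordCharEmbedding(word_vocab_size=10000, char_vocab_size=60,
                               emb_dim=200, char_dim=50)
vectors = layer(torch.randint(0, 10000, (4,)), torch.randint(0, 60, (4, 12)))
# vectors.shape == torch.Size([4, 200])

The returned vectors would then replace plain word embeddings as input to the LSTM language model.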

Cited by 77 publications (88 citation statements). References 14 publications.
“…Most of the word-character hybrid models focus on input representation rather than generation. Usually, their representations are concatenated, or composition functions are learned (Miyamoto and Cho, 2016). Even though they use word information in the input, the decoding process of their models is still at the character level.…”
Section: Related Work (mentioning)
confidence: 99%
“…In this work, we will use a mixture model over M different models for generating words in place of the single softmax over words (Miyamoto and Cho, 2016; Neubig and Dyer, 2016):…”
Section: Word Generation Mixture Model (mentioning)
confidence: 99%
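For context, the general form of such a mixture replaces the single softmax with a weighted combination of M component distributions over the vocabulary (a generic formulation, not necessarily the exact parameterization used in the citing work):

p(w_t \mid h_t) = \sum_{m=1}^{M} \pi_m(h_t)\, p_m(w_t \mid h_t), \qquad \sum_{m=1}^{M} \pi_m(h_t) = 1,

where h_t is the decoder state and the mixture weights \pi_m(h_t) are predicted by the model.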
“…Dos Santos and Zadrozny (2014) join word and character representations in a deep neural network for part-of-speech tagging. Finally, Miyamoto and Cho (2016) describe an LM that is related to our model, although their character-level embedding is generated by a bidirectional LSTM and we do not use a gate to determine how much of the word and how much of the character embedding is used. However, they only compare to a simple baseline model of 2 LSTM layers of 200 hidden units each, without dropout, resulting in a higher baseline perplexity (as mentioned in section 4.3, our CW model also achieves larger improvements than reported in this paper with respect to that baseline).…”
Section: Related Work (mentioning)
confidence: 99%
“…Miyamoto and Cho (2016) only report results for a small model that is trained without dropout, resulting in a baseline perplexity of 115.65. If we train our small model without dropout we get a comparable baseline perplexity (116.33) and a character-word perplexity of 110.54 (compare to 109.05 reported by Miyamoto and Cho (2016)). It remains to be seen whether their model performs equally well compared to better baselines.…”
Section: English (mentioning)
confidence: 99%