2008
DOI: 10.1109/tnn.2007.912312

Adaptive Importance Sampling to Accelerate Training of a Neural Probabilistic Language Model

Abstract: Previous work on statistical language modeling has shown that it is possible to train a feed-forward neural network to approximate probabilities over sequences of words, resulting in significant error reduction when compared to standard baseline models. However, in order to train the model on the maximum likelihood criterion, one has to make, for each example, as many network passes as there are words in the vocabulary. We introduce adaptive importance sampling as a way to accelerate training of the …
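The idea summarized in the abstract lends itself to a short illustration. The NumPy sketch below is not the authors' exact algorithm: all names are illustrative, and a fixed unigram proposal stands in for the paper's adaptive n-gram proposal. It shows how the gradient of the negative log-likelihood can be estimated from the target word plus a handful of sampled words instead of scoring the full vocabulary.

```python
import numpy as np

rng = np.random.default_rng(0)

V, d, k = 10_000, 64, 25     # vocabulary size, hidden size, sampled words

# Toy output layer: score(w | h) = W[w] @ h + b[w]
W = rng.normal(scale=0.1, size=(V, d))
b = np.zeros(V)

# Proposal distribution Q -- a fixed unigram here; the paper instead
# adapts an n-gram proposal during training (not reproduced in this sketch).
Q = rng.random(V)
Q /= Q.sum()

def sampled_grad(h, target):
    """Importance-sampling estimate of d(-log P(target | h)) / d(score).

    The exact gradient needs the softmax over all V words; here only the
    target plus k words drawn from Q are scored, and the weights
    exp(score) / Q approximately correct for the proposal bias.
    """
    samples = rng.choice(V, size=k, p=Q)
    words = np.concatenate(([target], samples))
    scores = W[words] @ h + b[words]

    log_w = scores - np.log(Q[words])   # log importance weights
    log_w -= log_w.max()                # numerical stability
    r = np.exp(log_w)
    p_hat = r / r.sum()                 # self-normalized softmax estimate

    g = p_hat.copy()
    g[0] -= 1.0                         # subtract 1 at the target position
    return words, g                     # gradient rows to apply to W[words]

words, g = sampled_grad(rng.normal(size=d), target=42)
print(len(words), round(g.sum(), 6))    # k + 1 words, gradient sums to ~0
```

Per training example this touches only k + 1 rows of the output layer instead of all V, which is where the reported acceleration comes from.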

Cited by 323 publications (347 citation statements)
References 16 publications
“…(2).) To address this problem, we use the approach presented in (Jean et al, 2015), which is based on importance sampling (Bengio and Sénécal, 2008). During training, we choose a smaller vocabulary size τ and divide the training set into partitions, each of which contains approximately τ unique target words.…”
Section: Very Large Target Vocabulary Extension
confidence: 99%
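A rough sketch of the partitioning step described in the quoted passage follows. The function name, the greedy grouping, and the sentence format are assumptions for illustration, not the exact procedure of Jean et al. (2015).

```python
def partition_by_vocab(sentences, tau):
    """Greedily group sentences so that each partition covers roughly
    at most tau unique target words (illustrative sketch only)."""
    partitions, current, vocab = [], [], set()
    for sent in sentences:
        new_words = set(sent) - vocab
        if current and len(vocab) + len(new_words) > tau:
            partitions.append((current, sorted(vocab)))
            current, vocab = [], set()
            new_words = set(sent)
        current.append(sent)
        vocab |= new_words
    if current:
        partitions.append((current, sorted(vocab)))
    return partitions

# Each partition's softmax is then restricted to its own small vocabulary.
parts = partition_by_vocab([[1, 2, 3], [2, 3, 4], [7, 8, 9, 10]], tau=5)
for sents, vocab in parts:
    print(len(sents), "sentences,", len(vocab), "unique target words")
```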
“…Model            Relation of w, c    Representation of c
Skip-gram [18]   c predicts w        one of c
CBOW [18]        c predicts w        average
Order            c predicts w        concatenation
LBL [22]         c predicts w        compositionality
NNLM [2]         c predicts w        compositionality
C&W [3]          scores w, c         compositionality

Table 1: A summary of the investigated models, including how they model the relationship between the target word w and its context c, and how the models use the embeddings of the context words to represent the context.

There are still few works that offer fair comparisons among existing word embedding algorithms.…”
Section: Model
confidence: 99%
“…In contrast, the Order model (Section 2.1.5) uses the concatenation of the context words' embeddings, which maintains the word order information. Furthermore, the LBL [22], NNLM [2] and C&W models add a hidden layer to the Order model. Thus, these models use the semantic compositionality [10] of the context words as the context representation.…”
Section: Model
confidence: 99%
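As an aside, the three context representations contrasted in these excerpts (average, concatenation, and a hidden-layer composition) can be sketched in a few lines. The weight matrix H and the tanh nonlinearity below are illustrative stand-ins; LBL in particular composes the context linearly.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_ctx = 8, 4                        # embedding size, context length
E = rng.normal(size=(100, d))          # toy embedding table
ctx = [5, 17, 42, 3]                   # context word ids

# CBOW-style: order-insensitive average of the context embeddings.
avg = E[ctx].mean(axis=0)              # shape (d,)

# "Order"-style: concatenation keeps word-order information.
concat = E[ctx].reshape(-1)            # shape (n_ctx * d,)

# LBL / NNLM / C&W-style: a hidden layer composes the concatenation
# (tanh is an illustrative choice; LBL itself uses a linear combination).
H = rng.normal(scale=0.1, size=(d, n_ctx * d))
composed = np.tanh(H @ concat)         # shape (d,)

print(avg.shape, concat.shape, composed.shape)
```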