Proceedings of the 24th International Conference on World Wide Web 2015
DOI: 10.1145/2736277.2741643

Hierarchical Neural Language Models for Joint Representation of Streaming Documents and their Content

Abstract: We consider the problem of learning distributed representations for documents in data streams. The documents are represented as low-dimensional vectors and are jointly learned with distributed vector representations of word tokens using a hierarchical framework with two embedded neural language models. In particular, we exploit the context of documents in streams and use one of the language models to model the document sequences, and the other to model word sequences within them. The models learn continuous ve…
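The abstract's two-level design (one language model over the document stream, another over each document's words, sharing the document vectors) can be made concrete with a small sketch. The following is a minimal illustration only, not the authors' implementation: it uses a toy three-document stream, plain sigmoid scoring on positive pairs instead of the hierarchical softmax and negative sampling a real system would need, and made-up hyperparameters.

```python
# Minimal sketch of the joint objective described in the abstract.
# Assumptions: toy corpus, positive pairs only (no negative sampling),
# invented hyperparameters. Not the paper's actual training procedure.
import numpy as np

rng = np.random.default_rng(0)
dim, lr = 16, 0.05

# Toy document stream: each inner list is one document's word sequence.
stream = [["deep", "learning", "models"],
          ["learning", "word", "vectors"],
          ["streaming", "document", "vectors"]]

vocab = sorted({w for doc in stream for w in doc})
W = {w: rng.normal(0.0, 0.1, dim) for w in vocab}   # word vectors
D = rng.normal(0.0, 0.1, (len(stream), dim))        # document vectors

def pull_together(a, b):
    """One gradient step on log sigmoid(a.b) for a positive pair.
    (Negative sampling is omitted to keep the sketch short.)"""
    score = 1.0 / (1.0 + np.exp(-(a @ b)))
    g = lr * (1.0 - score)
    a_old = a.copy()
    a += g * b           # in-place updates propagate to D and W
    b += g * a_old

for epoch in range(50):
    for i, doc in enumerate(stream):
        # Model 1: document-sequence model -- a document predicts its
        # temporal neighbours in the stream.
        for j in (i - 1, i + 1):
            if 0 <= j < len(stream):
                pull_together(D[i], D[j])
        # Model 2: word-sequence model -- surrounding words and the
        # document's own vector predict each word.
        for t, w in enumerate(doc):
            for u in doc[max(0, t - 1):t + 2]:
                if u != w:
                    pull_together(W[w], W[u])
            pull_together(D[i], W[w])  # ties content to the document vector

# Documents sharing words should end up closer than unrelated ones.
sim = lambda a, b: float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(sim(D[0], D[1]), sim(D[0], D[2]))
```

The point the sketch preserves is that each document vector receives gradients from both models, so it reflects both its neighbours in the stream and its own word content.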

Cited by 63 publications (54 citation statements)
References 16 publications
“…These powerful, efficient models have shown very promising results in capturing both semantic and syntactic relationships between words in large-scale text corpora, and have obtained state-of-the-art results on many NLP tasks. Recently, the concept of embedding has been extended to many applications, including sentence and paragraph representation [11], summarization [21], question answering [43], recommender systems [34], and so on.…”
Section: Embedding
Citation type: mentioning
confidence: 99%
“…Thirdly, slight differences in the styles and genres of music pieces are also reflected in the learned embeddings, which shows that the embeddings learned by MEM can effectively capture the characteristic features of the corresponding music pieces. For example, the last four music pieces (13–16) in Table 8 are all anime soundtracks, and they are more similar to each other than to the other pieces (1–12). In addition, the former two pieces (13, 14) are more similar to each other than the latter two pieces (15, 16) in Table 8.…”
Section: Illustrations of Selected Music Pieces' Embeddings
Citation type: mentioning
confidence: 99%
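The similarity comparisons in this excerpt typically amount to cosine similarity between embedding vectors. A tiny sketch follows; the random vectors stand in for the MEM embeddings of the excerpt's Table 8, which are not available here:

```python
# Cosine similarity between (stand-in) music-piece embeddings; random
# vectors replace the actual MEM embeddings, which we do not have.
import numpy as np

rng = np.random.default_rng(1)
emb = rng.normal(size=(16, 32))            # 16 pieces, 32-dim vectors

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# e.g., compare the anime soundtracks (pieces 13-16; zero-based 12-15)
print(cosine(emb[12], emb[13]))            # pieces 13 vs 14
print(cosine(emb[14], emb[15]))            # pieces 15 vs 16
```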
“…The related work is largely focused on the notion of word and text representations (as in (Djuric et al., 2015a; Le and Mikolov, 2014; Mikolov et al., 2013a)), which improve on previous work modeling lexical semantics with vector space models (Mikolov et al., 2013a). More recently, the concept of embeddings has been extended beyond words to a number of text segments, including phrases (Mikolov et al., 2013b), sentences and paragraphs (Le and Mikolov, 2014), and entities (Yang et al., 2014).…”
Section: Distributional Representation of Comments (C2V)
Citation type: mentioning
confidence: 99%
“…More recently, the concept of embeddings has been extended beyond words to a number of text segments, including phrases (Mikolov et al., 2013b), sentences and paragraphs (Le and Mikolov, 2014), and entities (Yang et al., 2014). In order to learn vector representations, we develop a comment-embedding approach akin to Le and Mikolov (2014), which differs from the one used in Djuric et al. (2015a) in that our representation does not model relationships between comments (e.g., temporal ones). Moreover, given its similarity to a prior state-of-the-art approach (Djuric et al., 2015b), this method can also serve as a strong baseline.…”
Section: Distributional Representation of Comments (C2V)
Citation type: mentioning
confidence: 99%
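As a concrete illustration of the paragraph-vector-style comment embeddings this excerpt describes, here is a hedged sketch using gensim's Doc2Vec (assuming gensim 4.x); the toy comments and all hyperparameter values are illustrative assumptions, not the cited paper's setup:

```python
# A sketch of paragraph-vector-style comment embeddings (Le and Mikolov,
# 2014). Assumes gensim 4.x; the corpus and hyperparameters are made up.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

comments = ["great explanation thanks",
            "this thread is off topic",
            "thanks very helpful answer"]
corpus = [TaggedDocument(words=c.split(), tags=[str(i)])
          for i, c in enumerate(comments)]

model = Doc2Vec(corpus, vector_size=50, window=2, min_count=1, epochs=40)

# Unlike Djuric et al. (2015a), nothing here models the temporal order of
# comments: each comment is embedded independently, matching the excerpt.
print(model.dv.most_similar("0"))   # comments closest to comment "0"
```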
“…This requires the aid of a dictionary and Chinese word segmentation technology; however, dictionaries are domain-specific and time-varying, the Chinese word segmentation process is complex, and its accuracy is not high. To achieve high classification performance, supervised document classification algorithms are adopted, such as decision trees, Naive Bayes, KNN (k-nearest neighbors), SVM (support vector machines), neural networks (Djuric et al., 2015; Patel et al., 2013), and genetic algorithms (Revathi, 2013). Because these methods rely on supervised classification, their effectiveness depends heavily on the quality of the manually annotated corpus, and the resulting classification models do not transfer well across domains. In this paper, targeting a corpus of food safety documents, we improve the classification algorithm.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
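To make the supervised-classification setup this excerpt surveys concrete, here is a minimal scikit-learn sketch of one of the named baseline families (an SVM over TF-IDF features); the toy corpus and labels are invented for illustration and are not from the cited work:

```python
# A minimal supervised text classifier (TF-IDF features + linear SVM),
# one of the baseline families named in the excerpt. Toy data only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = ["contaminated milk recall issued",
        "restaurant passes hygiene inspection",
        "pesticide residue found in vegetables",
        "new cafe opens downtown"]
labels = ["unsafe", "safe", "unsafe", "safe"]

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(docs, labels)
print(clf.predict(["residue detected in milk samples"]))  # likely 'unsafe'
```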