Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2016
DOI: 10.18653/v1/p16-1089
Siamese CBOW: Optimizing Word Embeddings for Sentence Representations

Abstract: We present the Siamese Continuous Bag of Words (Siamese CBOW) model, a neural network for efficient estimation of high-quality sentence embeddings. Averaging the embeddings of words in a sentence has proven to be a surprisingly successful and efficient way of obtaining sentence embeddings. However, word embeddings trained with the methods currently available are not optimized for the task of sentence representation and are thus likely to be suboptimal. Siamese CBOW handles this problem by training word embedding…
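The averaging baseline the abstract refers to can be sketched in a few lines: a sentence embedding is simply the mean of its word vectors, and two sentences are compared by cosine similarity. The 4-dimensional vectors below are made up for illustration; real systems would use trained embeddings.

```python
import numpy as np

# Toy vocabulary of word embeddings (made-up values, not trained vectors).
word_vectors = {
    "the": np.array([0.1, 0.0, 0.2, 0.1]),
    "cat": np.array([0.9, 0.1, 0.0, 0.3]),
    "sat": np.array([0.2, 0.8, 0.1, 0.0]),
    "dog": np.array([0.8, 0.2, 0.1, 0.3]),
    "ran": np.array([0.1, 0.7, 0.2, 0.1]),
}

def sentence_embedding(tokens):
    """Average the embeddings of the words in a sentence."""
    return np.mean([word_vectors[t] for t in tokens], axis=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

s1 = sentence_embedding(["the", "cat", "sat"])
s2 = sentence_embedding(["the", "dog", "ran"])
print(cosine(s1, s2))
```

Siamese CBOW's point is that embeddings trained for word-level objectives are not tuned for this averaging step, so it trains them for it directly.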

Cited by 178 publications (117 citation statements)
References 19 publications
“…Following the Tf-Idf weighting schema, another compositional way for building document representations has been introduced by [23], allowing to better fit with matching tasks. A more complex approach is inspired by neural language models [10,11]. Following the CBOW and the skip-gram frameworks [15] respectively, the Siamese CBOW model [10] and the Skip-thought [11] learn sentence representations by either predicting a sentence from its surrounding sentences or its context sentences from the encoded sentence.…”
Section: Traditional Neural Approaches For Learning Text
confidence: 99%
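The training idea the citation describes — Siamese CBOW learning sentence representations by predicting a sentence's neighbors — can be sketched as a softmax over cosine similarities: the averaged embedding of the current sentence is compared to its surrounding sentences (positives) and to randomly sampled sentences (negatives), and the loss rewards probability mass on the neighbors. This is a simplified numpy sketch with made-up embeddings, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Toy sentence embeddings; in Siamese CBOW each sentence embedding is the
# average of its word embeddings, and those word embeddings are the
# trainable parameters.
anchor    = rng.normal(size=8)                                     # current sentence
positives = [anchor + 0.1 * rng.normal(size=8) for _ in range(2)]  # neighboring sentences
negatives = [rng.normal(size=8) for _ in range(3)]                 # random sentences

sims  = np.array([cosine(anchor, s) for s in positives + negatives])
probs = softmax(sims)

# Cross-entropy against a uniform target over the true neighbors.
loss = -np.mean(np.log(probs[:2]))
print(loss)
```

Gradient descent on this loss pushes word embeddings so that averaged sentence vectors of adjacent sentences are close in cosine space, which is exactly the property the averaging baseline relies on.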
“…A more complex approach is inspired by neural language models [10,11]. Following the CBOW and the skip-gram frameworks [15] respectively, the Siamese CBOW model [10] and the Skip-thought [11] learn sentence representations by either predicting a sentence from its surrounding sentences or its context sentences from the encoded sentence. As an extension of word2vec, the Paragraph-Vector model [12] jointly learns paragraph (or document) and word representations within the same embedding space.…”
Section: Traditional Neural Approaches For Learning Text
confidence: 99%
“…Recent years have seen neural networks being applied to all key parts of the typical modern IR pipeline, such core ranking algorithms [26,42,51], click models [9,10], knowledge graphs [8,35], text similarity [28,47], entity retrieval [52,53], language modeling [5], question answering [22,56], and dialogue systems [34,54].…”
Section: Motivation
confidence: 99%
“…In recent years, there have also been several studies that extend the proportion from word level to sentence, paragraph, or even document level, such as doc2vec (Mikolov et al, 2013), FastText (Bojanowski et al, 2017), and Siamese-CBOW (Kenter et al, 2016). Following the fruitful progress of these techniques of word and sentence embeddings, this paper presents a web-based information system, RiskFinder, that broadens the content analysis from the word level to sentence level for financial reports.…”
Section: Introduction
confidence: 99%
“…In addition to the 10-K corpus, we also construct a set of labeled financial sentences with respect to financial risk by involving 8 financial specialists, including accountants and financial analysts, to ensure the quality of the labeling. With the labeled sentences and the large collection of financial reports, we apply FastText (Bojanowski et al, 2017) and Siamese-CBOW (Kenter et al, 2016) to sentence-level textual analysis. Due to the superior performance of FastText, the system highlights high-risk sentences in those reports using FastText.…”
Section: Introduction
confidence: 99%
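The sentence-level risk highlighting described in the citation above can be sketched with a toy linear classifier over averaged word embeddings — a rough stand-in for FastText's linear model over bag-of-n-gram embeddings. All vectors, weights, and sentences here are hypothetical, not data from the RiskFinder system.

```python
import numpy as np

# Hypothetical hand-made 2-d word vectors and toy classifier weights;
# a real system would learn both from labeled sentences.
word_vectors = {
    "litigation": np.array([1.0, 0.0]),
    "default":    np.array([1.0, 0.0]),
    "risk":       np.array([1.0, 0.0]),
    "revenue":    np.array([0.0, 1.0]),
    "grew":       np.array([0.0, 1.0]),
}
weights = np.array([4.0, -1.0])  # toy "risk direction"

def risk_score(sentence):
    """Logistic score from the averaged embedding of in-vocabulary words."""
    vecs = [word_vectors[w] for w in sentence.lower().split() if w in word_vectors]
    if not vecs:
        return 0.0
    z = float(np.dot(np.mean(vecs, axis=0), weights))
    return 1.0 / (1.0 + np.exp(-z))

sentences = ["We face litigation and default risk", "Revenue grew this quarter"]
flagged = [s for s in sentences if risk_score(s) > 0.5]
print(flagged)
```

Highlighting then reduces to flagging every sentence whose score crosses a threshold, which is the behavior the citing paper reports for its FastText-based system.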