Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2 (2017)
DOI: 10.18653/v1/e17-2081

Efficient, Compositional, Order-sensitive n-gram Embeddings

Abstract: We propose ECO: a new way to generate embeddings for phrases that is Efficient, Compositional, and Order-sensitive. Our method creates decompositional embeddings for words offline and combines them to create new embeddings for phrases in real time. Unlike other approaches, ECO can create embeddings for phrases not seen during training. We evaluate ECO on supervised and unsupervised tasks and demonstrate that creating phrase embeddings that are sensitive to word order can help downstream tasks.
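
To make the abstract's notion of order sensitivity concrete, here is a minimal sketch, not the authors' exact formulation: it assumes each word has separate position-specific vectors (the hypothetical right_of and left_of tables) learned offline, which are combined at query time so that swapping the words of a bigram changes the result.

```python
# Minimal sketch of order-sensitive compositional bigram embeddings.
# Assumption: each word has two offline-trained, position-specific
# vectors -- one for when it precedes a neighbour, one for when it
# follows. Tables and names here are illustrative, not ECO's code.
import numpy as np

dim = 100
rng = np.random.default_rng(0)
vocab = ["new", "york", "machine", "learning"]
right_of = {w: rng.standard_normal(dim) for w in vocab}  # w precedes
left_of = {w: rng.standard_normal(dim) for w in vocab}   # w follows

def bigram_embedding(w1: str, w2: str) -> np.ndarray:
    """Compose a bigram vector from position-specific word vectors.

    Swapping w1 and w2 selects different table entries, so the
    composition is order-sensitive, unlike a plain sum of vectors.
    """
    return (right_of[w1] + left_of[w2]) / 2.0

print(np.allclose(bigram_embedding("new", "york"),
                  bigram_embedding("york", "new")))  # False
```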

Cited by 14 publications (18 citation statements)
References 11 publications

“…More practically, the Contextual Rare Words (CRW) dataset we provide will support research on few-shot learning of word embeddings. Both in this area and for n-grams there is great scope for combining our approach with compositional approaches (Bojanowski et al., 2016; Poliak et al., 2017) that can handle settings such as zero-shot learning. More work is needed to understand the usefulness of our method for representing (potentially cross-lingual) entities such as synsets, whose embeddings have found use in enhancing WordNet and related knowledge bases (Camacho-Collados et al., 2016; Khodak et al., 2017).…”
Section: Results
confidence: 99%
“…Compositional approaches, such as sums and products of unigram vectors, are often used and work well on some evaluations, but are often order-insensitive or very high-dimensional (Mitchell and Lapata, 2010). Recent work by Poliak et al. (2017) works around this while staying compositional; however, as we will see, their approach does not seem to capture a bigram's meaning much better than the sum of its word vectors. n-gram embeddings have also gained interest for low-dimensional document representation schemes (Hill et al., 2016; Pagliardini et al., 2018; Arora et al., 2018a), largely due to the success of their sparse high-dimensional Bag-of-n-Grams (BonG) counterparts (Wang and Manning, 2012).…”
Section: n-gram Embeddings for Classification
confidence: 99%
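
To make the order-insensitivity point in the excerpt above concrete, here is a toy demonstration (not code from either paper) that additive composition of unigram vectors is permutation-invariant by construction:

```python
# Toy demonstration that summing unigram vectors loses word order.
import numpy as np

rng = np.random.default_rng(1)
vec = {w: rng.standard_normal(50) for w in ["dog", "bites", "man"]}

def additive(phrase):
    # Sum of unigram vectors: the result is identical for any
    # permutation of the phrase, so word order is lost.
    return np.sum([vec[w] for w in phrase], axis=0)

print(np.allclose(additive(["dog", "bites", "man"]),
                  additive(["man", "bites", "dog"])))  # True
```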
“…[11] For the datasets with train-test partitions, the sizes of the test sets are the following: 7,532 for 20News; 12,733 for Ohsumed; 25,000 for IMDb; and 1,000 for RTC. [12] For future work it would be interesting to explore more complex methods to learn embeddings for multiword expressions (Yin and Schütze, 2014; Poliak et al., 2017). [13] Computed by averaging accuracy of two different runs.…”
Section: Methods
confidence: 99%
“…In ECO embeddings, the vector representations of neighbouring words (occurring both before and after) are averaged to obtain the numeric representation of the current word. We used the pre-trained word vectors from a Wikipedia dump, with dimensionality ranging from 100 to 700, as provided by Poliak et al. (2017), to generate document embeddings.…”
Section: Embedding Representations
confidence: 99%
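
One reading of the neighbour-averaging step in the excerpt above is sketched below; the window size, table, and function names are assumptions for illustration, not code from the citing paper:

```python
# Sketch of neighbour averaging: a token's representation is the mean
# of the pre-trained vectors of the words around it, before and after.
import numpy as np

dim = 100
rng = np.random.default_rng(2)
tokens = "we evaluate eco on supervised tasks".split()
pretrained = {w: rng.standard_normal(dim) for w in tokens}  # stand-in

def contextual_vector(tokens, i, window=2):
    """Average vectors of in-vocabulary words within `window`
    positions of token i, looking both before and after it."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    neigh = [pretrained[tokens[j]] for j in range(lo, hi)
             if j != i and tokens[j] in pretrained]
    return np.mean(neigh, axis=0) if neigh else np.zeros(dim)

print(contextual_vector(tokens, 2).shape)  # (100,)
```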