Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2 (2017)
DOI: 10.18653/v1/e17-2081

Efficient, Compositional, Order-sensitive n-gram Embeddings

Abstract: We propose ECO: a new way to generate embeddings for phrases that is Efficient, Compositional, and Order-sensitive. Our method creates decompositional embeddings for words offline and combines them to create new embeddings for phrases in real time. Unlike other approaches, ECO can create embeddings for phrases not seen during training. We evaluate ECO on supervised and unsupervised tasks and demonstrate that creating phrase embeddings that are sensitive to word order can help downstream tasks.
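
To make the abstract's notion of order sensitivity concrete, here is a minimal sketch, not the authors' exact formulation: it assumes each word has separate position-specific vectors (the hypothetical right_of and left_of tables) learned offline, which are combined at query time so that swapping the words of a bigram changes the result.

```python
# Minimal sketch of order-sensitive compositional bigram embeddings.
# Assumption: each word has two offline-trained, position-specific
# vectors -- one for when it precedes a neighbour, one for when it
# follows. Tables and names here are illustrative, not ECO's code.
import numpy as np

dim = 100
rng = np.random.default_rng(0)
vocab = ["new", "york", "machine", "learning"]
right_of = {w: rng.standard_normal(dim) for w in vocab}  # w precedes
left_of = {w: rng.standard_normal(dim) for w in vocab}   # w follows

def bigram_embedding(w1: str, w2: str) -> np.ndarray:
    """Compose a bigram vector from position-specific word vectors.

    Swapping w1 and w2 selects different table entries, so the
    composition is order-sensitive, unlike a plain sum of vectors.
    """
    return (right_of[w1] + left_of[w2]) / 2.0

print(np.allclose(bigram_embedding("new", "york"),
                  bigram_embedding("york", "new")))  # False
```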

Cited by 14 publications (18 citation statements)
References 11 publications

“…More practically, the Contextual Rare Words (CRW) dataset we provide will support research on few-shot learning of word embeddings. Both in this area and for n-grams there is great scope for combining our approach with compositional approaches (Bojanowski et al., 2016; Poliak et al., 2017) that can handle settings such as zero-shot learning. More work is needed to understand the usefulness of our method for representing (potentially cross-lingual) entities such as synsets, whose embeddings have found use in enhancing WordNet and related knowledge bases (Camacho-Collados et al., 2016; Khodak et al., 2017).…”
Section: Results
confidence: 99%
“…Compositional approaches, such as sums and products of unigram vectors, are often used and work well on some evaluations, but are often order-insensitive or very high-dimensional (Mitchell and Lapata, 2010). Recent work by Poliak et al. (2017) works around this while staying compositional; however, as we will see, their approach does not seem to capture a bigram's meaning much better than the sum of its word vectors. n-gram embeddings have also gained interest for low-dimensional document representation schemes (Hill et al., 2016; Pagliardini et al., 2018; Arora et al., 2018a), largely due to the success of their sparse high-dimensional Bag-of-n-Grams (BonG) counterparts (Wang and Manning, 2012).…”
Section: n-gram Embeddings for Classification
confidence: 99%
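
To make the order-insensitivity point in the excerpt above concrete, here is a toy demonstration (not code from either paper) that additive composition of unigram vectors is permutation-invariant by construction:

```python
# Toy demonstration that summing unigram vectors loses word order.
import numpy as np

rng = np.random.default_rng(1)
vec = {w: rng.standard_normal(50) for w in ["dog", "bites", "man"]}

def additive(phrase):
    # Sum of unigram vectors: the result is identical for any
    # permutation of the phrase, so word order is lost.
    return np.sum([vec[w] for w in phrase], axis=0)

print(np.allclose(additive(["dog", "bites", "man"]),
                  additive(["man", "bites", "dog"])))  # True
```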
“…[11] For the datasets with train-test partitions, the sizes of the test sets are the following: 7,532 for 20News; 12,733 for Ohsumed; 25,000 for IMDb; and 1,000 for RTC. [12] For future work it would be interesting to explore more complex methods to learn embeddings for multiword expressions (Yin and Schütze, 2014; Poliak et al., 2017). [13] Computed by averaging accuracy of two different runs.…”
Section: Methods
confidence: 99%
“…In ECO embeddings, the vector representations of neighbouring words (occurring both before and after) are averaged to obtain the numeric representation of the current word. We used the pre-trained word vectors from a Wikipedia dump, with dimensionality ranging from 100 to 700, as provided by Poliak et al. (2017), to generate document embeddings.…”
Section: Embedding Representations
confidence: 99%
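
One reading of the neighbour-averaging step in the excerpt above is sketched below; the window size, table, and function names are assumptions for illustration, not code from the citing paper:

```python
# Sketch of neighbour averaging: a token's representation is the mean
# of the pre-trained vectors of the words around it, before and after.
import numpy as np

dim = 100
rng = np.random.default_rng(2)
tokens = "we evaluate eco on supervised tasks".split()
pretrained = {w: rng.standard_normal(dim) for w in tokens}  # stand-in

def contextual_vector(tokens, i, window=2):
    """Average vectors of in-vocabulary words within `window`
    positions of token i, looking both before and after it."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    neigh = [pretrained[tokens[j]] for j in range(lo, hi)
             if j != i and tokens[j] in pretrained]
    return np.mean(neigh, axis=0) if neigh else np.zeros(dim)

print(contextual_vector(tokens, 2).shape)  # (100,)
```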