2019
DOI: 10.48550/arxiv.1909.01264
Preprint

On the Downstream Performance of Compressed Word Embeddings

Abstract: Compressing word embeddings is important for deploying NLP models in memory-constrained settings. However, understanding what makes compressed embeddings perform well on downstream tasks is challenging: existing measures of compression quality often fail to distinguish between embeddings that perform well and those that do not. We thus propose the eigenspace overlap score as a new measure. We relate the eigenspace overlap score to downstream performance by developing generalization bounds for the compressed embeddings…
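The abstract names the eigenspace overlap score but does not define it here. Below is a minimal Python sketch of one way such a score could be computed, assuming it measures how well the left singular subspaces of the uncompressed embedding matrix X and the compressed matrix X_tilde align (squared Frobenius norm of U^T U_tilde, normalized by the larger of the two dimensions). The function name and the toy quantized comparison are illustrative assumptions, not code from the paper.

import numpy as np

def eigenspace_overlap_score(X, X_tilde):
    # Orthonormal bases for the column spaces (left singular vectors).
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    U_t, _, _ = np.linalg.svd(X_tilde, full_matrices=False)
    # Squared Frobenius norm of U^T U_t, normalized so the score lies in [0, 1].
    return np.linalg.norm(U.T @ U_t, ord="fro") ** 2 / max(U.shape[1], U_t.shape[1])

# Toy usage: full-precision embeddings vs. a coarsely quantized copy.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 300))
X_tilde = np.round(X * 2) / 2              # crude 0.5-step uniform quantization
print(eigenspace_overlap_score(X, X_tilde))  # close to 1.0

A score near 1 indicates the compressed embeddings span nearly the same subspace as the originals, which is the property the paper relates to downstream generalization.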

Cited by 2 publications (2 citation statements)
References 17 publications

“…Dictionary learning [20] and word embedding clustering [21] approaches have been proposed. An optimized method for uniform quantization of floating point numbers in the embedding matrix has been proposed recently [22]. To compress a model for low-memory inference, [23] used pruning and quantization for lowering the number of parameters.…”
Section: Related Work (mentioning)
confidence: 99%
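The statement above refers to uniform quantization of the floating-point embedding matrix. As a rough illustration of that general idea only (not the optimized method of [22]; the function names and 8-bit width are assumptions for this sketch):

import numpy as np

def uniform_quantize(E, num_bits=8):
    # Map each entry to one of 2**num_bits evenly spaced levels between the
    # matrix minimum and maximum; return integer codes plus the (offset, scale)
    # needed to dequantize. uint8 storage assumes num_bits <= 8.
    lo, hi = float(E.min()), float(E.max())
    scale = (hi - lo) / (2 ** num_bits - 1)
    codes = np.round((E - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    return codes.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
E = rng.standard_normal((10000, 300)).astype(np.float32)
codes, lo, scale = uniform_quantize(E)               # 8 bits per entry instead of 32
E_hat = dequantize(codes, lo, scale)
print(np.abs(E - E_hat).max() <= scale / 2 + 1e-6)   # True: error bounded by half a step

At 8 bits per entry this stores the matrix in roughly a quarter of the float32 footprint, at the cost of a bounded per-entry reconstruction error.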
“…Although these models have been shown to achieve state-of-the-art performance on most NLP tasks, they are notably expensive to train. To help combat this, as mentioned by May et al. (2019), model compression techniques like data quantization (Gong et al., 2014), model pruning (Han et al., 2016), and knowledge distillation (Sanh et al., 2019; Hinton et al., 2015) have been developed. However, at 768 dimensions, the embeddings themselves can be prohibitively large for some tasks and settings.…”
Section: Introduction (mentioning)
confidence: 99%
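For context on the pruning technique cited above, here is a minimal sketch of magnitude pruning applied to a weight or embedding matrix. It is illustrative only, not the specific method of Han et al. (2016), and the function name and sparsity level are arbitrary choices for the example.

import numpy as np

def magnitude_prune(W, sparsity=0.9):
    # Zero out the smallest-magnitude entries so that roughly `sparsity` of the
    # weights become zero; the surviving weights are left unchanged.
    threshold = np.quantile(np.abs(W), sparsity)
    return np.where(np.abs(W) >= threshold, W, 0.0)

rng = np.random.default_rng(0)
W = rng.standard_normal((768, 768))
W_pruned = magnitude_prune(W, sparsity=0.9)
print((W_pruned == 0).mean())   # roughly 0.9

In practice the pruned matrix is stored in a sparse format (or combined with quantization) to realize the memory savings.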