Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019) 2019
DOI: 10.18653/v1/w19-5111

A Systematic Comparison of English Noun Compound Representations

Abstract: Building meaningful representations of noun compounds is not trivial, since many of them scarcely appear in the corpus. To that end, composition functions approximate the distributional representation of a noun compound by combining its constituent distributional vectors. In the more general case, phrase embeddings have been trained by minimizing the distance between the vectors representing paraphrases. We compare various types of noun compound representations, including distributional, compositional, and paraphrase-based representations […]
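The composition functions described in the abstract can range from simple vector addition to a learned linear map over the concatenated constituent vectors. Below is a minimal sketch of both, assuming pre-trained constituent vectors are available; the function and variable names (`compose_add`, `compose_linear`, `W`) are illustrative and not taken from the paper.

```python
import numpy as np

DIM = 300  # assumed dimensionality of the pre-trained word vectors

def compose_add(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Additive composition: the compound vector is the normalized sum
    of its constituent vectors, e.g. vec('olive') + vec('oil')."""
    w = u + v
    return w / np.linalg.norm(w)

def compose_linear(u: np.ndarray, v: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Learned linear composition: a matrix W of shape (DIM, 2*DIM) maps the
    concatenated constituents to a compound vector. W would normally be
    trained, e.g. to reconstruct observed compound vectors."""
    return W @ np.concatenate([u, v])

# Toy usage with random vectors standing in for pre-trained embeddings.
rng = np.random.default_rng(0)
olive, oil = rng.normal(size=DIM), rng.normal(size=DIM)
W = rng.normal(size=(DIM, 2 * DIM)) * 0.01

print(compose_add(olive, oil).shape)        # (300,)
print(compose_linear(olive, oil, W).shape)  # (300,)
```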

Cited by 5 publications (13 citation statements); references 20 publications.
“…Much like distributional similarity approaches for learning word representations (Mikolov et al., 2013a; Bojanowski et al., 2017; Pennington et al., 2014, inter alia), a semantic representation of MWEs can be trained using a distributional approach that treats MWEs as single tokens (Mikolov et al., 2013b). However, this approach cannot handle out-of-vocabulary (OOV) MWEs, and it is likely to suffer from sparsity (Shwartz, 2019), particularly as the MWEs grow in length.…”
Section: Related Work
confidence: 99%
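The single-token treatment described in the excerpt above amounts to merging each known MWE into one token before training an ordinary skip-gram model. The following is a minimal sketch, assuming a small in-memory corpus and the gensim 4.x Word2Vec API; the MWE list and helper names are illustrative.

```python
from gensim.models import Word2Vec  # assumes gensim >= 4.0

# Hypothetical list of target MWEs (two-word noun compounds).
MWES = {("olive", "oil"), ("climate", "change")}

def merge_mwes(tokens):
    """Rewrite a tokenized sentence so that each known MWE becomes a single
    token, e.g. ['olive', 'oil'] -> ['olive_oil']."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in MWES:
            out.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

corpus = [
    "we cooked with olive oil yesterday".split(),
    "climate change affects olive oil production".split(),
]
corpus = [merge_mwes(sent) for sent in corpus]

# Skip-gram over the merged corpus; 'olive_oil' now gets its own vector,
# but an MWE unseen at training time has no vector at all (the OOV problem).
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1, seed=0)
print(model.wv["olive_oil"].shape)  # (50,)
```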
“…Our proposed approach does not suffer from this observer effect, as we learn our compositional function indirectly (through Skip-Gram), without relying on a distributionally learned embedding for MWEs for training. Shwartz (2019) also avoids this reliance on the gold embedding of the multi-word, learning the function indirectly. The compositional function is used to encode the multi-word and its paraphrase and is then trained to maximize the cosine similarity between the encodings.…”
Section: Related Work
confidence: 99%
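The objective described in this excerpt, encoding the compound and its paraphrase and maximizing the cosine similarity between the two encodings, can be sketched as follows. This is a minimal PyTorch illustration in which random vectors stand in for pre-trained embeddings; the module and variable names are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM = 100

class Composer(nn.Module):
    """Maps the concatenated constituent vectors of a two-word compound
    to a single compound encoding."""
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(2 * dim, dim)

    def forward(self, first: torch.Tensor, second: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.linear(torch.cat([first, second], dim=-1)))

# Random stand-ins for pre-trained vectors of a batch of compounds and their
# paraphrases (each paraphrase encoded here as the mean of its word vectors).
torch.manual_seed(0)
first, second = torch.randn(8, DIM), torch.randn(8, DIM)
paraphrase = torch.randn(8, 5, DIM).mean(dim=1)

composer = Composer(DIM)
optimizer = torch.optim.Adam(composer.parameters(), lr=1e-3)

for step in range(100):
    optimizer.zero_grad()
    compound_enc = composer(first, second)
    # Maximizing cosine similarity == minimizing (1 - cosine similarity).
    loss = (1 - F.cosine_similarity(compound_enc, paraphrase, dim=-1)).mean()
    loss.backward()
    optimizer.step()
```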