Probing Multilingual Sentence Representations With X-Probe

Ravishankar, Vinit; Øvrelid, Lilja; Velldal, Erik

doi:10.18653/v1/w19-4318

Cited by 15 publications

(20 citation statements)

References 43 publications


“…The benchmark covers five languages: English, French, German, Spanish and Russian, derived from Wikipedia. The task set comprises 9 probing tasks, summarized in Table 1, that address varieties of linguistic properties including surface, syntactic, and semantic information (Ravishankar et al, 2019). Ravishankar et al (2019) used the datasets to evaluate different sentence encoders trained by mapping sentence representations to English.…”

Section: Experimental Setups and Results (mentioning)

confidence: 99%

“…The task set comprises 9 probing tasks, summarized in Table 1, that address varieties of linguistic properties including surface, syntactic, and semantic information (Ravishankar et al, 2019). Ravishankar et al (2019) used the datasets to evaluate different sentence encoders trained by mapping sentence representations to English. Unlike Ravishankar et al (2019), we use the datasets to evaluate DCT embeddings for each language independently.…”

Section: Experimental Setups and Results (mentioning)

confidence: 99%

“…(Shao and Johnson, 2008). Finally, a fixed-length sentence vector of size Kd is generated by concatenating the first Ravishankar et al, 2019).…”

Section: DCT as Sentence Encoder (mentioning)

confidence: 99%

“…Unlike (Almarwani et al, 2019), we note no further improvements with larger coefficients, thus, we only report the results of 1 ≤ K ≤ 4. 2 Refer to (Ravishankar et al, 2019) for more details about the probing tasks.…”

mentioning

confidence: 99%


Discrete Cosine Transform as Universal Sentence Encoder

AlMarwani, Diab

2021

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Confer


Modern sentence encoders are used to generate dense vector representations that capture the underlying linguistic characteristics for a sequence of words, including phrases, sentences, or paragraphs. These kinds of representations are ideal for training a classifier for an end task such as sentiment analysis, question answering and text classification. Different models have been proposed to efficiently generate general purpose sentence representations to be used in pretraining protocols. While averaging is the most commonly used efficient sentence encoder, Discrete Cosine Transform (DCT) was recently proposed as an alternative that captures the underlying syntactic characteristics of a given text without compromising practical efficiency compared to averaging. However, as with most other sentence encoders, the DCT sentence encoder was only evaluated in English. To this end, we utilize DCT encoder to generate universal sentence representation for different languages such as German, French, Spanish and Russian. The experimental results clearly show the superior effectiveness of DCT encoding in which consistent performance improvements are achieved over strong baselines on multiple standardized datasets.


“…The majority of approaches for probing sentence embeddings target English, but recently some works have also addressed other languages such as Polish, Russian, or Spanish in a multi- and cross-lingual setup (Krasnowska-Kieraś and Wróblewska, 2019; Ravishankar et al, 2019). Motivations for considering a multi-lingual analysis include knowing whether findings from English transfer to other languages and determining a universal set of probing tasks that suits multiple languages, e.g., with richer morphology and freer word order.…”

Section: Introduction (mentioning)

confidence: 99%

How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation

Eger, Daxenberger, Gurevych

2020

Proceedings of the 24th Conference on Computational Natural Language Learning


Sentence encoders map sentences to real valued vectors for use in downstream applications. To peek into these representations (e.g., to increase interpretability of their results), probing tasks have been designed which query them for linguistic knowledge. However, designing probing tasks for lesser-resourced languages is tricky, because these often lack large-scale annotated data or (high-quality) dependency parsers as a prerequisite of probing task design in English. To investigate how to probe sentence embeddings in such cases, we investigate sensitivity of probing task results to structural design choices, conducting the first such large scale study. We show that design choices like size of the annotated probing dataset and type of classifier used for evaluation do (sometimes substantially) influence probing outcomes. We then probe embeddings in a multilingual setup with design choices that lie in a 'stable region', as we identify for English, and find that results on English do not transfer to other languages. Fairer and more comprehensive sentence-level probing evaluation should thus be carried out on multiple languages in the future.


Less than Necessary or More than Sufficient: Validating Probing Dataset Size

Orlov, Serikov

2024

Lecture Notes in Computer Science

