Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.656
Code-switched inspired losses for spoken dialog representations

Abstract: Spoken dialog systems need to be able to handle both multiple languages and multilinguality inside a conversation (e.g., in the case of code-switching). In this work, we introduce new pretraining losses tailored to learn multilingual spoken dialog representations. The goal of these losses is to expose the model to code-switched language. To scale up training, we automatically build a pretraining corpus composed of multilingual conversations in five different languages (French, Italian, English, German and Spanish) fro…
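The abstract's core idea of exposing a model to code-switched language during pretraining can be illustrated with a toy augmentation step. The sketch below is not the paper's actual loss or pipeline; it only shows, under simple assumptions, how a monolingual utterance can be turned into a code-switched one by substituting tokens with translations from a small bilingual lexicon (the `EN_FR` dictionary and `code_switch` helper are hypothetical names introduced for this example):

```python
import random

# Toy English -> French lexicon; a real setup would derive aligned
# translations from a multilingual conversation corpus.
EN_FR = {
    "hello": "bonjour",
    "thanks": "merci",
    "good": "bon",
    "friend": "ami",
}

def code_switch(tokens, lexicon, p=0.5, seed=0):
    """Randomly replace tokens that have a translation in `lexicon`.

    Each covered token is swapped with probability `p`, producing a
    mixed-language utterance a model can then be pretrained on.
    """
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        if tok in lexicon and rng.random() < p:
            out.append(lexicon[tok])  # switch to the other language
        else:
            out.append(tok)           # keep the original token
    return out

# With p=1.0 every token covered by the lexicon is translated.
mixed = code_switch("hello my good friend".split(), EN_FR, p=1.0)
print(" ".join(mixed))  # -> bonjour my bon ami
```

Varying `p` controls how much code-switching the model sees, from fully monolingual input (`p=0.0`) to maximal mixing (`p=1.0`).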

Cited by 3 publications (3 citation statements)
References 65 publications
“…For future work, we plan to study OOD in sequence labelling tasks (Witon* et al., 2018; Colombo* et al., 2020; Chapuis* et al., 2020a; Colombo et al., 2021a), sequence generation (Colombo* et al., 2019; Jalalzai* et al., 2020; Modi et al., 2020; Colombo et al., 2021e), fair classification (Colombo et al., 2021d; Pichler et al., 2022) and multimodal scenarios (Garcia* et al., 2019; Dinkar* et al., 2020), as well as automatic evaluation (Colombo et al., 2021c; Colombo, 2021a; Staerman et al., 2021b).…”
Section: G Future Applications
confidence: 99%
“…Update φ, ψ using (1). As future work we plan to disentangle more complex labels such as dialog acts (Colombo et al., 2021a), emotions (Witon et al., 2018) and linguistic phenomena such as disfluencies (Dinkar et al., 2020) and other spoken language phenomena. Future research also includes extending these losses to data augmentation (Dhole et al., 2021; Colombo et al., 2021e) and sentence generation (Colombo et al., 2021c,f), and studying the trade-off using rankings (Colombo et al., 2022) or anomaly detection (Staerman et al., 2019, 2020, 2021b, 2022).…”
Section: D4 Related Work General Algorithm
confidence: 99%