Benchmarking Commercial Intent Detection Services with Practice-Driven Evaluations

Qi, Haode; Pan, Lin; Sood, Atin; Shah, Abhishek; Kunc, Ladislav; Yu, Mo; Potdar, Saloni

doi:10.18653/v1/2021.naacl-industry.38

Cited by 9 publications

(16 citation statements)

References 10 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…For the three intent classification datasets, in addition to the original evaluation data, we also evaluate on a difficult subset of each test set described in (Qi et al 2021). The difficult subsets are constructed by comparing the TF-IDF vector of each test example to that of the training examples for a given intent.…”

Section: Contrastive Learningmentioning

confidence: 99%

“…We also experimented with generating our own difficult subsets in a similar manner using BERT-based sentence encoders 5 , and compare each test example with the mean-pooling of the training examples for that intent. Result shows that the TF-IDF method yields a more challenging subset, thus we report results on the original subsets from Qi et al (2021). The evaluation metric for all intent classification datasets is accuracy.…”

Section: Contrastive Learningmentioning

confidence: 99%

“…5 The specific model used is "paraphrase-mpnet-base-v2" (Song et al 2020) On the three intent classification datasets, we follow the hyperparameter settings in Qi et al (2021). We use a batch size of 32 and fine-tune for 5 epochs, and search over lr ∈ {0.00003, 0.00004, 0.00005}.…”

Section: Training Detailsmentioning

confidence: 99%

See 2 more Smart Citations

Improved Text Classification via Contrastive Adversarial Training

Pan¹,

Hang²,

Sil³

et al. 2022

AAAI

Self Cite

View full text Add to dashboard Cite

We propose a simple and general method to regularize the fine-tuning of Transformer-based encoders for text classification tasks. Specifically, during fine-tuning we generate adversarial examples by perturbing the word embedding matrix of the model and perform contrastive learning on clean and adversarial examples in order to teach the model to learn noise-invariant representations. By training on both clean and adversarial examples along with the additional contrastive objective, we observe consistent improvement over standard fine-tuning on clean examples. On several GLUE benchmark tasks, our fine-tuned Bert_Large model outperforms Bert_Large baseline by 1.7% on average, and our fine-tuned Roberta_Large improves over Roberta_Large baseline by 1.3%. We additionally validate our method in different domains using three intent classification datasets, where our fine-tuned Roberta_Large outperforms Roberta_Large baseline by 1-2% on average. For the challenging low-resource scenario, we train our system using half of the training data (per intent) in each of the three intent classification datasets, and achieve similar performance compared to the baseline trained with full training data.

show abstract

Section: Contrastive Learningmentioning

confidence: 99%

Section: Contrastive Learningmentioning

confidence: 99%

See 1 more Smart Citation

Improved Text Classification via Contrastive Adversarial Training

Pan¹,

Hang²,

Sil³

et al. 2022

AAAI

Self Cite

View full text Add to dashboard Cite

show abstract

“…Query-document pairs are concatenated and sent through Transformer-based encoders, an additional layer on top of the encoded representation is adopted to produce a relevance score of the document to the query, which is then used for ranking. Arora et al (2020) and Qi et al (2021) benchmark intent detection models on intent detection datasets such as CLINC150 (Larson et al, 2019) where sufficient training examples exist for each intent. On the other hand, our use case focuses on the scenarios where answer text is available but training examples are insufficient.…”

Section: Related Workmentioning

confidence: 99%

Fast and Light-Weight Answer Text Retrieval in Dialogue Systems

Wan¹,

Patel²,

Murdock³

et al. 2022

Preprint

Self Cite

View full text Add to dashboard Cite

Dialogue systems can benefit from being able to search through a corpus of text to find information relevant to user requests, especially when encountering a request for which no manually curated response is available. The state-of-the-art technology for neural dense retrieval or re-ranking involves deep learning models with hundreds of millions of parameters. However, it is difficult and expensive to get such models to operate at an industrial scale, especially for cloud services that often need to support a big number of individually customized dialogue systems, each with its own text corpus. We report our work on enabling advanced neural dense retrieval systems to operate effectively at scale on relatively inexpensive hardware. We compare with leading alternative industrial solutions and show that we can provide a solution that is effective, fast, and cost-efficient.

show abstract

“…We also create the fewshot version of these datasets to evaluate the models' performance on small datasets. Additionally, after observing the close accuracy results among the models, we follow Arora et al ( 2020) and Qi et al (2021) to create the TF*IDF and jaccard based difficult testing set to differentiate them better. 4 Overall, our benchmark generates about 1000 data points, including accuracy and training time in default, few-shot training, and difficult testing settings.…”

Section: Introductionmentioning

confidence: 99%

Proceedings of the Workshop on Multilingual Information Access (MIA)

2022

View full text Add to dashboard Cite

Stemming from the limited availability of datasets and textual resources for low-resource languages such as isiZulu, there is a significant need to be able to harness knowledge from pre-trained models to improve low resource machine translation. Moreover, a lack of techniques to handle the complexities of morphologically rich languages has compounded the unequal development of translation models, with many widely spoken African languages being left behind. This study explores the potential benefits of transfer learning in an English-isiZulu translation framework. The results indicate the value of transfer learning from closely related languages to enhance the performance of low-resource translation models, thus providing a key strategy for low-resource translation going forward. We gathered results from 8 different language corpora, including one multilingual corpus, and saw that isiXhosa-isiZulu outperformed all languages, with a BLEU score of 8.56 on the test set which was better from the multi-lingual corpora pre-trained model by 2.73. We also derived a new coefficient, Nasir's Geographical Distance Coefficient (NGDC) which provides an easy selection of languages for the pre-trained models. NGDC also indicated that isiXhosa should be selected as the language for the pre-trained model.

show abstract

Benchmarking Commercial Intent Detection Services with Practice-Driven Evaluations

Cited by 9 publications

References 10 publications

Improved Text Classification via Contrastive Adversarial Training

Improved Text Classification via Contrastive Adversarial Training

Fast and Light-Weight Answer Text Retrieval in Dialogue Systems

Proceedings of the Workshop on Multilingual Information Access (MIA)

Contact Info

Product

Resources

About