Building conversational systems in new domains and with added functionality requires resource-efficient models that work under low-data regimes (i.e., in few-shot setups). Motivated by these requirements, we introduce intent detection methods backed by pretrained dual sentence encoders such as USE and ConveRT. We demonstrate the usefulness and wide applicability of the proposed intent detectors, showing that: 1) they outperform intent detectors based on fine-tuning the full BERT-Large model or using BERT as a fixed black-box encoder on three diverse intent detection data sets; 2) the gains are especially pronounced in few-shot setups (i.e., with only 10 or 30 annotated examples per intent); 3) our intent detectors can be trained in a matter of minutes on a single CPU; and 4) they are stable across different hyperparameter settings. In the hope of facilitating and democratizing research focused on intention detection, we release our code, as well as a new challenging single-domain intent detection dataset comprising 13,083 annotated examples over 77 intents.
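The general idea of training a lightweight classifier on top of a fixed sentence encoder can be sketched as follows. This is only an illustration under stated assumptions: `encode` is a hypothetical bag-of-words stand-in for a frozen pretrained encoder (it is not USE or ConveRT), and the nearest-centroid classifier is a simplification chosen for brevity, not the architecture used by the authors. The intent names and utterances are made up for the example.

```python
import math
from collections import defaultdict


def encode(text):
    """Hypothetical stand-in for a frozen sentence encoder.

    Returns an L2-normalised sparse bag-of-words vector (token -> weight).
    A real system would call a pretrained encoder here instead.
    """
    counts = defaultdict(float)
    for token in text.lower().split():
        counts[token] += 1.0
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {token: v / norm for token, v in counts.items()}


def train_centroids(examples):
    """Average the fixed encodings of the few labelled examples per intent."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for text, intent in examples:
        for token, weight in encode(text).items():
            sums[intent][token] += weight
        counts[intent] += 1
    return {
        intent: {token: w / counts[intent] for token, w in vec.items()}
        for intent, vec in sums.items()
    }


def predict(centroids, text):
    """Return the intent whose centroid has the largest dot product with the query."""
    query = encode(text)

    def score(intent):
        centroid = centroids[intent]
        return sum(w * centroid.get(token, 0.0) for token, w in query.items())

    return max(centroids, key=score)


# Made-up few-shot training data: two examples per intent.
few_shot_examples = [
    ("my card was lost", "lost_card"),
    ("i lost my card", "lost_card"),
    ("what is the exchange rate", "exchange_rate"),
    ("exchange rate for euros", "exchange_rate"),
]
centroids = train_centroids(few_shot_examples)
```

Because the encoder is never updated, only the small classifier on top is fitted to the few labelled examples, which is consistent with training that is cheap enough for a single CPU.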
Acyclic partial matchings on simplicial complexes play an important role in topological data analysis by facilitating efficient computation of (persistent) homology groups. Here we describe probabilistic properties of critical simplex counts for such matchings on clique complexes of Bernoulli random graphs. In order to accomplish this goal, we generalise the notion of a dissociated sum to a multivariate setting and prove an abstract multivariate central limit theorem using Stein's method. As a consequence of this general result, we are able to extract central limit theorems not only for critical simplex counts, but also for generalised U-statistics (and hence for clique counts in Bernoulli random graphs) as well as simplex counts in the link of a fixed simplex in a random clique complex.
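To make the clique counts mentioned above concrete, here is a small sketch that is not taken from the paper. It counts k-cliques in a given graph by brute force (each k-clique is a (k-1)-simplex of the clique complex) and evaluates the standard expectation for a Bernoulli random graph with n vertices and edge probability p, namely C(n, k) · p^C(k, 2). The function names are illustrative assumptions.

```python
from itertools import combinations
from math import comb


def count_cliques(n, edges, k):
    """Count k-cliques in a graph on vertices 0..n-1 by brute force.

    Suitable for small graphs only: it checks every k-subset of vertices.
    """
    edge_set = {frozenset(edge) for edge in edges}
    return sum(
        all(frozenset(pair) in edge_set for pair in combinations(vertices, 2))
        for vertices in combinations(range(n), k)
    )


def expected_clique_count(n, k, p):
    """Expected number of k-cliques when each edge is present independently
    with probability p: choose k vertices, then require all C(k, 2) edges."""
    return comb(n, k) * p ** comb(k, 2)


# The complete graph on 4 vertices has 4 triangles and one 4-clique.
complete_graph_edges = list(combinations(range(4), 2))
triangles_in_k4 = count_cliques(4, complete_graph_edges, 3)
```

The central limit theorems described in the abstract concern the fluctuations of such counts around their mean; the sketch only covers the count and its expectation.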
There is plenty of room for improvement in credit risk prediction. Intuitively, similar customers should have similar credit risk. Capturing this similarity is often carried out using Euclidean distances between customer features and predicting credit default via logistic regression. Here we explore the use of topological data analysis for describing this similarity. In particular, persistent homology algorithms provide summaries of point clouds which relate to their topology. This approach has been shown to be useful in many applications, but to the best of our knowledge, applying topological data analysis to the prediction of credit risk is novel. We develop a pipeline which is based on the topological analysis of neighbourhoods of customers, with the neighbourhoods given through a geometric network construction. Using two data sets from the Lending Club, we find a modest signal; the results have high variance, but they could be seen as an indication that including such topological features could improve credit risk prediction when used as an additional explanatory variable in a logistic regression.
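As a rough illustration of what a persistent homology summary of a point cloud looks like, the sketch below computes only the simplest case, 0-dimensional persistence, using a union-find structure over edges sorted by length. It is not the pipeline described in the abstract: the function name is an assumption, the points are made up, and deaths are reported as the edge length at which two components merge (some conventions report half of that value).

```python
from itertools import combinations
from math import dist


def h0_death_times(points):
    """Return the scales at which connected components merge.

    Every point starts as its own component at scale 0. Processing edges in
    order of increasing length, each edge that joins two different components
    ends one of them; the edge length is recorded as its death time. One
    component survives at all scales and is therefore not listed.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the component representative,
        # shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


# Made-up example: two nearby points and one distant point on a line.
example_deaths = h0_death_times([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)])
```

Summaries of this kind, computed on a neighbourhood of each customer, are the sort of quantity that could be appended as an extra explanatory variable to a logistic regression.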