Task-oriented dialog systems need to know when a query falls outside their range of supported intents, but current text classification corpora only define label sets that cover every example. We introduce a new dataset that includes queries that are out-of-scopei.e., queries that do not fall into any of the system's supported intents. This poses a new challenge because models cannot assume that every query at inference time belongs to a system-supported intent class. Our dataset also covers 150 intent classes over 10 domains, capturing the breadth that a production taskoriented agent must handle. We evaluate a range of benchmark classifiers on our dataset along with several different out-of-scope identification schemes. We find that while the classifiers perform well on in-scope intent classification, they struggle to identify out-of-scope queries. Our dataset and evaluation fill an important gap in the field, offering a way of more rigorously and realistically benchmarking text classification in task-driven dialog systems.
To be informative, an evaluation must measure how well systems generalize to realistic unseen data. We identify limitations of and propose improvements to current evaluations of text-to-SQL systems. First, we compare human-generated and automatically generated questions, characterizing properties of queries necessary for real-world applications. To facilitate evaluation on multiple datasets, we release standardized and improved versions of seven existing datasets and one new textto-SQL dataset. Second, we show that the current division of data into training and test sets measures robustness to variations in the way questions are asked, but only partially tests how well systems generalize to new queries; therefore, we propose a complementary dataset split for evaluation of future work. Finally, we demonstrate how the common practice of anonymizing variables during evaluation removes an important challenge of the task. Our observations highlight key difficulties, and our methodology enables effective measurement of future development.
We identify the pattern of microscopic dynamical relaxation for a two-dimensional glass-forming liquid. On short time scales, bursts of irreversible particle motion, called cage jumps, aggregate into clusters. On larger time scales, clusters aggregate both spatially and temporally into avalanches. This propagation of mobility takes place along the soft regions of the systems, which have been identified by computing isoconfigurational Debye-Waller maps. Our results characterize the way in which dynamical heterogeneity evolves in moderately supercooled liquids and reveal that it is astonishingly similar to the one found for dense glassy granular media.
Despite the recent popularity of word embedding methods, there is only a small body of work exploring the limitations of these representations. In this paper, we consider one aspect of embedding spaces, namely their stability. We show that even relatively high frequency words (100-200 occurrences) are often unstable. We provide empirical evidence for how various factors contribute to the stability of word embeddings, and we analyze the effects of stability on downstream tasks.
Disentangling conversations mixed together in a single stream of messages is a difficult task, made harder by the lack of large manually annotated datasets. We created a new dataset of 77,563 messages manually annotated with reply-structure graphs that both disentangle conversations and define internal conversation structure. Our dataset is 16 times larger than all previously released datasets combined, the first to include adjudication of annotation disagreements, and the first to include context. We use our data to re-examine prior work, in particular, finding that 80% of conversations in a widely used dialogue corpus are either missing messages or contain extra messages. Our manually-annotated data presents an opportunity to develop robust data-driven methods for conversation disentanglement, which will help advance dialogue research.
This paper considers the homogeneous packing of binary hard spheres in an equimolar stoichiometry, and postulates the densest packing at each sphere size ratio. Monte Carlo simulated annealing optimizations are seeded with all known atomic inorganic crystal structures, and the search is performed within the degrees of freedom associated with each homogeneous AB structure type. Structures isopointal to the FeB structure type are found to have the highest packing fraction at all sphere size ratios. The optimized structures match or improve on the best previously demonstrated packings of this type, and show that compound structures can pack more densely than segregated close-packed structures at all radius ratios less than 0.62.
We present a catalogue of high-velocity clouds (HVCs) from the Galactic All Sky Survey (GASS) of southern-sky neutral hydrogen, which has 57 mK sensitivity and 1 km s −1 velocity resolution and was obtained with the Parkes Telescope. Our catalogue has been derived from the stray-radiation corrected second release of GASS. We describe the data and our method of identifying HVCs and analyse the overall properties of the GASS population. We catalogue a total of 1693 HVCs at declinations < 0 • , including 1111 positive velocity HVCs and 582 negative velocity HVCs. Our catalogue also includes 295 anomalous velocity clouds (AVCs). The cloud line-widths of our HVC population have a median FWHM of ∼19 km s −1 , which is lower than found in previous surveys. The completeness of our catalogue is above 95% based on comparison with the HIPASS catalogue of HVCs, upon which we improve with an order of magnitude in spectral resolution. We find 758 new HVCs and AVCs with no HIPASS counterpart. The GASS catalogue will shed an unprecedented light on the distribution and kinematic structure of southern-sky HVCs, as well as delve further into the cloud populations that make up the anomalous velocity gas of the Milky Way.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.