We describe a new challenge aimed at discovering subword and word units from raw speech. This challenge is the follow-up to the Zero Resource Speech Challenge 2015. It aims at constructing systems that generalize across languages and adapt to new speakers. The design features and evaluation metrics of the challenge are presented, and the results of seventeen models are discussed.

Index Terms: zero resource speech technology, subword modeling, acoustic unit discovery, unsupervised term discovery
We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding zero resource (unsupervised) speech technologies and related models of early language acquisition. Centered on the tasks of phonetic and lexical discovery, we consider unified evaluation metrics, present two new approaches for improving speaker independence in the absence of supervision, and evaluate the application of Bayesian word segmentation algorithms to automatic subword unit tokenizations. Finally, we present two strategies for integrating zero resource techniques into supervised settings, demonstrating the potential of unsupervised methods to improve mainstream technologies.
Theropithecus gelada, Papio anubis and Cercopithecus aethiops are commonly sympatric in Ethiopia. It is suggested that niche separation would be more marked among terrestrial open country species than among forest primates. The ecological relationships between these three species in an Ethiopian valley where they coexist are analysed. Quantitative data are presented on density and biomass, size of home ranges and day ranges, activity patterns, use of habitat, diet and feeding patterns and on interspecific interactions. These are compared across the species to determine to what extent ecological competition could occur and in what ways it is reduced. The data are discussed with reference to studies of forest primate communities where niche overlap has commonly been reported.
To acquire one’s native phonological system, language-specific phonological categories and relationships must be extracted from the input. The acquisition of the categories and the acquisition of the relationships have each, in their own right, been the focus of intense research. However, it is remarkable that research on the acquisition of categories and research on the relations between them have proceeded, for the most part, independently of one another. We argue that this has led to the implicit view that phonological acquisition is a ‘two-stage’ process: phonetic categories are first acquired, and then subsequently mapped onto abstract phoneme categories. We present simulations that suggest two problems with this view: first, the learner might mistake the phoneme-level categories for phonetic-level categories and thus be unable to learn the relationships between phonetic-level categories; second, the learner might construct inaccurate phonetic-level representations that prevent it from finding regular relations among them. We suggest an alternative conception of the phonological acquisition problem that sidesteps this apparent inevitability, and acquires phonemic categories in a single stage. Using acoustic data from Inuktitut, we show that this model reliably converges on a set of phoneme-level categories and phonetic-level relations among subcategories, without making use of a lexicon.
We present the Zero Resource Speech Challenge 2020, which aims at learning speech representations from raw audio signals without any labels. It combines the data sets and metrics from two previous benchmarks (2017 and 2019) and features two tasks which tap into two levels of speech representation. The first task is to discover low bit-rate subword representations that optimize the quality of speech synthesis; the second one is to discover word-like units from unsegmented raw speech. We present the results of the twenty submitted models and discuss the implications of the main findings for unsupervised speech learning.