We describe a new challenge aimed at discovering subword and word units from raw speech. This challenge is the follow-up to the Zero Resource Speech Challenge 2015. It aims at constructing systems that generalize across languages and adapt to new speakers. The design features and evaluation metrics of the challenge are presented and the results of seventeen models are discussed. Index Terms: zero resource speech technology, subword modeling, acoustic unit discovery, unsupervised term discovery
We present the Zero Resource Speech Challenge 2019, which proposes to build a speech synthesizer without any text or phonetic labels: hence, TTS without T (text-to-speech without text). We provide raw audio for a target voice in an unknown language (the Voice dataset), but no alignment, text or labels. Participants must discover subword units in an unsupervised way (using the Unit Discovery dataset) and align them to the voice recordings in a way that works best for the purpose of synthesizing novel utterances from novel speakers, similar to the target speaker's voice. We describe the metrics used for evaluation, a baseline system consisting of unsupervised subword unit discovery plus a standard TTS system, and a topline TTS using gold phoneme transcriptions. We present an overview of the 19 submitted systems from 10 teams and discuss the main results.
Unsupervised spoken term discovery is the task of finding recurrent acoustic patterns in speech without any annotations. Current approaches consist of two steps: (1) discovering similar patterns in speech, and (2) partitioning those pairs of acoustic tokens using graph clustering methods. We propose a new approach for the first step. Previous systems used various approximation algorithms to make the search tractable on large amounts of data. Our approach is based on an optimized k-nearest neighbours (KNN) search coupled with a fixed word embedding algorithm. The results show that the KNN algorithm is robust across languages, consistently outperforms the DTW-based baseline, and is competitive with current state-of-the-art spoken term discovery systems.
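The first step described above (finding similar pairs of segments) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the system from the abstract: the embedding here is a hypothetical mean-pooling scheme (`downsample_embed`), the search is plain brute force rather than an optimized KNN, and the toy feature frames are invented for the example.

```python
import math


def downsample_embed(frames, n_slots=2):
    """Hypothetical fixed-size embedding (an assumption, not the paper's method):
    average the frames in each of n_slots time slices and concatenate."""
    dim = len(frames[0])
    out = []
    for s in range(n_slots):
        lo = s * len(frames) // n_slots
        hi = max(lo + 1, (s + 1) * len(frames) // n_slots)
        chunk = frames[lo:hi]
        out.extend(sum(f[d] for f in chunk) / len(chunk) for d in range(dim))
    return out


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_pairs(embeddings, k=1):
    """Brute-force k nearest neighbours for every embedding, excluding itself.
    Returns (query_index, neighbour_index, similarity) triples."""
    pairs = []
    for i, u in enumerate(embeddings):
        scored = [(cosine(u, v), j) for j, v in enumerate(embeddings) if j != i]
        scored.sort(reverse=True)
        pairs.extend((i, j, sim) for sim, j in scored[:k])
    return pairs


# Toy variable-length segments of 2-dimensional frames (invented data).
segments = [
    [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]],
    [[0.9, 0.0], [0.1, 1.0]],
    [[0.0, 1.0], [0.0, 0.9], [1.0, 0.0], [1.0, 0.1]],
]
embeddings = [downsample_embed(seg) for seg in segments]
pairs = knn_pairs(embeddings, k=1)
for query, neighbour, sim in pairs:
    print(query, neighbour, round(sim, 3))
```

Because every segment is mapped to a vector of the same length, similarity reduces to a vector comparison instead of a frame-by-frame alignment. The resulting pairs would then feed the second step (graph clustering), which is not shown here.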
This paper presents the design and implementation of a simple conditioning circuit to optimize electronic nose performance. A temperature modulation method was applied to the heating resistor in order to study the sensors' responses and determine whether they can discriminate between different Volatile Organic Compounds (VOCs). The study assessed the efficiency of the gas sensors intended for use in an Electronic Nose, aiming to improve the sensitivity, selectivity and repeatability of the measuring system, and selected the type of modulation (e.g., Pulse Width Modulation) for analyte detection (i.e., Moscatel wine samples (2% alcohol) and ethyl alcohol (70%)). The results demonstrated that, by applying temperature modulation to the sensor heaters, it is possible to achieve good discrimination of VOCs quickly and easily through a chemical sensor array. A discrimination model based on Principal Component Analysis (PCA) was applied to each sensor, and the response data obtained gave a variance of 94.5% and 100% accuracy.
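To make the PCA step more concrete, here is a minimal sketch of how a first principal component and its share of the total variance could be computed for a small set of sensor responses. Everything in it is an assumption for illustration: the readings are invented, the function name `first_component` is hypothetical, and power iteration is used only to keep the example dependency-free. The abstract does not say how its PCA was computed or exactly what the reported variance figure measures.

```python
import math


def first_component(rows, iters=200):
    """Return (direction, explained_variance_ratio, scores) for the first
    principal component, using power iteration on the covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Start vector; assumes it is not orthogonal to the leading eigenvector.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)
    )
    total_variance = sum(cov[a][a] for a in range(d))
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, eigenvalue / total_variance, scores


# Invented feature vectors: three samples each for two hypothetical analytes.
readings = [
    [1.0, 2.0, 0.5],
    [1.1, 2.1, 0.6],
    [0.9, 1.9, 0.4],
    [3.0, 4.1, 1.5],
    [3.1, 4.0, 1.6],
    [2.9, 3.9, 1.4],
]
direction, ratio, scores = first_component(readings)
print("explained variance ratio:", round(ratio, 4))
print("scores on first component:", [round(s, 3) for s in scores])
```

If the two groups of samples fall on opposite sides along the first component, as they do for this toy data, a simple threshold on the score is enough to tell them apart.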