Dysarthria is a neuromotor disorder that reduces speech intelligibility. Previous work in dysarthric speech recognition has focused on accurately recognizing words encountered in the training data. Because dysarthria is rare in the general population, relatively little publicly available training data exists for dysarthric speech. The number of unique words in these datasets is small, so ASR systems trained on existing dysarthric speech data are limited to recognizing those words. In this paper, we propose a data augmentation method using voice conversion that allows dysarthric ASR systems to accurately recognize words outside the training-set vocabulary. We demonstrate that a small amount of dysarthric speech data can capture the relevant vocal characteristics of a speaker with dysarthria through a parallel voice conversion system. We show that it is possible to synthesize utterances of new words that were never recorded by speakers with dysarthria, and that these synthesized utterances can be used to train a dysarthric ASR system.
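A minimal sketch of the augmentation pipeline this abstract describes, under stated assumptions: the paper does not specify its voice conversion model here, so a simple frame-aligned linear spectral mapping stands in for it, and every name below (train_parallel_vc, convert, the synthetic feature arrays) is hypothetical rather than taken from the paper.

    # Sketch only: a linear spectral mapping stands in for the paper's
    # (unspecified) parallel voice conversion model. All names are hypothetical.
    import numpy as np

    def train_parallel_vc(src_feats, tgt_feats):
        """Fit a linear map W from source (typical) to target (dysarthric)
        spectral frames, given time-aligned parallel utterances.
        src_feats, tgt_feats: (n_frames, n_dims) arrays."""
        # Least-squares solution to tgt ~= src @ W
        W, *_ = np.linalg.lstsq(src_feats, tgt_feats, rcond=None)
        return W

    def convert(src_feats, W):
        """Apply the learned mapping to source-speaker features for a word
        the dysarthric speaker never recorded."""
        return src_feats @ W

    rng = np.random.default_rng(0)

    # Stand-ins for aligned parallel features from a small dysarthric corpus
    # (e.g., 40-dim mel frames).
    source_parallel = rng.normal(size=(500, 40))
    dysarthric_parallel = source_parallel @ rng.normal(size=(40, 40)) * 0.1

    W = train_parallel_vc(source_parallel, dysarthric_parallel)

    # Synthesize dysarthric-like features for out-of-vocabulary words, then
    # pool them with the real data to train the ASR acoustic model.
    oov_word_feats = rng.normal(size=(120, 40))
    augmented = convert(oov_word_feats, W)
    asr_training_set = np.concatenate([dysarthric_parallel, augmented])
    print(asr_training_set.shape)  # (620, 40)

The point of the sketch is the data flow, not the model: a small parallel corpus fits the conversion, the conversion is then applied to recordings of new words, and the converted utterances are added to the ASR training set.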
Flexible piezoelectric acoustic sensors (f-PAS) have attracted significant attention as a promising component of voice user interfaces (VUI) in the era of the artificial intelligence of things (AIoT). Signal distortion in highly sensitive biomimetic f-PAS is one of the most challenging obstacles to real-life applications, owing to their fundamental differences from conventional microphones. Here, a noise-robust flexible piezoelectric acoustic sensor (NPAS) is demonstrated by designing its multi-resonant bands to lie outside the noise-dominant frequency range. Broad voice coverage up to 8 kHz is achieved by adopting an advanced piezoelectric membrane with an optimized polymer ratio. Deep-learning-based speech processing of multi-channel NPAS signals shows outstanding improvements in speaker recognition and speech enhancement compared with a commercial microphone. Finally, the NPAS independently identified the voices of multiple users in a crowded setting, demonstrating simultaneous speaker separation.
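A software analogue of the multi-resonant, multi-channel idea described above: several band-pass channels placed above a low-frequency noise-dominant region, covering speech up to 8 kHz, whose outputs could feed a downstream recognizer. This is an illustration only; the actual NPAS realizes its resonant bands in the piezoelectric membrane hardware, and the band edges and noise region below are assumptions, not values from the paper.

    # Hedged illustration: per-band channels mimicking multi-resonant sensor
    # outputs. Band edges and the <300 Hz noise region are assumed.
    import numpy as np
    from scipy.signal import butter, lfilter

    FS = 16_000  # sample rate (Hz), assumed

    # Hypothetical resonant bands chosen outside a low-frequency noise band,
    # spanning up to 8 kHz (7999 Hz keeps the top edge below Nyquist).
    BANDS_HZ = [(300, 1000), (1000, 2500), (2500, 4500), (4500, 7999)]

    def multichannel_response(waveform, fs=FS):
        """Split one waveform into per-band channels, mimicking the sensor's
        multi-resonant outputs; returns an (n_bands, n_samples) array."""
        channels = []
        for lo, hi in BANDS_HZ:
            b, a = butter(4, [lo, hi], btype="bandpass", fs=fs)
            channels.append(lfilter(b, a, waveform))
        return np.stack(channels)

    # Example: 0.5 s of a synthetic voiced signal plus low-frequency hum that
    # falls in the excluded noise band.
    t = np.arange(int(0.5 * FS)) / FS
    voice = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 3520 * t)
    noise = 0.8 * np.sin(2 * np.pi * 60 * t)  # mains-like hum
    channels = multichannel_response(voice + noise)
    print(channels.shape)  # (4, 8000); per-band signals feed the recognizer

Because every band starts above the assumed noise region, the hum is attenuated in all channels while the voiced components pass through, which is the design principle the abstract attributes to the sensor's resonant-band placement.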