Yuuki Tachioka scite author profile

Deep recurrent de-noising auto-encoder and blind de-reverberation for reverberated speech recognition

This paper describes our joint efforts to provide robust automatic speech recognition (ASR) for reverberated environments, such as in hands-free human-machine interaction. We investigate blind feature space de-reverberation and deep recurrent de-noising auto-encoders (DAE) in an early fusion scheme. Results on the 2014 REVERB Challenge development set indicate that the DAE front-end provides performance gains complementary to those of multi-condition training, feature transformations, and model adaptation. The proposed ASR system achieves word error rates of 17.62 % and 36.6 % on simulated and real data, respectively, a significant improvement over the Challenge baseline (25.16 % and 47.2 %).
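The abstract gives absolute word error rates only. As a minimal sketch, the relative reduction over the Challenge baseline can be worked out from the four figures quoted above. The function name `relative_wer_reduction` and the `results` dictionary are illustrative and do not come from the paper.

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Both arguments are word error rates in percent. This helper is an
    illustrative assumption, not code from the paper.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - system_wer) / baseline_wer * 100.0


# (baseline WER, proposed-system WER) in percent, as quoted in the abstract.
results = {
    "simulated": (25.16, 17.62),
    "real": (47.2, 36.6),
}

if __name__ == "__main__":
    for condition, (baseline, system) in results.items():
        reduction = relative_wer_reduction(baseline, system)
        print(f"{condition}: {reduction:.2f} % relative WER reduction")
```

Under these figures the relative reduction comes out at roughly 30 % on simulated data and roughly 22 % on real data.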

Ensemble integration of calibrated speaker localization and statistical speech detection in domestic environments

Tachioka, Narita, Watanabe, et al. (2014)

This paper describes speaker localization and speech detection techniques for domestic environments. In real environments, speakers are hard to localize because reverberation causes deviations from the simple spherical wave assumption. We propose a template-based method that calibrates the localization errors of conventional methods. In addition, we use statistical speech detection methods to cope with noise. In this challenge, however, there are five rooms, and utterances leaking in from other rooms must be rejected, which is hard to do from speech detection results alone. To address this problem, we also propose a method that integrates speaker localization and speech detection using a minimum cost criterion or a classifier-based strategy. On the development set, the proposed method achieved an accuracy of 0.712 for speaker localization and an F value of 0.743 for speech detection, compared with baselines of 0.559 and 0.570. On the test set, it achieved 0.666 and 0.706, compared with baselines of 0.517 and 0.602.

Speech recognition performance estimation for clipped speech based on objective measures

Tachioka, Narita, Ishii (2014), Acoust. Sci. & Tech.

Discriminative method for recurrent neural network language models

Tachioka, Watanabe (2015)

Effectiveness of dereverberation, feature transformation, discriminative training methods, and system combination approach for various reverberant environments

Tachioka, Narita, Watanabe (2015), EURASIP J. Adv. Signal Process.

The recently released REverberant Voice Enhancement and Recognition Benchmark (REVERB) challenge includes a reverberant automatic speech recognition (ASR) task. This paper describes our proposed system, which is based on multichannel speech enhancement preprocessing and state-of-the-art ASR techniques. For preprocessing, we propose a single-channel dereverberation method with reverberation time estimation, combined with multichannel beamforming that enhances the direct sound relative to the reflected sound. This paper also focuses on state-of-the-art ASR techniques, such as discriminative training of acoustic models (including the Gaussian mixture model, the subspace Gaussian mixture model, and deep neural networks) and various feature transformation techniques. The REVERB challenge requires handling various acoustic environments, but a single ASR system tends to be overly tuned to a specific environment, which degrades performance in mismatched environments. To overcome this mismatch problem, we use a system combination approach with multiple ASR systems that differ in features and model types, because combining systems with different error patterns is beneficial. In particular, we use our discriminative training technique for system combination, which achieves better generalization by making the systems complementary through modified discriminative criteria. Experiments show the effectiveness of these approaches, which reach word error rates of 6.76 % and 18.60 % on the REVERB simulated and real test sets, respectively. These are 68.8 % and 61.5 % relative improvements over the baseline.
