We address the poor performance of automatic speech recognition (ASR) on elderly speech through feature adaptation that is agnostic to the ASR system. Most training corpora for speech recognition models are collected from adult speakers; consequently, commercial speech recognition systems tend to perform well only on adults. In other words, the limited speaker diversity of the training datasets yields unreliable performance for minority (e.g., elderly) speakers, for whom acquiring large amounts of training data is infeasible. In response, this paper proposes a neural network-based voice conversion framework to enhance speech recognition for such minority speakers. To this end, we propose a voice conversion model with unsupervised phonology clustering that extracts linguistic information and maps the minority's speech into the acoustic space expected by an existing acoustic model. Because our proposal is a spectral feature adaptation method that can be placed in front of any commercial or open ASR system, it avoids directly modifying the speech recognizer. The experimental results and analysis demonstrate the effectiveness of the proposed method through improved recognition accuracy on elderly speech.
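The key architectural point of the abstract above is that the conversion model sits in front of a black-box recognizer, which is never modified. A minimal sketch of that pipeline placement is below; the affine-plus-tanh mapping and the names `adapt_features` and `recognize` are illustrative stand-ins, not the paper's actual conversion network or any real ASR API.

```python
import numpy as np

def adapt_features(spec, W, b):
    """Hypothetical learned spectral mapping applied per frame.

    Stands in for the paper's neural voice conversion model: it maps
    each (time, freq) spectral frame toward the acoustic space the
    downstream recognizer was trained on.
    """
    return np.tanh(spec @ W + b)

def recognize(spec):
    """Placeholder for any black-box commercial or open ASR system
    that consumes spectral features and returns a transcript."""
    return "<transcript>"

# The adaptation step precedes the recognizer; the recognizer's
# internals are untouched, so any ASR system can be swapped in.
T, F = 100, 40                      # illustrative frame count / bins
rng = np.random.default_rng(1)
spec = rng.standard_normal((T, F))  # toy input spectrogram
W, b = np.eye(F), np.zeros(F)       # identity mapping for the sketch
adapted = adapt_features(spec, W, b)
text = recognize(adapted)
```

The design choice this illustrates is separation of concerns: all speaker adaptation lives in `adapt_features`, so improving elderly-speech accuracy requires retraining only the front-end, never the recognizer.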
Unsupervised learning-based approaches for training speech vector representations (SVR) have recently been widely applied. While pretrained SVR models excel on relatively clean automatic speech recognition (ASR) tasks, such as speech recorded in laboratory environments, they remain insufficient for practical applications involving various types of noise, intonation, and dialect. To address this problem, we present a novel unsupervised SVR learning method for practical end-to-end ASR models. Our approach is a speech feature masking method designed to stabilize SVR model training and improve the performance of the ASR model in a downstream task. By applying a noise masking strategy to diverse combinations of time and frequency regions of the spectrogram, the SVR model becomes a robust representation extractor for the ASR model in practical scenarios. In pretraining experiments, we train the SVR model on approximately 18,000 h of Korean speech from diverse speakers, recorded in environments with varying amounts of noise. The weights of the pretrained SVR extractor are then frozen, and the extracted speech representations are used to train the ASR model in a downstream task. The experimental results show that the ASR model using our proposed SVR extractor significantly outperforms conventional methods.
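The masking strategy described above, applied to combinations of time and frequency regions of a spectrogram, can be sketched in the style of SpecAugment-like augmentation; the function name, parameter names, and mask widths below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def mask_spectrogram(spec, rng, n_freq_masks=2, max_freq_width=8,
                     n_time_masks=2, max_time_width=20):
    """Zero out random frequency bands and time spans of a
    (freq, time) spectrogram -- a sketch of time/frequency region
    masking for robust representation pretraining."""
    spec = spec.copy()          # leave the caller's array untouched
    n_freq, n_time = spec.shape
    for _ in range(n_freq_masks):
        w = int(rng.integers(1, max_freq_width + 1))
        f0 = int(rng.integers(0, n_freq - w))
        spec[f0:f0 + w, :] = 0.0    # mask a horizontal frequency band
    for _ in range(n_time_masks):
        w = int(rng.integers(1, max_time_width + 1))
        t0 = int(rng.integers(0, n_time - w))
        spec[:, t0:t0 + w] = 0.0    # mask a vertical time span
    return spec

rng = np.random.default_rng(0)
spec = rng.standard_normal((80, 200))   # 80 mel bins, 200 frames (toy)
masked = mask_spectrogram(spec, rng)
```

During pretraining, the representation model would be trained to produce stable embeddings from `masked` inputs, which is what makes the frozen extractor robust to noisy regions at downstream ASR training time.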