Ji Won Yoon scite author profile

Recently, generative adversarial networks (GANs) have been successfully applied to speech enhancement. However, there still remain two issues that need to be addressed: (1) GAN-based training is typically unstable due to its non-convex property, and (2) most of the conventional methods do not fully take advantage of the speech characteristics, which could result in a sub-optimal solution. In order to deal with these problems, we propose a progressive generator that can handle the speech in a multi-resolution fashion. Additionally, we propose a multi-scale discriminator that discriminates the real and generated speech at various sampling rates to stabilize GAN training. The proposed structure was compared with the conventional GAN-based speech enhancement algorithms using the VoiceBank-DEMAND dataset. Experimental results showed that the proposed approach can make the training faster and more stable, which improves the performance on various metrics for speech enhancement.


Speech to Text Adaptation: Towards an Efficient Cross-Modal Distillation

Cho, Kwak, Yoon, et al. 2020


Speech is one of the most effective means of communication and is full of information that helps transmit the utterer's thoughts. However, mainly due to the cumbersome processing of acoustic features, phoneme or word posterior probability has frequently been discarded in understanding the natural language. Thus, some recent spoken language understanding (SLU) modules have utilized an end-to-end structure that preserves the uncertainty information. This further reduces the propagation of speech recognition error and guarantees computational efficiency. We claim that in this process, the speech comprehension can benefit from the inference of massive pretrained language models (LMs). We transfer the knowledge from a concrete Transformer-based text LM to an SLU module which can face a data shortage, based on recent cross-modal distillation methodologies. We demonstrate the validity of our proposal upon the performance on the Fluent Speech Command dataset. Thereby, we experimentally verify our hypothesis that the knowledge could be shared from the top layer of the LM to a fully speech-based module, in which the abstracted speech is expected to meet the semantic representation.


ResMax: Detecting Voice Spoofing Attacks with Residual Network and Max Feature Map

Kwak, Kwag, Lee, et al. 2021


Effect of Attitude of Patient Safety and Confidence in Safety Nursing on Patient Safety Management Activity in Nursing Students

Yoon 2021, Forum Public Saf Culture



Ji Won Yoon

TutorNet: Towards Flexible Knowledge Distillation for End-to-End Speech Recognition

A Multi-Resolution Approach to GAN-Based Speech Enhancement
