Heeseung Kim scite author profile

Heeseung Kim

5 Publications

17 Citation Statements Received

143 Citation Statements Given

Affiliations

Seoul National University

Publications

Silent Speech Recognition with Strain Sensors and Deep Learning Analysis of Directional Facial Muscle Movement

Yoo, Kim, Chung et al. 2022

ACS Appl. Mater. Interfaces

Silent communication based on biosignals from facial muscle requires accurate detection of its directional movement and thus optimally positioning minimum numbers of sensors for higher accuracy of speech recognition with a minimal person-to-person variation. So far, previous approaches based on electromyogram or pressure sensors are ineffective in detecting the directional movement of facial muscles. Therefore, in this study, high-performance strain sensors are used for separately detecting x- and y-axis strain. Directional strain distribution data of facial muscle is obtained by applying three-dimensional digital image correlation. Deep learning analysis is utilized for identifying optimal positions of directional strain sensors. The recognition system with four directional strain sensors conformably attached to the face shows silent vowel recognition with 85.24% accuracy and even 76.95% for completely nonobserved subjects. These results show that detection of the directional strain distribution at the optimal facial points will be the key enabling technology for highly accurate silent speech recognition.

Guided-TTS 2: A Diffusion Model for High-quality Adaptive Text-to-Speech with Untranscribed Data

Kim, Kim, Yoon 2022

Preprint

We propose Guided-TTS 2, a diffusion-based generative model for high-quality adaptive TTS using untranscribed data. Guided-TTS 2 combines a speaker-conditional diffusion model with a speaker-dependent phoneme classifier for adaptive text-to-speech. We train the speaker-conditional diffusion model on large-scale untranscribed datasets for a classifier-free guidance method and further fine-tune the diffusion model on the reference speech of the target speaker for adaptation, which only takes 40 seconds. We demonstrate that Guided-TTS 2 shows comparable performance to high-quality single-speaker TTS baselines in terms of speech quality and speaker similarity with only a ten-second untranscribed data. We further show that Guided-TTS 2 outperforms adaptive TTS baselines on multi-speaker datasets even with a zero-shot adaptation setting. Guided-TTS 2 can adapt to a wide range of voices only using untranscribed speech, which enables adaptive TTS with the voice of non-human characters such as Gollum in "The Lord of the Rings".

Edit-A-Video: Single Video Editing with Object-Aware Consistency

Shin, Kim, Lee et al. 2023

Preprint

The notion of water seen through the structure of Dalgol Village’s Mul-dang-gi-gi which is a ritual behavior to pull water in Muryong-dong, Ulsan, Korea

Woo, Oh, Kim 2021

lhc

UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data

Kim, Kim, Yeom et al. 2023
