Jaesong Lee scite author profile

Jaesong Lee

5 Publications

88 Citation Statements Received

129 Citation Statements Given

How they've been cited

173

How they cite others

149

123

Affiliations

Naver (South Korea), Korea Advanced Institute of Science and Technology

Publications

Intermediate Loss Regularization for CTC-Based Speech Recognition

Lee, Watanabe. 2021

We present a simple and efficient auxiliary loss function for automatic speech recognition (ASR) based on the connectionist temporal classification (CTC) objective. The proposed objective, an intermediate CTC loss, is attached to an intermediate layer in the CTC encoder network. This intermediate CTC loss regularizes CTC training well and improves performance while requiring only a small code modification; it adds a small overhead during training and none during inference. In addition, we propose combining this intermediate CTC loss with stochastic depth training, and we apply this combination to the recently proposed Conformer network. We evaluate the proposed method on various corpora, reaching a word error rate (WER) of 9.9% on the WSJ corpus and a character error rate (CER) of 5.2% on the AISHELL-1 corpus, both based on CTC greedy search without a language model. In particular, the AISHELL-1 result is comparable to other state-of-the-art ASR systems based on an autoregressive decoder with beam search.
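The two mechanics this abstract names, CTC greedy search and an auxiliary loss added to the main CTC loss, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not give the formula for combining the losses, so the convex combination and the default `weight` below are hypothetical, as are the function names and the choice of index 0 for the blank symbol.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Take the best label per frame, merge repeated labels, drop blanks."""
    best_path = [max(range(len(frame)), key=lambda i: frame[i])
                 for frame in frame_scores]
    collapsed = [label for label, _ in groupby(best_path)]
    return [label for label in collapsed if label != blank]


def combined_loss(final_loss: float, intermediate_loss: float,
                  weight: float = 0.3) -> float:
    """Mix the final-layer and intermediate-layer CTC losses.

    Assumption: a convex combination; the weight value is illustrative only.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * final_loss + weight * intermediate_loss


# Per-frame scores over 3 labels (0 = blank); the best path is 1 1 0 1 2 2 0.
scores = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.1, 0.7],
    [0.8, 0.1, 0.1],
]
decoded = ctc_greedy_decode(scores)
total = combined_loss(final_loss=2.0, intermediate_loss=4.0, weight=0.3)
```

In this sketch the intermediate loss enters only the training objective, and decoding uses the final layer's scores alone, which is consistent with the abstract's statement that inference incurs no overhead.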

Interactive Visualization and Manipulation of Attention-based Neural Machine Translation

Lee, Shin, Kim. 2017

While neural machine translation (NMT) provides high-quality translation, its behavior remains hard to interpret and analyze. We present an interactive interface for visualizing and intervening in the behavior of NMT, concentrating on the beam search mechanism and the attention component. The tool (1) visualizes the search tree and attention and (2) provides an interface for adjusting the search tree and attention weights, manually or automatically, in real time. We show that the tool helps users understand NMT in various ways.
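For readers unfamiliar with the search tree this abstract refers to, here is a minimal beam search sketch in Python. It is not the tool's implementation: `toy_model`, its probability table, and the `beam_search` signature are hypothetical stand-ins for a real NMT decoder, chosen so that a wider beam finds a better translation than greedy search does.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

EOS = "</s>"
Hypothesis = Tuple[List[str], float]  # (tokens so far, cumulative log-prob)


def toy_model(prefix: Sequence[str]) -> Dict[str, float]:
    """Hypothetical next-token distribution standing in for an NMT decoder."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.5, "y": 0.5},
        ("b",): {"z": 0.9, "w": 0.1},
    }
    return table.get(tuple(prefix), {EOS: 1.0})


def beam_search(next_probs: Callable[[Sequence[str]], Dict[str, float]],
                beam_size: int = 2, max_len: int = 5) -> List[Hypothesis]:
    """Expand every live hypothesis, keep the top `beam_size`, repeat."""
    beams: List[Hypothesis] = [([], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        candidates: List[Hypothesis] = []
        for tokens, score in beams:
            for token, prob in next_probs(tokens).items():
                if prob > 0.0:
                    candidates.append((tokens + [token], score + math.log(prob)))
        candidates.sort(key=lambda hyp: hyp[1], reverse=True)
        beams = []
        for hyp in candidates[:beam_size]:
            # Hypotheses ending in EOS leave the beam; the others are expanded.
            (finished if hyp[0][-1] == EOS else beams).append(hyp)
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len
    return sorted(finished, key=lambda hyp: hyp[1], reverse=True)


wide = beam_search(toy_model, beam_size=2)
greedy = beam_search(toy_model, beam_size=1)
```

With `beam_size=1` the search commits to the locally best first token `a`, while `beam_size=2` keeps `b` alive and ends with the higher-probability sequence `b z`. Manually adjusting the search tree, as the abstract describes, would correspond to editing the `candidates` list before pruning.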

A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation

Higuchi, Chen, Fujita et al. 2021

A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation

Higuchi, Chen, Fujita et al. 2021

Preprint

Non-autoregressive (NAR) models generate multiple outputs in a sequence simultaneously, which significantly speeds up inference at the cost of an accuracy drop compared to autoregressive (AR) baselines. Because NAR models show great potential for real-time applications, a growing number of them have been explored in different fields to narrow the performance gap with AR models. In this work, we conduct a comparative study of various NAR modeling methods for end-to-end automatic speech recognition (ASR). Experiments are performed in a state-of-the-art setting using ESPnet. The results on various tasks provide interesting findings for developing an understanding of NAR ASR, such as the accuracy-speed trade-off and robustness against long-form utterances. We also show that the techniques can be combined for further improvement and applied to NAR end-to-end speech translation. All the implementations are publicly available to encourage further research in NAR speech processing.

Intermediate Loss Regularization for CTC-based Speech Recognition

Lee, Watanabe. 2021

Preprint
