Interspeech 2019
DOI: 10.21437/interspeech.2019-1558

Joint Maximization Decoder with Neural Converters for Fully Neural Network-Based Japanese Speech Recognition

Abstract: We present a novel fully neural network (FNN)-based automatic speech recognition (ASR) system that addresses the out-of-vocabulary (OOV) problem. The most common approach to the OOV problem is leveraging character/sub-word level units as output symbols. Unfortunately, this approach is not suitable for Japanese and Mandarin Chinese since they have many more grapheme sets than English. Our solution is to develop FNN-based ASR that uses a pronunciation-based unit set with dictionaries, i.e., word-to-pronunciation …

Citations: cited by 1 publication (1 citation statement)
References: 32 publications
“…The parameters of end-to-end model extremely depend on language characteristics because the number of characters differs from language to language. For example, end-to-end speech recognition of English has 26 output labels representing each English character [9] and that of Japanese has 50 output labels representing each Japanese Kana symbol [10]. English and Japanese end-to-end speech recognition systems achieved significantly high recognition accuracy, thereby reducing the difference with the hybrid DNN-HMM architecture.…”
Section: Introduction (mentioning)
confidence: 99%