Proceedings of the 17th International Conference on Spoken Language Translation 2020
DOI: 10.18653/v1/2020.iwslt-1.24

CUNI Neural ASR with Phoneme-Level Intermediate Step for Non-Native SLT at IWSLT 2020

Abstract: In this paper, we present our submission to the Non-Native Speech Translation Task for IWSLT 2020. Our main contribution is a proposed speech recognition pipeline that consists of an acoustic model and a phoneme-to-grapheme model. As an intermediate representation, we utilize phonemes. We demonstrate that the proposed pipeline surpasses commercially used automatic speech recognition (ASR) and submit it to the ASR track. We complement this ASR with off-the-shelf MT systems to take part also in the speech trans…
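The abstract describes a two-stage recognition pipeline: an acoustic model that emits phonemes, followed by a phoneme-to-grapheme (P2G) model that produces the final transcript. Below is a minimal sketch of that composition; the class and method names (AcousticModel, PhonemeToGraphemeModel, transcribe_phonemes, translate) are illustrative assumptions, not the authors' actual code.

```python
# Illustrative sketch of a two-stage ASR pipeline with a phoneme-level intermediate
# representation. AcousticModel and PhonemeToGraphemeModel are hypothetical
# interfaces, not the authors' actual classes.

from dataclasses import dataclass
from typing import List, Protocol


class AcousticModel(Protocol):
    def transcribe_phonemes(self, audio: bytes) -> List[str]:
        """Map raw audio to a phoneme sequence, e.g. ['HH', 'AH', 'L', 'OW']."""
        ...


class PhonemeToGraphemeModel(Protocol):
    def translate(self, phonemes: List[str]) -> str:
        """Map a phoneme sequence to a grapheme (text) transcript."""
        ...


@dataclass
class PhonemeIntermediateASR:
    acoustic_model: AcousticModel
    p2g_model: PhonemeToGraphemeModel

    def recognize(self, audio: bytes) -> str:
        # Stage 1: the acoustic model emits phonemes instead of graphemes.
        phonemes = self.acoustic_model.transcribe_phonemes(audio)
        # Stage 2: the phoneme-to-grapheme model produces the final transcript,
        # which can then be passed to an off-the-shelf MT system for SLT.
        return self.p2g_model.translate(phonemes)
```

Framing the second stage this way lets standard sequence-to-sequence machinery be reused, and the resulting transcript can then be handed to an off-the-shelf MT system for the speech translation track.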

Cited by 4 publications (3 citation statements)
References 21 publications (12 reference statements)
“…The Uzbek language is limited in terms of data set, since it could not develop naturally for a long time, but the use of Bayesian models can be useful in terms of building rhythmic and intonational parameters. Polák et al [39] develop a speech recognition pipeline consisting of an acoustic model and a phoneme-grapheme model. Such a system is superior to automatic speech recognition.…”
Section: Using Different Technologies For Phonemic Speech Recognition... (mentioning)
confidence: 99%
“…However, these studies were primarily designed for a monolingual setup, and their main goal was to perform spelling correction rather than involving P2G translation. In the field of two-pass ASR with P2G translation, a notable study by [15] focuses on utilizing phonemes as an intermediate representation. They introduce a comprehensive two-pass ASR system incorporating phoneme recognition and P2G translation stages.…”
Section: Two-pass Automatic Speech Recognition (mentioning)
confidence: 99%
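Because the second pass treats phoneme sequences as a source "language", the P2G stage can be trained like a text-to-text translation model on parallel phoneme–grapheme data. A minimal sketch of preparing such a parallel corpus follows; the phonemize() helper, the toy lexicon, and the file names are assumptions for illustration only, not the setup used in the paper.

```python
# A sketch of preparing parallel data for a phoneme-to-grapheme (P2G) translation
# model, treating P2G as text-to-text translation. The phonemize() helper and the
# toy lexicon are illustrative assumptions, not a real G2P resource.

from typing import Iterable, List


def phonemize(text: str) -> List[str]:
    """Toy grapheme-to-phoneme lookup used only to build example training pairs."""
    toy_lexicon = {"hello": ["HH", "AH", "L", "OW"], "world": ["W", "ER", "L", "D"]}
    phonemes: List[str] = []
    for word in text.lower().split():
        # Fall back to spelling out unknown words character by character.
        phonemes.extend(toy_lexicon.get(word, list(word)))
    return phonemes


def write_parallel_corpus(sentences: Iterable[str],
                          src_path: str = "train.phn",
                          tgt_path: str = "train.txt") -> None:
    # Source side: space-separated phoneme tokens; target side: the reference text.
    with open(src_path, "w", encoding="utf-8") as src, \
            open(tgt_path, "w", encoding="utf-8") as tgt:
        for sentence in sentences:
            src.write(" ".join(phonemize(sentence)) + "\n")
            tgt.write(sentence + "\n")


if __name__ == "__main__":
    write_parallel_corpus(["hello world"])
```

In practice the phoneme side would come from a pronunciation lexicon or a G2P tool rather than the toy lookup used here.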
“…Additionally, P2G translation can be further enhanced through training with noisy text data, enabling robust performance in noisy ASR hypotheses. Previous studies such as [11,12,15] have employed the K-fold method to generate ASR noise for training the translation model. Another approach, as seen in [14], involves generating synthetic audio and applying ASR inference to produce noisy data for a translator.…”
Section: Two-pass Automatic Speech Recognition (mentioning)
confidence: 99%
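The K-fold noising idea referenced above can be sketched as follows: split the transcribed corpus into K folds, train an ASR model on K-1 folds, decode the held-out fold, and pair the resulting noisy hypotheses with the reference transcripts as training data for the P2G or correction model. The train_asr and decode callables in this sketch are hypothetical stand-ins, not a specific toolkit's API.

```python
# A sketch of the K-fold noising scheme described above: train ASR on K-1 folds,
# decode the held-out fold, and collect (noisy hypothesis, reference) pairs.
# train_asr and decode are hypothetical callables, not a specific toolkit's API.

from typing import Callable, List, Sequence, Tuple

Utterance = Tuple[bytes, str]  # (audio, reference transcript)


def kfold_noisy_pairs(
    data: Sequence[Utterance],
    k: int,
    train_asr: Callable[[Sequence[Utterance]], object],
    decode: Callable[[object, bytes], str],
) -> List[Tuple[str, str]]:
    """Return (noisy ASR hypothesis, reference transcript) pairs covering the corpus."""
    folds = [list(data[i::k]) for i in range(k)]
    pairs: List[Tuple[str, str]] = []
    for held_out_idx, held_out in enumerate(folds):
        # Training excludes the held-out fold, so decoding it yields hypotheses
        # with realistic recognition errors rather than memorized transcripts.
        train_split = [utt for i, fold in enumerate(folds)
                       if i != held_out_idx for utt in fold]
        model = train_asr(train_split)
        for audio, reference in held_out:
            pairs.append((decode(model, audio), reference))
    return pairs
```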