5th International Workshop on Speech Processing in Everyday Environments (CHiME 2018) 2018
DOI: 10.21437/chime.2018-12
The RWTH/UPB system combination for the CHiME 2018 Workshop

Abstract: This paper describes the systems for the single-array track and the multiple-array track of the 5th CHiME Challenge. The final system is a combination of multiple systems, using Confusion Network Combination (CNC). The different systems presented here are utilizing different front-ends and training sets for a Bidirectional Long Short-Term Memory (BLSTM) Acoustic Model (AM). The front-end was replaced by enhancements provided by Paderborn University [1]. The back-end has been implemented using RASR [2] and RETU… Show more
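The abstract above describes merging multiple systems via Confusion Network Combination (CNC). As a rough illustration of the underlying idea, here is a minimal ROVER-style word-level voting sketch (not the paper's actual implementation); real CNC aligns lattices or confusion networks with posterior scores, whereas this sketch assumes hypotheses are already position-aligned to equal length, and the function name `rover_vote` is my own:

```python
from collections import Counter

def rover_vote(hypotheses):
    """Majority vote over position-aligned word hypotheses.

    `hypotheses` is a list of token lists assumed to be aligned to the
    same length. Real CNC/ROVER first performs a dynamic-programming
    alignment (inserting NULL tokens); that step is omitted here.
    """
    combined = []
    for slot in zip(*hypotheses):
        # Pick the most frequent word in this alignment slot.
        word, _ = Counter(slot).most_common(1)[0]
        combined.append(word)
    return combined
```

With three systems disagreeing on single positions, the majority word wins in each slot.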

Cited by 8 publications (4 citation statements) | References 14 publications
“…At the time of the challenge, Hitachi made many contributions with Johns Hopkins University (JHU) on acoustic modeling (AM), language modeling (LM), and decoding techniques, achieving the second-best result of 48.2% WER [10]. On the other hand, Paderborn University developed very promising speech enhancement (SE) techniques, named guided source separation (GSS), which achieved a significant improvement on evaluation data in multiple-array settings [16,17]. We thought it worthwhile to evaluate the results of combining our contributions to assess the state-of-the-art performance of today's ASR systems.…”
Section: Introduction (mentioning; confidence: 99%)
“…Those WER improvements were obtained using the acoustic model provided by the challenge organizers. Further significant gains are obtained if a stronger back-end and system combination are used; see our companion paper [17]. An implementation of the described speech enhancement system without SAD is available on GitHub.…”
Section: Discussion (mentioning; confidence: 99%)
“…We evaluate the proposed front-end processing techniques by computing Word Error Rates (WERs) using the baseline ASR back-end provided by the challenge organizers, and observe significant WER improvements. Still better WERs can be achieved by combining the presented front-end with a stronger back-end, as shown in [17].…”
Section: Introduction (mentioning; confidence: 96%)
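The excerpts above report results as Word Error Rates. For reference, WER is the word-level Levenshtein distance (substitutions + deletions + insertions) between reference and hypothesis, divided by the number of reference words. A minimal sketch (the helper name `wer` is my own, not from the paper):

```python
def wer(ref, hyp):
    """Word Error Rate: (S + D + I) / N via word-level Levenshtein distance."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # deleting all of r[:i]
    for j in range(len(h) + 1):
        d[0][j] = j  # inserting all of h[:j]
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)
```

For example, comparing "a b c d" against "a x c" counts one substitution and one deletion over four reference words, giving a WER of 0.5.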
“…Although it is inferior to USTC-iFlytek's system, our system performs separation only once and apparently has low computational complexity and model size. [Table: evaluation WERs — [32] 62.09, Toshiba [33] 63.30, STC [23] 63.30, RWTH-Paderborn [34] 68.40, Official [13] 80.28] The details on each session and location over the official AM and ours are given in Table 6. Even based on the official back-end, our SD separation front-end contributes a 10% WER reduction, which is a significant improvement on this challenging task.…”
Section: Speaker-aware Training (mentioning; confidence: 99%)