2018
DOI: 10.1016/j.specom.2018.01.007

Monaural multi-talker speech recognition using factorial speech processing models

Abstract: A Pascal challenge entitled monaural multi-talker speech recognition was developed, targeting the problem of robust automatic speech recognition against speech-like noises, which significantly degrade the performance of automatic speech recognition systems. In this challenge, two competing speakers say a simple command simultaneously, and the objective is to recognize the speech of the target speaker. Surprisingly, during the challenge, a team from IBM Research achieved a performance better than human listener…

Citations: cited by 13 publications (15 citation statements)
References: 17 publications
“…In the second scenario, two or more speaker voices are mixed together to produce a multi-talker speech utterance in which underlying source processes are the speakers' voices. In this case, previous achievements are surprising [6,7], even better than the results achieved manually by human listening (Fig. 8-left).…”
Section: Introduction (mentioning)
confidence: 53%
“…In fact, we assume that the initial solution to the system of equations with approximate joint-posteriors can be improved iteratively during the discriminative phase using marginal posteriors. Based on this assumption, we propose the following three steps for training a deep neural network for extracting joint-state posteriors: the generative phase, initializing joint-state layer weights, and fine-tuning the network. Fig.…”
Section: Joint-state Posterior Estimation Using Deep Neural Network (mentioning)
confidence: 99%
“…VidTIMIT database covers 40 speaker's (22 guys and 18 females) as well as subset of this database having 30 speaker's (15 guys and 15 female's speaker's) was utilized in work depicted in this article. Every speaker expresses eight distinct sentences before a camera fixated on substance of speaker, and the sentences in database are on the whole instances of persistent discourse booked from the standard VidTIMIT database as well as comprise an aggregate of the 210 expressions and the terms of 920 words, and the sound is recorded at the test rate of 64 KHz and 32 bits profundity; video is recorded at the rate of 24 outlines for each second [10].…”
Section: Database Techniques (mentioning)
confidence: 99%
“…Speech recognition technology has applications in different systems, such as automatic translation telephones, question and answer machines, and intelligence decisions support systems [1][2][3][4]. The mechanism of speech recognition lies in the separation of the words and the matching of patterns between the words in the speech and the words in a dictionary [5][6].…”
Section: Introduction (mentioning)
confidence: 99%