2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2018.8462661
Single Channel Target Speaker Extraction and Recognition with Speaker Beam

Cited by 163 publications (131 citation statements)
References 10 publications
“…Žmolíková et al. proposed a target-speaker neural beamformer that extracts a target speaker's utterances given a short sample of the target speaker's speech [21]. This model was recently extended with an ASR-based loss to maximize ASR accuracy, with promising results [22]. While target-speaker models require the additional input of a target speaker's speech sample, they can naturally solve the speaker permutation problem across utterances without requiring additional speaker identification after ASR.…”
Section: Introduction
confidence: 99%
“…Recently, to avoid such multistage processing, the use of an auxiliary speaker-aware feature has been investigated [20][21][22]. A clean speech sample spoken by the target speaker is also passed to the DNN.…”
Section: Auxiliary Speaker-aware Feature For Speech Separation
confidence: 99%
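The conditioning mechanism described above — passing a sample of the target speaker's clean speech to the network alongside the mixture — can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation: the random projections stand in for trained network weights, and the multiplicative scaling of a hidden layer by the speaker embedding is one of the adaptation schemes used in SpeakerBeam-style models.

```python
import numpy as np

rng = np.random.default_rng(0)

def speaker_embedding(enroll_spec, dim=32):
    """Average the enrollment frames, then project to an embedding.
    (A random projection stands in for a learned auxiliary network.)"""
    mean_frame = enroll_spec.mean(axis=0)                  # (n_freq,)
    proj = rng.standard_normal((mean_frame.size, dim))
    return 1.0 / (1.0 + np.exp(-(mean_frame @ proj)))      # sigmoid -> (dim,)

def extract_target(mixture_spec, enroll_spec, dim=32):
    """Estimate a time-frequency mask for the target speaker by scaling
    a hidden layer with the speaker embedding (multiplicative adaptation)."""
    n_frames, n_freq = mixture_spec.shape
    w_in = rng.standard_normal((n_freq, dim)) * 0.1
    w_out = rng.standard_normal((dim, n_freq)) * 0.1
    hidden = np.tanh(mixture_spec @ w_in)                  # (n_frames, dim)
    hidden = hidden * speaker_embedding(enroll_spec, dim)  # condition on speaker
    mask = 1.0 / (1.0 + np.exp(-(hidden @ w_out)))         # values in (0, 1)
    return mask * mixture_spec                             # masked magnitude spectrogram

# Toy magnitude spectrograms: 100 mixture frames, 50 enrollment frames, 257 bins.
mixture = np.abs(rng.standard_normal((100, 257)))
enroll = np.abs(rng.standard_normal((50, 257)))
target = extract_target(mixture, enroll)
print(target.shape)  # (100, 257)
```

Because the mask is a sigmoid, the extracted magnitudes are always bounded by the mixture magnitudes; in a real system the weights would be trained so the mask suppresses the interfering speaker.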
“…The success of model specialization suggests that speaker information is important for improving the performance of speech applications, including speech enhancement. In fact, for speech separation (or multi-talker separation [1]), several works have succeeded in extracting the desired speaker's speech by utilizing speaker information as an auxiliary input [20][21][22], in contrast to separating an arbitrary speakers' mixture with methods such as deep clustering [23] and permutation invariant training [24]. A limitation of these studies is that they require a guidance signal such as an adaptation utterance, because there is otherwise no way of knowing which signal in the speech mixture is the target.…”
Section: Introduction
confidence: 99%
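Permutation invariant training, mentioned in the excerpt above as the guidance-free alternative, resolves the output-to-speaker assignment by evaluating the loss under every permutation of the network outputs and training on the minimum. A minimal sketch of the utterance-level loss (not any particular paper's implementation):

```python
import itertools
import numpy as np

def pit_mse(estimates, references):
    """Utterance-level permutation invariant MSE: try every assignment of
    network outputs to reference sources and keep the cheapest one."""
    n_src = len(references)
    best, best_perm = float("inf"), None
    for perm in itertools.permutations(range(n_src)):
        loss = np.mean([np.mean((estimates[p] - references[i]) ** 2)
                        for i, p in enumerate(perm)])
        if loss < best:
            best, best_perm = loss, perm
    return best, best_perm

# Two reference sources; the network emitted them in swapped order.
ref = [np.ones(8), np.zeros(8)]
est = [np.zeros(8) + 0.1, np.ones(8) - 0.1]
loss, perm = pit_mse(est, ref)
print(loss, perm)  # 0.01 (1, 0)
```

The brute-force search over permutations is factorial in the number of sources, which is acceptable for the two- or three-speaker mixtures these papers consider.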
“…For such overlapped speech, neither conventional ASR nor speaker diarization provides results with sufficient accuracy. It is known that mixing two speech signals significantly degrades ASR accuracy [4][5][6]. In addition, no speaker overlap is assumed in most conventional speaker diarization techniques, such as clustering of speech partitions (e.g.
Section: Introduction
confidence: 99%
“…In another line of research, target-speaker (TS) ASR, which automatically extracts and transcribes only the target speaker's utterances given a short sample of that speaker's speech, has been proposed [5, 18]. Žmolíková et al. proposed a target-speaker neural beamformer that extracts a target speaker's utterances given a short sample of that speaker's speech [18]. This model was recently extended to handle an ASR-based loss to maximize ASR accuracy, with promising results [5]. TS-ASR can naturally solve the speaker-permutation problem across utterances.…”
Section: Introduction
confidence: 99%