2020
DOI: 10.48550/arxiv.2001.05031
Preprint

Robust Speaker Recognition Using Speech Enhancement And Attention Model

Abstract: In this paper, a novel architecture for speaker recognition is proposed by cascading speech enhancement and speaker processing. Its aim is to improve speaker recognition performance when speech signals are corrupted by noise. Instead of individually processing speech enhancement and speaker recognition, the two modules are integrated into one framework by a joint optimisation using deep neural networks. Furthermore, to increase robustness against noise, a multi-stage attention mechanism is employed to highligh…

Cited by 8 publications (5 citation statements)
References 34 publications

“…CBAM refines the 3-dimensional feature map by combining channel attention module and spatial attention module in CNN architecture [19]. Motivated by CBAM, several studies have been conducted in speech processing field [23,24]. However, to our best knowledge, there was no attempt to apply CBAM for VAD.…”
Section: Attention Module
type: mentioning
confidence: 99%
“…To learn a noise-invariant speaker embedding, adversarial training [17,18] and variability-invariant loss [19] are investigated. Also, joint training of speech enhancement network and speaker embedding network can improve the ASV performance under noisy conditions [20,21,22]. For deep speaker modeling with microphone array, a multichannel training framework is proposed for speaker embedding extraction [23].…”
Section: Introduction
type: mentioning
confidence: 99%
“…Notable works include masking [1] and mapping [2] based approach, Speech Enhancement Generative Adversarial Network (SEGAN) [3], Deep Feature Loss (DFL) [4], end-to-end metric optimization [5], and Transformer based approach [6,7]. Meanwhile, an active research exists in the robustness of Speaker Verification (SV) systems [8,9,10,11]. Another reason for interest in speech enhancement arises from the notion that it is considered as a modern solution to improve noise robustness in SV systems [10,12,13].…”
Section: Introduction
type: mentioning
confidence: 99%