2020
DOI: 10.1109/access.2020.3025941
A Unified Deep Learning Framework for Short-Duration Speaker Verification in Adverse Environments

Abstract: Speaker verification (SV) has recently attracted considerable research interest due to the growing popularity of virtual assistants. At the same time, there is an increasing requirement for an SV system: it should be robust to short speech segments, especially in noisy and reverberant environments. In this paper, we consider one more important requirement for practical applications: the system should be robust to an audio stream containing long non-speech segments, where a voice activity detection (VAD) is not…

Cited by 17 publications (5 citation statements)
References 63 publications
“…We utilize the power of Convolutional Neural Networks (CNNs) and technology derived from the VGG-Net model [3] to further optimize our lightweight model. These advancements allow us to identify distinct speech patterns, increasing system accuracy and lowering resource needs at the same time.…”
Section: Introduction
confidence: 99%
“…THE goal of acoustic signal enhancement (ASE) is to suppress the interfering noise signals by minimizing unwanted distortions and transforming noisy input signals into desired clean signals. Acoustic signals are often distorted owing to additive and convolutional noise, or recording device constraints, which limit the performance of real-world applications, such as soundscape information retrieval [1][2][3], sound environment analysis in a smart home [4][5][6], physiological sound recognition [7][8][9][10], speaker recognition and verification [11][12][13][14], automatic speech recognition (ASR) [15][16][17][18], hearing aids [19,20], and cochlear implants [21,22]. Several ASE approaches have been proposed in the literature to alleviate background noise problems.…”
Section: Introduction
confidence: 99%
“…For example, the presence of noise affecting human speech during the recording, the need for a very long speech signal to be recorded by the user to train the system, or reverberation affecting the system. These challenges are discussed in [24], where the authors propose a method for short fragments of speech signals tackling the issues described above.…”
Section: Introduction
confidence: 99%