2019
DOI: 10.1016/j.dsp.2019.01.023

Quality measures for speaker verification with short utterances

Abstract: The performance of automatic speaker verification (ASV) systems degrades as the amount of speech available for enrollment and verification decreases. Combining multiple systems based on different features and classifiers considerably reduces the speaker verification error rate for short utterances. This work attempts to incorporate supplementary information into the system combination process, using the quality of the estimated model parameters as that information. We introduce a class of novel …

Cited by 15 publications (3 citation statements)
References 54 publications
“…The performances of the automatic speaker verification systems degrade, due to the reduction in the amount of speech used for enrolment and verification. Combining multiple systems (based on different features and classifiers) can considerably reduce the speaker verification error-rate with short utterances [43].…”
Section: Proposed Methods (mentioning)
confidence: 99%
“…• There is a challenge in achieving high performance in speaker recognition systems based on short segment speech because the shorter the speech segment, the greater is the intra-speaker variability [48,49]. • Earlier works on multimodal speaker recognition systems have shown that performance improved either by using bone microphone speech or throat microphone speech in tandem with air microphone speech, as each of these alternate sensors capture complementary evidence.…”
Section: Short Speech Segments For Speaker Modeling (mentioning)
confidence: 99%
“…However, the available documentation on the SiiP project does not indicate the actual performance of the system with real data and what characteristics of an audio sample such as length or quality are enough to identify a person in a large OSINT database or phone recordings. According to research, a small audio sample of 30-60 s length can be enough to verify the identity of a person in benchmark datasets (Poddar et al, 2019) yet the robustness of the tools depends on factors such as noise, heterogeneous speakers, heterogeneous recording devices or audio encoding. 7 Practitioners recognised the quality of voice samples needed for speaker identification as one of the key challenges with the project, OSINT data generating better results than phone recordings.…”
Section: Features Of Siip (mentioning)
confidence: 99%