2018
DOI: 10.1016/j.specom.2018.02.009

Speaker recognition from whispered speech: A tutorial survey and an application of time-varying linear prediction

Abstract: Of the available biometric technologies, automatic speaker recognition is one of the most convenient and accessible, owing to the abundance of mobile devices equipped with a microphone, which allows users to be authenticated across multiple environments and devices. Speaker recognition also finds use in forensics and surveillance. Because the varied environments and devices of the same speaker induce acoustic mismatch, leading to an increased number of identification errors, much of the research focuses on compens…

Cited by 38 publications (22 citation statements)
References: 58 publications
“…Audio: In the literature there is a large body of work on speaker recognition [6]. One way to achieve it is with diarization techniques, where the audio stream is partitioned into segments according to the identity of the speaker.…”
Section: Related Work (mentioning)
confidence: 99%
“…whispering, or that background noise alters the identification. Vestman et al. [6] provide a thorough taxonomy of features that address these issues and propose a time-varying feature that gives state-of-the-art results. With the rise of the Convolutional Neural Network (CNN), some proposals [9] have exploited this end-to-end approach for speaker diarization.…”
Section: Related Work (mentioning)
confidence: 99%
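As a rough illustration of the time-varying linear prediction (TVLP) idea referred to above (a minimal sketch of the general technique, not the authors' exact formulation), the snippet below lets each prediction coefficient vary across an analysis segment as a low-order polynomial in time and solves for the expansion weights by ordinary least squares. The function name tvlp, the polynomial basis, and the chirp test signal are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def tvlp(x, order=10, n_basis=3):
    """Time-varying linear prediction over a single analysis segment.

    The k-th prediction coefficient at sample n is modelled as
    a_k(n) = sum_j c[k, j] * f_j(n), with a polynomial basis f_j here;
    the weights c[k, j] are found by ordinary least squares.
    """
    N = len(x)
    n = np.arange(N) / N                                 # normalised time in [0, 1)
    basis = np.vstack([n ** j for j in range(n_basis)])  # shape (n_basis, N)

    # Regress x[t] on x[t-1], ..., x[t-order], each lag paired with every basis value
    rows, targets = [], []
    for t in range(order, N):
        past = x[t - order:t][::-1]                      # x[t-1], ..., x[t-order]
        rows.append(np.outer(past, basis[:, t]).ravel())
        targets.append(x[t])
    A, b = np.array(rows), np.array(targets)

    c, *_ = np.linalg.lstsq(A, b, rcond=None)
    return c.reshape(order, n_basis)                     # weight c[k, j] for lag k+1, basis j

# Toy example: a chirp whose spectral peak moves within the segment
sr = 8000
t = np.arange(0, 0.2, 1 / sr)
x = np.sin(2 * np.pi * (500 + 2000 * t) * t)
print(tvlp(x, order=8, n_basis=3).shape)                 # (8, 3)
```

With n_basis=1 this reduces to conventional (time-invariant) LPC on the segment; larger values let a single analysis window track formants that move within it.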
“…Although useful, such features on their own limit the execution of important, more complex tasks such as speech detection, speech recognition, and the like. For such tasks, more robust features have been employed, such as Linear Predictive Coding (LPC), Linear Predictive Cepstral Coefficients (LPCC), Linear Frequency Cepstral Coefficients (LFCCs) and Exponential Frequency Cepstral Coefficients (EFCCs) [Vestman et al, 2018]. … state of the art in current audio analysis [Yang et al, 2019].…”
Section: Aural Features (unclassified)
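To make the first of the listed features concrete, here is a minimal frame-wise LPC extraction sketch using librosa; the model order, frame settings, and the bundled example clip are arbitrary assumptions, and LPCCs or LFCCs would require a further cepstral or linear-filterbank step not shown here.

```python
import numpy as np
import librosa

# Any 16 kHz utterance works; librosa's bundled example clip is used here
y, sr = librosa.load(librosa.ex('trumpet'), sr=16000)

# 25 ms frames with a 10 ms hop, then a 12th-order all-pole (LPC) fit per frame
frame_len, hop = 400, 160
lpc = np.array([librosa.lpc(y[start:start + frame_len], order=12)
                for start in range(0, len(y) - frame_len, hop)])
print(lpc.shape)  # (n_frames, 13): a leading 1 plus 12 predictor coefficients per frame
```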
“…Though the MFCCs are relatively more robust than other cepstral features such as linear frequency cepstral coefficients (LFCCs) or LPCCs, ASV performance with MFCCs is severely degraded in real-world conditions due to the mismatch of acoustic conditions between the enrollment (or speaker registration) and verification (or speaker authentication) phases [19,20]. To overcome some of the shortcomings of MFCCs, various acoustic features like frequency domain linear prediction (FDLP) [21], cochlear frequency cepstral coefficients (CFCCs) [22], power-normalized cepstral coefficients (PNCCs) [23], mean Hilbert envelope coefficients (MHECs) [24], Gammatone frequency cepstral coefficients (GFCCs) [25], constant-Q cepstral coefficients (CQCCs) [26], time-varying linear prediction (TVLP) [27], and locally-normalized cepstral coefficients (LNCCs) [28] have been proposed. Even though all these features achieve better performance in noisy conditions, they require a large number of user-defined parameters.…”
Section: Introduction (mentioning)
confidence: 99%
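For contrast with the alternatives listed above, a minimal MFCC front end with a simple mismatch-compensation step (per-utterance cepstral mean and variance normalisation) might look like the sketch below; the file name, sampling rate, and frame settings are placeholder assumptions rather than a prescribed recipe.

```python
import numpy as np
import librosa

# Placeholder path for an enrollment or verification utterance
y, sr = librosa.load('utterance.wav', sr=16000)

# 20 MFCCs from 25 ms frames with a 10 ms hop -- a common ASV front end
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20, n_fft=400, hop_length=160)

# Per-utterance cepstral mean and variance normalisation (CMVN): a basic way
# to reduce channel/session mismatch between enrollment and verification
mfcc = (mfcc - mfcc.mean(axis=1, keepdims=True)) / (mfcc.std(axis=1, keepdims=True) + 1e-8)
print(mfcc.shape)  # (20, n_frames)
```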