2017
DOI: 10.1121/1.4979841

Robust speaker identification via fusion of subglottal resonances and cepstral features

Abstract: This letter investigates the use of subglottal resonances (SGRs) for noise-robust speaker identification (SID). It is motivated by the speaker specificity and stationarity of subglottal acoustics, and the development of noise-robust SGR estimation algorithms which are reliable at low signal-to-noise ratios for large datasets. A two-stage framework is proposed which combines the SGRs with different cepstral features. The cepstral features are used in the first stage to reduce the number of target speakers for a…
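The abstract's two-stage idea (cepstral features prune the speaker set, SGRs then decide among the surviving candidates) can be illustrated with a minimal sketch. Everything below is hypothetical: the function name two_stage_sid, the top-N candidate pruning, and the weighted score fusion are illustrative assumptions, not the models or fusion rule used in the paper.

```python
import numpy as np

def two_stage_sid(cepstral_scores, sgr_scores, n_candidates=5, alpha=0.7):
    """Hypothetical sketch of a two-stage speaker-ID decision.

    cepstral_scores : (num_speakers,) first-stage scores, e.g. log-likelihoods
                      from a cepstral-feature speaker model.
    sgr_scores      : (num_speakers,) scores from a subglottal-resonance model.
    n_candidates    : how many top speakers the first stage keeps (assumed).
    alpha           : illustrative fusion weight between the two score streams.
    """
    cepstral_scores = np.asarray(cepstral_scores, dtype=float)
    sgr_scores = np.asarray(sgr_scores, dtype=float)

    # Stage 1: cepstral features prune the speaker set to the top-N candidates.
    candidates = np.argsort(cepstral_scores)[::-1][:n_candidates]

    # Stage 2: re-score only the surviving candidates; a simple weighted sum
    # stands in for whatever fusion the paper actually uses.
    fused = alpha * cepstral_scores[candidates] + (1.0 - alpha) * sgr_scores[candidates]
    return int(candidates[np.argmax(fused)])

# Toy usage with random scores for 20 enrolled speakers.
rng = np.random.default_rng(0)
print(two_stage_sid(rng.normal(size=20), rng.normal(size=20)))
```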

Cited by 14 publications (9 citation statements). References 13 publications.
“…Testing is done under clean and noisy conditions to assess the robustness of the proposed feature extraction algorithm; four noise types from the NOISEX-92 dataset (babble, factory 1, pink and white) are added to the test utterances at SNR levels of 0, 5, 10 and 15 dB. The results show that the proposed features outperform the baseline features (PNCC and GFCC) and the methods of Islam et al. (2016), Korba et al. (2018), Guo et al. (2017) and Ajgou et al. (2016), so it is a promising approach for extracting robust features and increasing the speaker identification rate.…”
Section: Discussion (mentioning; confidence: 85%)
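The noise-corruption procedure mentioned in the statement above (NOISEX-92 noise added to test utterances at 0, 5, 10 and 15 dB SNR) follows a standard recipe: scale the noise so the clean-to-noise power ratio matches the target SNR, then add it to the utterance. The sketch below shows only that recipe; mix_at_snr and its signal-handling details are illustrative assumptions, not code from the cited works.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Add noise to a clean utterance at a target SNR (in dB). Illustrative only."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)

    # Loop (or trim) the noise so it covers the whole utterance.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]

    # Choose a gain so that 10*log10(P_clean / P_noise_scaled) equals snr_db.
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise

# Toy usage: white noise added to a synthetic "utterance" at 5 dB SNR.
rng = np.random.default_rng(1)
noisy = mix_at_snr(np.sin(np.linspace(0, 100, 16000)), rng.normal(size=8000), 5)
```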
“…Figure 5. Comparison with other studies: (a) with work proposed by [1], (b) with work proposed by [18], (c) with work proposed by [38], and (d) with work proposed by [39]…”
(mentioning; confidence: 83%)
“…After the 1980s, characteristic parameters such as time-domain decomposition, frequency-domain decomposition, and wavelet packet node energy gradually appeared and became widely used [6]. Jinxi Guo et al. studied speaker recognition systems in noisy environments [7] and made notable progress.…”
Section: Introduction (mentioning; confidence: 99%)
“…A majority of automatic speaker recognition systems use only physiological speech features due to their high discriminability and ease of characterization [2]. However, such systems are vulnerable to audio degradations, such as background noise and channel effects [3]. Behavioral speech characteristics, while susceptible to intra-user variation, are considered robust to audio degradations [4].…”
Section: Introduction (mentioning; confidence: 99%)