2019
DOI: 10.1134/s1064226919110184
Speaker Modeling Using Emotional Speech for More Robust Speaker Identification

Cited by 3 publications (1 citation statement)
References: 28 publications
“…The use of discrete Hidden Markov Models (HMMs) with Linear Prediction Cepstral Coefficients (LPCC), Log Frequency Power Coefficients (LFPC), total signal energy (E), Teager energy (TE), fundamental frequency (F0) and formant values (FF) reached a recognition rate of 72% (Nedeljković, Ðurović, 2015). With the Support Vector Machine (SVM) approach, results ranged from 62.78% to 91.3% depending on the test setup (Hassan, Damper, 2010; Milošević et al, 2016). Results reported so far on the Polish database PES vary widely: 50.73% using k Nearest Neighbours (kNN) with Mel Frequency Cepstral Coefficients (MFCC) (Kamińska et al, 2013), whereas phoneme-level formant features combined with Binary Decision Trees (BDT) give 81.9% (Ślot et al, 2009).…”
Section: Introduction (citation type: mentioning)
confidence: 99%
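To make the simplest baseline mentioned in the excerpt concrete, the sketch below pairs utterance-level MFCC features with a kNN classifier. This is a minimal illustration only, assuming the librosa and scikit-learn libraries; the file paths, labels, and parameter values are placeholders and are not taken from any of the cited studies or from the paper itself.

import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

def mfcc_features(path, n_mfcc=13):
    # Load the recording at its native sampling rate and average the
    # MFCC matrix over time to obtain one utterance-level feature vector.
    signal, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

# Placeholder corpus: replace with real (wav_path, label) pairs,
# e.g. speaker labels for recordings from an emotional speech database.
corpus = [
    ("recordings/speaker1_angry_01.wav", "speaker1"),
    ("recordings/speaker2_neutral_03.wav", "speaker2"),
    # ...
]

X = np.array([mfcc_features(path) for path, _ in corpus])
y = np.array([label for _, label in corpus])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
print("recognition rate:", knn.score(X_test, y_test))

Averaging MFCCs over time discards temporal structure; the HMM- and formant-based approaches quoted above retain more of it, which is one reason their reported recognition rates differ.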