Age and gender recognition for telephone applications based on GMM supervectors and support vector machines

Bocklet, Tobias; Maier, Andreas; Bauer, J.G.; Burkhardt, Felix; Nöth, Elmar

doi:10.1109/icassp.2008.4517932

Cited by 91 publications

(60 citation statements)

References 9 publications

(13 reference statements)

“…As for the classifiers used to classify these features in AGR task, in literature, logistic regression, linear regression, random forests and support vector machines are employed [6,7,8,9]. In [6], it is indicated that random forest trained on simple F0 and MFCC features performs close to the state-of-the-art system devised for 3-way classification problem (between male, female and child speech), which is a fusion of six subsystems.…”

Section: Introduction (mentioning)

confidence: 99%

“…Building on this point, typically in the literature [3,4,5,6,7,8], two broad classes of features are used for this task: fundamental frequency (F0) and short term features like mel frequency cepstrum coefficients (MFCCs). There are also works that have investigated high level representations like Gaussian mixture model supervector [9,8] and i-vectors [10].…”

Section: Introduction (mentioning)

confidence: 99%

On Learning to Identify Genders from Raw Speech Signal Using CNNs

Kabil, Muckenhirn, Magimai.-Doss

Interspeech 2018

Automatic Gender Recognition (AGR) is the task of identifying the gender of a speaker given a speech signal. Standard approaches extract features like fundamental frequency and cepstral features from the speech signal and train a binary classifier. Inspired by recent work in automatic speech recognition (ASR), speaker recognition and presentation attack detection, we present a novel approach where relevant features and classifier are jointly learned from the raw speech signal in an end-to-end manner. We propose a convolutional neural network (CNN) based gender classifier that consists of: (1) convolution layers, which can be interpreted as a feature learning stage, and (2) a multilayer perceptron (MLP), which can be interpreted as a classification stage. The system takes the raw speech signal as input and outputs gender posterior probabilities. Experimental studies conducted on two datasets, namely AVspoof and ASVspoof 2015, with different architectures show that with simple architectures the proposed approach yields a better system than the standard acoustic-feature-based approach. Further analysis of the CNNs shows that they learn formant and fundamental frequency information for gender identification.

“…In related work on speech quality, we could show that statistical models can be used to describe and estimate inherent properties of speech such as age and gender [1] and intelligibility [2]. Based on these findings, we build a model by extracting features from the speech signal and computing a probability of being "proper" speech, i.e., that the selected inversion frequency was indeed (close to) correct.…”

Section: A Statistical Model (mentioning)

confidence: 99%

“…In general, "a rule of thumb is 60:1 'grunt time to clear speech time'." 1 Unfortunately, voice scrambling is not only used by authorized personnel but also by villains taking part in organized crime such as drug dealing and man hunt, making it hard for authorities to succeed in surveillance and raids.…”

Section: Introduction (mentioning)

confidence: 99%

A software kit for automatic voice descrambling

Riedhammer, Ring, Nöth, et al. 2012

2012 IEEE International Conference on Communications (ICC)

Abstract. Voice scrambling is widely used to add privacy to the radio communication of various authorities, but is also used by criminals to evade prosecution. In this article, we consider various analog voice scrambling techniques such as fixed frequency inversion, splitband inversion and rolling code scramblers. We explain how to break them using automatically extracted measures and scoring algorithms, and evaluate the proposed system using simulated data. While the simple inversion can be easily broken, the more advanced techniques require additional work prior to unsupervised automatization; the presented user interface allows the user to refine the automatic results to obtain a high quality solution.

“…This work focuses on this task. In [3] it has been shown, that age recognition with SVM and 7 gender dependent classes outperforms different other classification ideas. The classification results of the SVM idea were in the same range as humans, and the precision even better.…”

Section: Introduction (mentioning)

confidence: 99%

Age Determination of Children in Preschool and Primary School Age with GMM-Based Supervectors and Support Vector Machines/Regression

Maier, Nöth

Text, Speech and Dialogue (self-citation)

Abstract. This paper focuses on the automatic determination of the age of children of preschool and primary school age. For each child a Gaussian Mixture Model (GMM) is trained. Maximum A Posteriori (MAP) adaptation is used as the training method. MAP derives the speaker models from a Universal Background Model (UBM) and does not perform an independent parameter estimation. The means of each GMM are extracted and concatenated, which results in a so-called GMM supervector. These supervectors are then used as meta features for classification with Support Vector Machines (SVM) or for Support Vector Regression (SVR). With the classification system a precision of 83% and a recall of 66% were achieved. When the regression system was used to determine the age in years, a mean error of 0.8 years and a maximal error of 3 years were obtained. A regression with monthly accuracy brought similar results.
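
The supervector construction described in this abstract can be illustrated with a minimal sketch. It assumes a diagonal-covariance UBM and the common relevance-factor form of MAP mean adaptation; the function names, the toy UBM, and the relevance value of 16 are hypothetical choices for illustration and are not taken from the paper.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """MAP-adapt the UBM means to one speaker's feature frames.

    Only the means move; weights and variances stay those of the UBM.
    `relevance` is an assumed relevance factor, not a value from the paper.
    """
    n_comp, dim = len(means), len(means[0])
    counts = [0.0] * n_comp                       # soft frame counts per component
    sums = [[0.0] * dim for _ in range(n_comp)]   # posterior-weighted frame sums
    for x in frames:
        log_joint = [
            math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        top = max(log_joint)                      # subtract max for numerical stability
        post = [math.exp(lj - top) for lj in log_joint]
        norm = sum(post)
        for k in range(n_comp):
            gamma = post[k] / norm                # component posterior for this frame
            counts[k] += gamma
            for d in range(dim):
                sums[k][d] += gamma * x[d]
    # Equivalent to alpha * E_k[x] + (1 - alpha) * mu_k with alpha = n_k / (n_k + r),
    # written so that components with zero count simply keep the UBM mean.
    return [
        [(sums[k][d] + relevance * means[k][d]) / (counts[k] + relevance)
         for d in range(dim)]
        for k in range(n_comp)
    ]


def supervector(adapted_means):
    """Concatenate the adapted component means into one flat vector."""
    return [value for mean in adapted_means for value in mean]


# Toy two-component UBM in two dimensions (illustrative values only).
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0, 0.0], [10.0, 10.0]]
ubm_variances = [[1.0, 1.0], [1.0, 1.0]]
speaker_frames = [[1.0, 1.0]] * 16

sv = supervector(
    map_adapt_means(speaker_frames, ubm_weights, ubm_means, ubm_variances)
)
```

In a full pipeline, one such supervector per speaker would serve as the input feature vector to an SVM classifier or SVR regressor (for example `sklearn.svm.SVC` or `sklearn.svm.SVR`); that step is omitted here.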
