2008
DOI: 10.1109/icassp.2008.4517932

Age and gender recognition for telephone applications based on GMM supervectors and support vector machines

Abstract: This paper compares two approaches to automatic age and gender classification with 7 classes. The first approach uses Gaussian Mixture Models (GMMs) with Universal Background Models (UBMs), which are well known from speaker identification/verification. Training is performed with the EM algorithm or MAP adaptation, respectively. In the second approach, a GMM is trained for each speaker in the test and training sets. The means of each model are extracted and concatenated, which results in a GMM su…
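The supervector construction mentioned in the abstract (per-speaker GMM means, extracted and concatenated) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that each speaker model is derived from a shared diagonal-covariance UBM by mean-only MAP adaptation, which the truncated abstract does not confirm for this step. The function names `map_adapt_means` and `supervector` and the `relevance` factor of 16 are illustrative choices, not taken from the paper.

```python
import numpy as np


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a diagonal-covariance UBM to one speaker.

    frames:    (T, D) feature vectors of one speaker (e.g. MFCC frames)
    weights:   (K,)   UBM mixture weights
    means:     (K, D) UBM component means
    variances: (K, D) UBM diagonal variances
    relevance: assumed relevance factor controlling how far means may move
    """
    # Log-density of every frame under every component: shape (T, K)
    diff = frames[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )
    # Component posteriors per frame, normalized in a numerically stable way
    log_post = np.log(weights) + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    # Zeroth- and first-order sufficient statistics
    n_k = post.sum(axis=0)                          # (K,)
    first = post.T @ frames                         # (K, D)
    data_means = first / np.maximum(n_k, 1e-10)[:, None]

    # Interpolate between the speaker's data means and the UBM means
    alpha = (n_k / (n_k + relevance))[:, None]
    return alpha * data_means + (1.0 - alpha) * means


def supervector(adapted_means):
    """Concatenate the K component means into one vector of length K * D."""
    return adapted_means.reshape(-1)
```

Under these assumptions, each speaker is represented by one fixed-length vector regardless of utterance length, which is the property that lets a classifier such as an SVM operate on whole speakers rather than on individual frames.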

Cited by 91 publications (60 citation statements)
References 9 publications (13 reference statements)
“…As for the classifiers used to classify these features in AGR task, in literature, logistic regression, linear regression, random forests and support vector machines are employed [6,7,8,9]. In [6], it is indicated that random forest trained on simple F0 and MFCC features performs close to the state-of-the-art system devised for 3-way classification problem (between male, female and child speech), which is a fusion of six subsystems.…”
Section: Introduction
confidence: 99%
“…Building on this point, typically in the literature [3,4,5,6,7,8], two broad classes of features are used for this task: fundamental frequency (F0) and short term features like mel frequency cepstrum coefficients (MFCCs). There are also works that have investigated high level representations like Gaussian mixture model supervector [9,8] and i-vectors [10].…”
Section: Introduction
confidence: 99%
“…In related work on speech quality, we could show that statistical models can be used to describe and estimate inherent properties of speech such as age and gender [1] and intelligibility [2]. Based on these findings, we build a model by extracting features from the speech signal and computing a probability of being "proper" speech, i.e., that the selected inversion frequency was indeed (close to) correct.…”
Section: A Statistical Model
confidence: 99%
“…In general, "a rule of thumb is 60:1 'grunt time to clear speech time'." 1 Unfortunately, voice scrambling is not only used by authorized personnel but also by villains taking part in organized crime such as drug dealing and man hunt, making it hard for authorities to succeed in surveillance and raids.…”
Section: Introduction
confidence: 99%
“…This work focuses on this task. In [3] it has been shown, that age recognition with SVM and 7 gender dependent classes outperforms different other classification ideas. The classification results of the SVM idea were in the same range as humans, and the precision even better.…”
Section: Introduction
confidence: 99%