Interspeech 2018
DOI: 10.21437/interspeech.2018-2015

End-to-end Deep Neural Network Age Estimation

Abstract: In this paper, we apply the recently proposed x-vector neural network architecture for the task of age estimation. This architecture maps a variable length utterance into a fixed dimensional embedding which retains the relevant sequence level information. This is achieved by a temporal pooling layer. From the embedding, a series of layers is applied to make predictions. The full network is trained end-to-end in a discriminative fashion. This kind of network is starting to outperform the state-of-the-art i-vecto…
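
The abstract's key mechanism, a temporal pooling layer that turns a variable length utterance into a fixed dimensional embedding, can be illustrated with a minimal sketch. The abstract does not say which statistics are pooled; the version below assumes mean and standard deviation pooling over time, which is a common choice in x-vector systems. The function name `stats_pooling` and the toy frame values are hypothetical.

```python
import math
from typing import List, Sequence


def stats_pooling(frames: Sequence[Sequence[float]]) -> List[float]:
    """Pool T frame-level vectors of size D into one vector of size 2 * D.

    Assumption: the pooled statistics are the per-dimension mean and
    (population) standard deviation over time.
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / num_frames)
        for d in range(dim)
    ]
    # Concatenate means and standard deviations into one fixed-size vector.
    return means + stds


# Two toy "utterances" with different numbers of frames (2 and 4).
short_utt = [[1.0, 2.0], [3.0, 4.0]]
long_utt = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0], [6.0, 1.0]]

short_emb = stats_pooling(short_utt)
long_emb = stats_pooling(long_utt)
print(len(short_emb), len(long_emb))  # both embeddings have the same size
```

Whatever the utterance length, the output size depends only on the frame dimension, which is what lets the later layers operate on a fixed-size input.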

Cited by 30 publications (27 citation statements)
References 19 publications
“…The target number of age bins was 101, ranging from 0 to 100, for both facial and speech age estimation tasks. No label aggregation of age bins was done, unlike a previous work [13]. We adopted the Adam algorithm for optimization with an initial learning rate of 0.001.…”
Section: Age Estimation Experiments Using Face and Speech Information (mentioning)
confidence: 99%
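
The citing statement above treats age estimation as classification over 101 bins (ages 0 to 100). A minimal sketch of how a predicted age could be read off such an output follows. The decoding rules shown (most likely bin and posterior-weighted expected age) are assumptions for illustration, not something the citing paper states, and the logits are made-up values.

```python
import math
from typing import List

NUM_AGE_BINS = 101  # one bin per integer age, 0 through 100


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over the age bins."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def most_likely_age(posterior: List[float]) -> int:
    """Age bin with the highest posterior (first one wins on ties)."""
    return max(range(len(posterior)), key=posterior.__getitem__)


def expected_age(posterior: List[float]) -> float:
    """Posterior-weighted mean age (an assumed decoding rule)."""
    return sum(age * p for age, p in enumerate(posterior))


# Hypothetical network output: mass concentrated on ages 30 and 31.
logits = [0.0] * NUM_AGE_BINS
logits[30] = 5.0
logits[31] = 5.0

posterior = softmax(logits)
print(most_likely_age(posterior), round(expected_age(posterior), 2))
```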
“…In contrast, relatively small corpora have been used for speech age estimation tasks compared with facial age corpora. For example, NIST Speaker recognition evaluations (SRE) 2008 and 2010, which are standard corpora used for speech age estimation task [11,12,13,14], contain only 1688 speakers. Unfortunately, the recording condition in these corpora is limited to telephone speech.…”
Section: Introduction (mentioning)
confidence: 99%
“…During puberty, vocal cords are thickened and elongated, the larynx descends, and the vocal tract is lengthened [15]. In adults, age-related physiological changes continue to systematically transform speech parameters, such as pitch, formant frequencies, speech rate, and sound pressure [28,84].…”
Section: Inference Of Age and Gender (mentioning)
confidence: 99%
“…Automated approaches have been proposed to predict a target's age range (e.g., child, adolescent, adult, senior) or actual year of birth based on such measures [28,85]. In [85], researchers were able to estimate the age of male and female speakers with a mean absolute error of 4.7 years.…”
Section: Inference Of Age and Gender (mentioning)
confidence: 99%
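
The statement above reports performance as a mean absolute error (MAE) in years. As a reminder of what that metric measures, here is a minimal sketch; the ages and predictions are invented for illustration and are not data from any of the cited papers.

```python
from typing import Sequence


def mean_absolute_error(true_ages: Sequence[float],
                        predicted_ages: Sequence[float]) -> float:
    """Average absolute difference between true and predicted ages, in years."""
    if not true_ages or len(true_ages) != len(predicted_ages):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


# Hypothetical speakers: true ages and model predictions.
true_ages = [25, 40, 63, 31]
predicted_ages = [29.0, 36.5, 70.0, 30.0]

mae = mean_absolute_error(true_ages, predicted_ages)
print(mae)  # absolute errors are 4, 3.5, 7 and 1, so the mean is 3.875
```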
“…• Optimal way of training the x-vectors for the age estimation task is proposed. The training on the NIST SRE08 dataset is performed and testing is done against SRE10 (Ghahremani, et al 2018). The implementation is performed based on Series of Time Delay Layers, a part of the DNN followed by temporal pooling layer that summarized the feature sequence into a single fixed dimension embedding was further fed into the feed-forward layers.…”
Section: Literature Review (mentioning)
confidence: 99%