ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
DOI: 10.1109/icassp.2019.8683397

A Deep Neural Network Based End to End Model for Joint Height and Age Estimation from Short Duration Speech

citations
Cited by 22 publications
(19 citation statements)
references
References 14 publications
“…The performance on the gender classification task on the Common Voice dataset with a baseline x-vector embedder is presented in Table 8. The age estimation RMSE of presented approach is 8.44 and 7.96 years for female and male speakers is comparable to the results reported by the authors at [22], with 8.63 and 7.60 years female/male. However, the network does not react well to the attempts of using transfer learning—a system pre-trained on Common Voice actually offers worse results then the system with no pre-training.…”
Section: Results
Citation type: supporting
confidence: 87%
“…The work in [22] describes a DNN implementation for a joint height and age estimation system. Their results for age estimation are 0.6 years in terms of root mean square error (RMSE), 7.60 and 8.63 years for male and female using the TIMIT dataset [23].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…The second issue is the design of a proper classification model [6, 50]. Recently, deep learning models have been applied for age and gender recognition [7]; however, the aforementioned issues remain unresolved.…”
Section: Discussion and Comparative Analysis
Citation type: mentioning
confidence: 99%
“…The constructed x-vector was then used for age estimation based on the speaker speech signal. A unified DNN architecture to recognize both the height and age of a speaker from short durations of speech was also proposed [7], which improved age estimation by 0.6 years in terms of the root mean square error (RMSE) over the classical SVR. The authors of [8] proposed a novel age estimation system based on Long short-term memory (LSTM) recurrent neural networks (RNN) that can deal with short utterances using acoustic features.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…More recently, the Deep Learning (DL) paradigm has been applied to age estimation. For example, Deep Neural Networks (DNN) have been applied to predict both height and age of a speaker from short utterances [17]. In the case of age estimation, the Root Mean Squared Errors (RMSE) are 7.60 and 8.63 years for male and female respectively, when the mean duration of speech segments is around 2.5s.…”
Section: Introduction
Citation type: mentioning
confidence: 99%