End-to-end Deep Neural Network Age Estimation

Ghahremani, Pegah; Nidadavolu, Phani Sankar; Chen, Nanxin; Villalba, Jesús; Povey, Daniel; Khudanpur, Sanjeev; Dehak, Najim

doi:10.21437/interspeech.2018-2015

Cited by 30 publications

(27 citation statements)

References 19 publications

“…The target number of age bins was 101, ranging from 0 to 100, for both facial and speech age estimation tasks. No label aggregation of age bins was done, unlike a previous work [13]. We adopted the Adam algorithm for optimization with an initial learning rate of 0.001.…”

Section: Age Estimation Experiments Using Face and Speech Information (mentioning)

confidence: 99%
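
The 101-bin target described in the statement above can be illustrated with a minimal sketch. Only the bin count (101, covering ages 0 to 100) comes from the quoted text; the helper names `age_to_bin` and `expected_age`, the rounding and clipping rule, and the expected-value decoding are assumptions made for illustration, not the citing authors' method.

```python
import math

NUM_BINS = 101  # one class per integer age from 0 to 100, as in the quoted statement


def age_to_bin(age_years):
    """Map an age in years to a class index in [0, 100].

    Rounding to the nearest integer and clipping out-of-range ages
    are assumptions of this sketch.
    """
    return min(max(int(round(age_years)), 0), NUM_BINS - 1)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_age(logits):
    """Decode an age as the probability-weighted mean of the bin indices.

    This decoding rule is hypothetical; taking the arg-max bin is another option.
    """
    if len(logits) != NUM_BINS:
        raise ValueError("expected one logit per age bin")
    probs = softmax(logits)
    return sum(index * p for index, p in enumerate(probs))
```

With uniform logits every bin is equally likely, so the decoded age sits at the midpoint of the range.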

“…In contrast, relatively small corpora have been used for speech age estimation tasks compared with facial age corpora. For example, NIST Speaker recognition evaluations (SRE) 2008 and 2010, which are standard corpora used for speech age estimation task [11,12,13,14], contain only 1688 speakers. Unfortunately, the recording condition in these corpora is limited to telephone speech.…”

Section: Introduction (mentioning)

confidence: 99%

Age-VOX-Celeb: Multi-Modal Corpus for Facial and Speech Estimation

Tawara, Ogawa, Kitagishi, et al. 2021

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Estimating a speaker's age from their speech is more challenging than age estimation from their face because of insufficiently available public corpora. To tackle this problem, we construct a new audio-visual age corpus named AgeVoxCeleb by annotating age labels to VoxCeleb2 videos. AgeVoxCeleb is the first large-scale, balanced, and multi-modal age corpus that contains both video and speech of the same speakers from a wide age range. Using AgeVoxCeleb, our paper makes the following contributions: (i) A facial age estimation model can outperform a speech age estimation model by comparing the state-of-the-art models in each task. (ii) Facial age estimation is more robust against the difference between training and test sets. (iii) We developed cross-modal transfer learning from face to speech age estimation, showing that the estimated age with a facial age estimation model can be used to train a speech age estimation model. Proposed AgeVoxCeleb will be published in https://github.com/nttcslab-sp/agevoxceleb.

“…During puberty, vocal cords are thickened and elongated, the larynx descends, and the vocal tract is lengthened [15]. In adults, age-related physiological changes continue to systematically transform speech parameters, such as pitch, formant frequencies, speech rate, and sound pressure [28,84].…”

Section: Inference Of Age and Gender (mentioning)

confidence: 99%

“…Automated approaches have been proposed to predict a target's age range (e.g., child, adolescent, adult, senior) or actual year of birth based on such measures [28,85]. In [85], researchers were able to estimate the age of male and female speakers with a mean absolute error of 4.7 years.…”

Section: Inference Of Age and Gender (mentioning)

confidence: 99%
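
The mean absolute error quoted in the statement above is the average absolute difference between true and predicted ages. A minimal sketch of that metric follows; the function name `mean_absolute_error` and the sample ages are illustrative and are not data from the cited work.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average of |true - predicted| over all speakers, in years."""
    if len(true_ages) != len(predicted_ages) or not true_ages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


# Made-up example: absolute errors of 3, 2 and 5 years average to 10/3 years.
example_mae = mean_absolute_error([30, 40, 50], [33, 38, 55])
```

An error of 4.7 years under this metric means predictions are off by 4.7 years on average, regardless of the direction of the error.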

Privacy Implications of Voice and Speech Analysis – Information Disclosure by Inference

Kröger, Lutz, Raschke, 2020

Privacy and Identity Management. Data for Better Living: AI and Privacy

Internet-connected devices, such as smartphones, smartwatches, and laptops, have become ubiquitous in modern life, reaching ever deeper into our private spheres. Among the sensors most commonly found in such devices are microphones. While various privacy concerns related to microphone-equipped devices have been raised and thoroughly discussed, the threat of unexpected inferences from audio data remains largely overlooked. Drawing from literature of diverse disciplines, this paper presents an overview of sensitive pieces of information that can, with the help of advanced data analysis methods, be derived from human speech and other acoustic elements in recorded audio. In addition to the linguistic content of speech, a speaker's voice characteristics and manner of expression may implicitly contain a rich array of personal information, including cues to a speaker's biometric identity, personality, physical traits, geographical origin, emotions, level of intoxication and sleepiness, age, gender, and health condition. Even a person's socioeconomic status can be reflected in certain speech patterns. The findings compiled in this paper demonstrate that recent advances in voice and speech processing induce a new generation of privacy threats.

“…• Optimal way of training the x-vectors for the age estimation task is proposed. The training on the NIST SRE08 dataset is performed and testing is done against SRE10 (Ghahremani, et al 2018). The implementation is performed based on Series of Time Delay Layers, a part of the DNN followed by temporal pooling layer that summarized the feature sequence into a single fixed dimension embedding was further fed into the feed-forward layers.…”

Section: Literature Review (mentioning)

confidence: 99%
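
The temporal pooling step mentioned in the statement above, which turns a variable-length sequence of frame-level features into one fixed-dimension vector, can be sketched as follows. Concatenating the per-dimension mean and standard deviation is the statistics pooling commonly used in x-vector systems; whether the cited paper uses exactly this variant is an assumption here, and the name `stats_pooling` is hypothetical.

```python
import math


def stats_pooling(frames):
    """Pool T frame vectors of dimension D into one vector of length 2 * D.

    The output is the per-dimension mean followed by the per-dimension
    (population) standard deviation, so its size does not depend on T.
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_frames = len(frames)
    dim = len(frames[0])
    means = [sum(frame[d] for frame in frames) / num_frames for d in range(dim)]
    stds = [
        math.sqrt(sum((frame[d] - means[d]) ** 2 for frame in frames) / num_frames)
        for d in range(dim)
    ]
    return means + stds
```

Because the output length depends only on the feature dimension, utterances of any duration can be passed to the same feed-forward layers.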

Speaker Recognition System based on Age-related Features using Convolutional and Deep Neural Networks

Kuppusamy, Chandra, 2020

Preprint

With the advent of conversational voice recognition systems growing such as Alexa, SIRI, OK Google, etc., natural language conversational systems including Chatbot and voice recognition systems are in new high and determining the age of a speaker is critical for setting the pertinent context. Age can be inferred from the speech signal by inferring various factors such as physical attributes of voice, linguistic attributes, frequency, speech rate, etc. The proposed research article discusses about extracting the spectral features of speech such as Cepstral Coefficients, Spectral Decrease, Centroid, Flatness, Spectral Entropy, F0DIFF, Jitter and Shimmer as inputs. This would help in classifying speaker age through deep learning techniques. A novel approach is addressed along with the model for implementation using Deep Neural Network and Convolutional Neural Network for classifying the features using three different classifiers which are Gaussian Mixture Model (GMM), Support Vector Machine (SVM) and GMM-SVM. The results obtained from the proposed system would outline the performance in speaker age recognition.
