Speaker diarization (answering 'who spoke when') is a widely researched subject within speech technology. Numerous experiments have been run on datasets built from broadcast news, meeting data, and call centers; on such data, the task sometimes appears close to being solved. Far less work has tackled the hardest diarization task of all: spontaneous conversations in real-world settings. Such diarization would be particularly useful for studies of language acquisition, where researchers investigate the speech children produce and hear in their daily lives. In this paper, we study audio gathered with a recorder worn by young children as they went about their normal days. As a result, each child was exposed to different acoustic environments with a multitude of background noises and a varying number of adults and peers. The inconsistency of speech and noise within and across samples poses a challenging task for speaker diarization systems, which we tackled via retraining and data augmentation techniques. We further studied sources of structured variation across raw audio files, including the impact of speaker type distribution, proportion of speech from children, and child age on diarization performance. We discuss the extent to which these findings may generalize to other samples of speech in the wild.
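The abstract does not specify the data augmentation techniques used; a common approach for noisy in-the-wild audio is to mix background noise into clean training speech at controlled signal-to-noise ratios. A minimal sketch of that idea, assuming NumPy arrays of raw samples (the function name and parameters are illustrative, not the authors' implementation):

```python
import numpy as np

def augment_with_noise(speech, noise, snr_db):
    """Mix a noise clip into a speech clip at a target SNR (in dB).

    Illustrative only: real augmentation pipelines also vary reverberation,
    gain, and speed, and draw noise clips from large background corpora.
    """
    # Tile or truncate the noise so it covers the whole speech clip.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    # Scale the noise so the speech-to-noise power ratio equals snr_db.
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

Retraining on such mixtures at a range of SNRs exposes the diarization models to the kind of variable background conditions found in child-worn recordings.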
The automatic analysis of conversational audio remains difficult, in part, due to the presence of multiple talkers speaking in turns, often with significant intonation variation and overlapping speech. The majority of prior work on psychoacoustic speech analysis and system design has focused on single-talker speech or on multi-talker speech with overlapping talkers (for example, the cocktail party effect). There has been much less focus on how listeners detect a change in talker, or on which acoustic features are significant in characterizing a talker's voice in conversational speech. This study examines human talker change detection (TCD) in multi-party speech utterances using a behavioral paradigm in which listeners indicate the moment of perceived talker change. Human reaction times in this task are well estimated by a model of the acoustic feature distance between speech segments before and after a change in talker, with estimation improving for models that incorporate longer durations of speech prior to the talker change. Further, human performance is superior to that of several online and offline state-of-the-art machine TCD systems.
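The abstract describes the reaction-time model only at a high level. One simple way to realize an "acoustic feature distance between speech segments before and after a change" is to compare mean feature vectors across a candidate boundary. A minimal sketch under that assumption, with crude log band energies standing in for whatever acoustic features the study actually used (all names and parameters here are illustrative):

```python
import numpy as np

def frame_features(signal, frame_len=400, hop=160, n_bands=20):
    """Crude per-frame spectral features: log band energies.

    A stand-in for proper acoustic features (e.g. MFCCs); assumes a
    1-D NumPy array of samples.
    """
    window = np.hanning(frame_len)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame)) ** 2
        bands = np.array_split(spectrum, n_bands)
        feats.append(np.log([b.sum() + 1e-10 for b in bands]))
    return np.array(feats)

def change_score(features, t, context):
    """Distance between mean feature vectors before and after frame t.

    A larger `context` uses a longer stretch of speech before the
    candidate change point.
    """
    before = features[max(0, t - context):t].mean(axis=0)
    after = features[t:t + context].mean(axis=0)
    return np.linalg.norm(before - after)
```

Scanning `change_score` over all frames and taking the maximum gives a toy offline change detector; widening `context` mirrors the reported finding that models incorporating longer durations of pre-change speech estimate human reaction times better.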