A Free Synthetic Corpus for Speaker Diarization Research

Edwards, Erik; Brenndoerfer, Michael; Robinson, Amanda; Sadoughi, Najmeh; Finley, Greg; Korenevsky, Maxim; Axtmann, Nico; Miller, Mark A.; Suendermann-Oeft, David

doi:10.1007/978-3-319-99579-3_13

Cited by 5 publications

(4 citation statements)

References 40 publications

“…Samples of the classification were verified by listening to the recording with the patient’s voice panned to the left and the therapist’s voice panned to the right headphone loudspeaker. To assess the quality of the performed silence detection and diarization, we performed a test on a free synthetic speech corpus (Edwards et al, 2018), which showed error rates for our method around 5% on average. Comparison with a manually coded 30-min extract from one of our therapies revealed an interrater reliability (Cohen’s κ) of .80 for identification of silence and correct diarization.…”

Section: Methods (mentioning)

confidence: 99%

Silence in the psychotherapy of adolescents with borderline personality pathology.

Zimmermann, Fürer, Schenk, et al. 2021

Personality Disorders: Theory, Research, and Treatment

Silence in psychotherapy has been associated with different, sometimes opposing meanings. This study investigated silence during adolescent identity treatment in adolescent patients with borderline personality pathology. A more active therapeutic approach with less silence is advised in adolescent identity treatment. It was hypothesized that a session with more silence might be negatively perceived by adolescent patients. A total of 382 sessions that involved 21 female patients were analyzed. Silence was automatically detected from audio recordings. Diarization (segmenting an audio recording according to speaker identity) was performed. The patient's perception of the sessions was measured with the Session Evaluation Questionnaire. The amounts of silence in the different speaker-switching patterns were not independent of one another. This finding supports the hypothesis of mutual attunement of patient and therapist concerning the amount of silence in a given session. Sessions with less silence were rated as being both smoother and better. The potential implications for clinical practice are discussed. The investigation of turn-taking and interpersonal temporal dynamics is relevant for psychotherapy research. The topic can be addressed efficiently using automated procedures.

“…Because the method requires only a specially-constructed dataset, it can be used equally to evaluate other diarization components [29] or end-to-end systems that are otherwise resistant to introspection. Future work might also further explore the relationship of conversation characteristics with accuracy by manipulating specific conversation characteristics in synthetic structures, such as the rate of turn changes, amount of speaker overlap, and number of speakers [3,4,7].…”

Section: Discussion (mentioning)

confidence: 99%

“…The two versions will not occur in a natural speech corpus. Instead of using natural conversations, both versions are constructed by splicing source audio from the two speakers according to the desired structure [7]. If there are more than two speakers and roles, then all factorial combinations of speakers and roles can be generated.…”

Section: Version 1: a A A B B A A Version 2: A A A B B A A (mentioning)

confidence: 99%

Speaker-conversation factorial designs for diarization error analysis

Seyfarth, Srinivasan, Kirchhoff 2021

Preprint

Speaker diarization accuracy can be affected by both acoustics and conversation characteristics. Determining the cause of diarization errors is difficult because speaker voice acoustics and conversation structure co-vary, and the interactions between acoustics, conversational structure, and diarization accuracy are complex. This paper proposes a methodology that can distinguish independent marginal effects of acoustic and conversation characteristics on diarization accuracy by remixing conversations in a factorial design. As an illustration, this approach is used to investigate gender-related and language-related accuracy differences with three diarization systems: a baseline system using subsegment x-vector clustering, a variant of it with shorter subsegments, and a third system based on a Bayesian hidden Markov model. Our analysis shows large accuracy disparities for the baseline system primarily due to conversational structure, which are partially mitigated in the other two systems. The illustration thus demonstrates how the methodology can be used to identify and guide diarization model improvements.
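
The factorial remixing described in the abstract and in the quoted statements above can be illustrated with a small sketch. It is not the authors' code: the speaker names, the role labels, the turn order, and the helpers `assign_roles` and `render` are all hypothetical, and an actual remix would splice audio segments rather than strings.

```python
from itertools import permutations


def assign_roles(speakers, roles):
    """Return every assignment of distinct speakers to roles (hypothetical helper)."""
    return [dict(zip(roles, combo)) for combo in permutations(speakers, len(roles))]


def render(structure, assignment):
    """Replace each role label in a turn structure with its assigned speaker."""
    return [assignment[role] for role in structure]


# Illustrative turn order only; not taken from any corpus.
structure = ["A", "A", "A", "B", "B", "A", "A"]
speakers = ["spk1", "spk2"]  # placeholder speaker names

# One version of the conversation per speaker-to-role assignment.
versions = [render(structure, a) for a in assign_roles(speakers, ["A", "B"])]

for version in versions:
    print(" ".join(version))
```

With two speakers and two roles this yields two versions that share one conversation structure but swap which voice fills each role, so differences in accuracy between them would point to acoustics rather than structure.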

“…The supervised diarization method is tested on the EMRAI Synthetic Diarization Corpus (Edwards et al, 2018). This corpus is based on the LibriSpeech Corpus (Panayotov et al, 2015), namely recordings of English audiobooks.…”

Section: EMRAI Synthetic Diarization Corpus (mentioning)

confidence: 99%

Supervised Speaker Diarization Using Random Forests: A Tool for Psychotherapy Process Research

Fürer, Schenk, Röth, et al. 2020

Front. Psychol.

Speaker diarization is the practice of determining who speaks when in audio recordings. Psychotherapy research often relies on labor-intensive manual diarization. Unsupervised methods are available but yield higher error rates. We present a method for supervised speaker diarization based on random forests. It can be considered a compromise between commonly used labor-intensive manual coding and fully automated procedures. The method is validated using the EMRAI synthetic speech corpus and is made publicly available. It yields low diarization error rates (M: 5.61%, STD: 2.19). Supervised speaker diarization is a promising method for psychotherapy research and similar fields.
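
For readers unfamiliar with the metric, a diarization error rate of the kind reported above is conventionally the sum of missed speech, false-alarm speech, and speaker confusion, divided by the total reference speech time. The sketch below is a simplified frame-level version and makes several assumptions that the abstract does not confirm: one label per fixed-length frame, `None` for silence, no overlapping speech, and hypothesis labels already mapped to the reference speaker names. The function name `frame_der` and the example labels are hypothetical; standard scoring tools also search for the best speaker mapping and may apply a collar around boundaries.

```python
def frame_der(reference, hypothesis):
    """Frame-level diarization error rate under the simplifying assumptions above."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")

    missed = false_alarm = confusion = speech = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            speech += 1
            if hyp is None:
                missed += 1        # speech labeled as silence
            elif hyp != ref:
                confusion += 1     # speech attributed to the wrong speaker
        elif hyp is not None:
            false_alarm += 1       # silence labeled as speech

    if speech == 0:
        raise ValueError("reference contains no speech frames")
    return (missed + false_alarm + confusion) / speech


# Made-up labels for illustration; not data from any cited study.
ref = ["A", "A", None, "B", "B", "B", None, "A"]
hyp = ["A", "A", "A", "B", "A", "B", None, None]
print(frame_der(ref, hyp))
```

Because the denominator counts only reference speech frames, false alarms during silence can push the rate above 100% in degenerate cases.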
