The Speaker and Language Recognition Workshop (Odyssey 2020) 2020
DOI: 10.21437/odyssey.2020-17

Linguistically Aided Speaker Diarization Using Speaker Role Information

Cited by 15 publications (16 citation statements)
References 42 publications
“…The more recent speaker diarization systems that take advantage of the ASR transcript have employed a DNN model to capture the linguistic pattern in the given ASR output to enhance the speaker diarization result. The authors in [177] proposed a way of using the linguistic information for the speaker diarization task where participants have distinct roles that are known to the speaker diarization system. Fig.…”
Section: Using Lexical Information From ASR
Citation type: mentioning
confidence: 99%
“…Moreover, the architecture design followed, where the various modules are trained independently and are then connected to form a pipeline, inevitably leads to error propagation. There are indications that alternative frameworks could reduce errors in specific cases, if for example diarization is aware of the different speaker roles (Flemotomos, Georgiou, & Narayanan, 2020) or if the two tasks of diarization and role recognition are performed simultaneously (Flemotomos, Papadopoulos, et al, 2018).…”
Section: Limitations and Conclusion
Citation type: mentioning
confidence: 99%
“…Although speaker diarization is conventionally an audio-only task, the linguistic content carried by speech signals [219] and the speaker behaviors, e.g., the movement of lips, recorded by videos [220] offer valuable supplementary cues to the detection of active speakers. To incorporate the aforementioned knowledge, multimodal speaker diarization is emerging.…”
Section: Multimodal Speaker Diarization
Citation type: mentioning
confidence: 99%
“…The first class is audio-linguistic speaker diarization, Park et al [221] integrated lexical cues and acoustic cues together by a gated recurrent unit-based sequence-to-sequence model, which improves the diarization performance by exploring linguistic variability deeply. The effectiveness of using both the linguistic and acoustic cues for diarization has been manifested further in structured scenarios [222,219] where the speakers are assumed to produce distinguishable linguistic patterns. For instance, a teacher is likely to speak in a more didactic style while a student tends to be more inquisitive; a doctor is likely to inquire on symptoms and prescribe while a patient describe symptoms, etc.…”
Section: Multimodal Speaker Diarization
Citation type: mentioning
confidence: 99%