ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
DOI: 10.1109/icassp40776.2020.9053096

Overlap-Aware Diarization: Resegmentation Using Neural End-to-End Overlapped Speech Detection

Abstract: We address the problem of effectively handling overlapping speech in a diarization system. First, we detail a neural Long Short-Term Memory-based architecture for overlap detection. Secondly, detected overlap regions are exploited in conjunction with a frame-level speaker posterior matrix to make two-speaker assignments for overlapped frames in the resegmentation step. The overlap detection module achieves state-of-the-art performance on the AMI, DIHARD, and ETAPE corpora. We apply overlap-aware resegmentation…
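
The two-speaker assignment mentioned in the abstract can be illustrated with a minimal sketch. It assumes a frame-level posterior matrix of shape (frames, speakers) and a boolean overlap mask are already available, and that every frame is speech. The function name `assign_speakers` and the toy inputs are hypothetical and are not taken from the paper; the paper's actual resegmentation step may differ.

```python
import numpy as np

def assign_speakers(posteriors, overlap_mask):
    """Pick one speaker per frame, or two for frames flagged as overlap.

    posteriors:   (num_frames, num_speakers) array of speaker posteriors.
    overlap_mask: (num_frames,) boolean array, True where overlap was detected.
    Returns a list with one tuple of speaker indices per frame.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    overlap_mask = np.asarray(overlap_mask, dtype=bool)
    # Rank speakers by descending posterior within each frame.
    ranked = np.argsort(-posteriors, axis=1, kind="stable")
    labels = []
    for t, is_overlap in enumerate(overlap_mask):
        k = 2 if is_overlap else 1  # two most likely speakers on overlapped frames
        labels.append(tuple(int(s) for s in ranked[t, :k]))
    return labels

# Toy example (hypothetical values): three frames, three speakers.
posteriors = [[0.70, 0.20, 0.10],
              [0.45, 0.50, 0.05],
              [0.10, 0.30, 0.60]]
overlap = [False, True, False]
print(assign_speakers(posteriors, overlap))  # [(0,), (1, 0), (2,)]
```

Only the middle frame is flagged as overlapped, so it alone receives a second label, namely the speaker with the second-highest posterior.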

Cited by 65 publications (73 citation statements)
References 18 publications
“…We evaluated two approaches for assigning a second speaker. An heuristic that considers the two closest speakers in time [25] and, based on [26], an approach where the second most-likely speaker of the output of VB-HMM diarization is used to provide the second label, but applied using x-vectors as input frames instead of mel-frequency cepstral coefficients. Given the current pipeline, obtaining the second label is quite straightforward as we simply need to output the two most likely speakers for each frame.…”
Section: Overlapped Speech Handling (mentioning)
confidence: 99%
“…(3) We show that the proposed diarization method achieves state-of-the-art accuracy, slightly outperforming the previous best result [3] on the AMI Headset Mix corpus.…”
Section: Introduction (mentioning)
confidence: 91%
“…Multi-person speaker diarization: The most common approach to speaker diarization with simultaneous speech is to use an overlapping speech detector; for those segments that contain overlap, the set of speakers can be estimated [3,9,10,11,12]. For the latter step, one approach is to select the top k closest speakers in the embedding space.…”
Section: Related Work (mentioning)
confidence: 99%