2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2019
DOI: 10.1109/waspaa.2019.8937185

Multiple Hypothesis Tracking for Overlapping Speaker Segmentation

Abstract: Speaker segmentation is an essential part of any diarization system. Applications of diarization include tasks such as speaker indexing, improving automatic speech recognition (ASR) performance and making single speaker-based algorithms available for use in multi-speaker environments. This paper proposes a multiple hypothesis tracking (MHT) method that exploits the harmonic structure associated with the pitch in voiced speech in order to segment the onsets and end-points of speech from multiple, overlapping sp…
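The abstract names multiple hypothesis tracking without giving its details. As a rough illustration only, assume each frame supplies a short list of F0 estimates (Hz) from voiced speech. A hypothesis-oriented MHT can then branch on every way of explaining those estimates (continue an active track, start a new track, or discard the estimate as a false alarm), score each branch, and keep only the N best global hypotheses. In this sketch a track start marks a speech onset and a track ending marks an end-point. The function names `expand` and `mht`, the gate, the Gaussian random-walk scoring and all cost constants are hypothetical choices, not the authors' implementation.

```python
import math

# Hypothetical tuning constants (assumptions, not values from the paper).
GATE_HZ = 15.0               # largest F0 jump (Hz) one track may make between frames
SIGMA_HZ = 5.0               # std-dev of the assumed random-walk F0 motion model
LOG_P_NEW = math.log(0.05)   # log-cost of starting a new track (a speech onset)
LOG_P_END = math.log(0.2)    # log-cost of an active track ending (a speech end-point)
LOG_P_FA = math.log(0.01)    # log-cost of labelling an observation a false alarm


def expand(tracks, obs, frame):
    """Enumerate every way to explain this frame's F0 observations.

    A track is a list of (frame, f0) points; it is "active" if its last
    point is in the previous frame. Returns (new_tracks, delta_log_score) pairs.
    """
    active = [i for i, tr in enumerate(tracks) if tr[-1][0] == frame - 1]
    children = []

    def rec(k, cur, used, score):
        if k == len(obs):
            # Active tracks that received no observation end in this frame.
            ended = sum(1 for i in active if i not in used)
            children.append((cur, score + ended * LOG_P_END))
            return
        f = obs[k]
        # Option 1: continue an active track that is still free and inside the gate.
        for i in active:
            if i in used:
                continue
            last = cur[i][-1][1]
            if abs(f - last) <= GATE_HZ:
                ll = -0.5 * ((f - last) / SIGMA_HZ) ** 2  # unnormalised Gaussian log-lik.
                nxt = cur[:i] + [cur[i] + [(frame, f)]] + cur[i + 1:]
                rec(k + 1, nxt, used | {i}, score + ll)
        # Option 2: start a new track (onset). New tracks are appended at the end,
        # so the indices in `active` stay valid.
        rec(k + 1, cur + [[(frame, f)]], used, score + LOG_P_NEW)
        # Option 3: treat the observation as clutter.
        rec(k + 1, cur, used, score + LOG_P_FA)

    rec(0, tracks, frozenset(), 0.0)
    return children


def mht(frames, beam=10):
    """Hypothesis-oriented MHT with N-best (beam) pruning.

    frames[t] is the list of F0 observations (Hz) in frame t.
    Returns the highest-scoring (tracks, log_score) global hypothesis.
    """
    hyps = [([], 0.0)]
    for t, obs in enumerate(frames):
        children = [
            (new_tracks, score + delta)
            for tracks, score in hyps
            for new_tracks, delta in expand(tracks, obs, t)
        ]
        children.sort(key=lambda h: h[1], reverse=True)
        hyps = children[:beam]  # keep only the N best global hypotheses
    return hyps[0]


# Toy input: one voice near 120 Hz, a second voice near 210 Hz joining at frame 3.
frames = [[120.0], [121.0], [122.0], [123.0, 210.0], [124.0, 211.0], [125.0, 212.0]]
tracks, score = mht(frames)
print([tr[0][0] for tr in tracks])  # first frame of each track: [0, 3]
```

Deferring the decision (keeping several global hypotheses alive) is what distinguishes MHT from single-hypothesis association: an ambiguous frame can be resolved later by the frames that follow it.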

Cited by 6 publications (6 citation statements)
References: 23 publications
“…The purpose of data association is to distinguish different targets and to solve the problem of overlapping sensor spatial coverage areas. The classic data association algorithms are nearest neighbor method [62], probabilistic data association algorithm (PDA) [63,64], multiple hypothesis method (MHT) [65,66], and probabilistic multiple hypothesis algorithm (PMHT) [67,68].…”
Section: Data Association (mentioning, confidence: 99%)
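Of the classic algorithms listed in the statement above, nearest-neighbour association is the simplest, and it contrasts usefully with MHT: it commits to one assignment per frame instead of keeping alternatives open. The sketch below is the greedy variant (closest gated track/observation pair first), assuming scalar predictions such as F0 values in Hz; the function name and the gate value are hypothetical.

```python
def nearest_neighbor_associate(predictions, observations, gate):
    """Greedy nearest-neighbour data association with a validation gate.

    predictions: predicted value per track (here an F0 in Hz).
    observations: measured values in the current frame.
    Returns ({track_index: obs_index}, [unassigned obs indices]).
    """
    # All gated (distance, track, observation) pairs, closest first.
    pairs = sorted(
        (abs(p - z), ti, zi)
        for ti, p in enumerate(predictions)
        for zi, z in enumerate(observations)
        if abs(p - z) <= gate
    )
    assignment, used_obs = {}, set()
    for _, ti, zi in pairs:
        # Commit irrevocably to the closest pair whose track and observation are free.
        if ti in assignment or zi in used_obs:
            continue
        assignment[ti] = zi
        used_obs.add(zi)
    # Observations left over would start new tracks or be treated as clutter.
    unassigned = [zi for zi in range(len(observations)) if zi not in used_obs]
    return assignment, unassigned


assignment, unassigned = nearest_neighbor_associate(
    [120.0, 210.0], [212.0, 119.0, 300.0], gate=15.0
)
print(assignment, unassigned)  # {0: 1, 1: 0} [2]
```

Because each decision is final, a single wrong assignment in a crossing or overlapping region propagates; PDA softens this by weighting all gated observations, and MHT by deferring the choice.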
“…Exp-1 evaluates the performance of the proposed method as a complete segmentation system. The proposed method is compared against two baselines: baseline-1, previously presented by the authors in [49] and baseline-2, a state-of-the-art deep learning approach presented in [51].…”
Section: A. Exp-1: Full Segmentation Using Proposed System (mentioning, confidence: 99%)
“…6. Baseline-1 system architecture presented in [49] with $s_t$: input signal, $\Phi_t$: peak detections, $\Psi_t$: detection reliabilities, $Z_t$: generated observations, $T^i$: selected track hypotheses, $o_t$: overlapping speech onsets, $B_t$: strongest candidate track and $c_t$: speaker change onsets. This process solves the problem of one speaker generating multiple tracks since all the tracks generated by the same speaker will have the same trajectory and will also be harmonically related.…”
Section: A. Exp-1: Full Segmentation Using Proposed System (mentioning, confidence: 99%)
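The statement above says that tracks produced by the same speaker share a trajectory and are harmonically related, but not how that relation is tested. One plausible check, assumed here, is that on every shared frame the higher track sits close to a fixed integer multiple of the lower one; harmonically related tracks are then collapsed onto the lowest-F0 member. The names `harmonic_ratio` and `merge_harmonic_tracks`, the (frame, f0) track format and the 3 % tolerance are all hypothetical.

```python
def harmonic_ratio(track_a, track_b, tol=0.03):
    """Return integer n if the higher track ~= n * the lower one on all shared frames.

    Tracks are lists of (frame, f0_hz). Returns None if the tracks share no
    frames, if any frame violates the tolerance, or if n is not constant.
    """
    a, b = dict(track_a), dict(track_b)
    shared = sorted(set(a) & set(b))
    if not shared:
        return None
    ratios = set()
    for t in shared:
        lo, hi = sorted((a[t], b[t]))
        n = round(hi / lo)
        if n < 1 or abs(hi / (n * lo) - 1.0) > tol:
            return None
        ratios.add(n)
    return ratios.pop() if len(ratios) == 1 else None


def merge_harmonic_tracks(tracks, tol=0.03):
    """Keep one track per harmonically related group (the lowest mean F0)."""
    kept = []
    for tr in sorted(tracks, key=lambda tr: sum(f for _, f in tr) / len(tr)):
        # A track that is an integer multiple of an already kept track is
        # assumed to come from the same speaker and is dropped.
        if not any(harmonic_ratio(k, tr, tol) is not None for k in kept):
            kept.append(tr)
    return kept


A = [(0, 120.0), (1, 121.0), (2, 122.0)]   # speaker 1, fundamental
B = [(0, 240.0), (1, 242.0), (2, 244.0)]   # speaker 1, tracked at 2 x F0
C = [(0, 210.0), (1, 211.0), (2, 212.0)]   # speaker 2
print(merge_harmonic_tracks([B, C, A]) == [A, C])  # True
```

A fixed integer ratio over all shared frames also enforces the "same trajectory" condition, since the two tracks must rise and fall proportionally.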
“…The method in [16,25] is used to create subsets, $f_{t,m}$, which are vectors containing reliable peak detections, $\phi_t$, that are harmonically related. Multiple subsets are considered to allow tracking of more than one F0 track in the presence of overlapping speech.…”
Section: Harmonic Subset Generation (mentioning, confidence: 99%)
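The subset-generation method of [16,25] is not described in the statement above. A generic harmonic-sieve sketch conveys the idea under stated assumptions: keep peaks whose reliability exceeds a threshold, try each reliable peak as a candidate F0, and gather the peaks lying within a relative tolerance of its integer multiples. Several subsets can survive, one per overlapping voice. The function name, the reliability threshold, the tolerance and the minimum subset size are hypothetical.

```python
def harmonic_subsets(peaks_hz, reliabilities, min_rel=0.5, tol=0.03, min_size=3):
    """Group reliable spectral peaks into harmonically related subsets.

    Each reliable peak is tried as a candidate F0; a peak p joins the subset
    when |p - n * f0| <= tol * n * f0 for its nearest harmonic number n >= 1.
    """
    reliable = sorted(p for p, r in zip(peaks_hz, reliabilities) if r >= min_rel)
    subsets = []
    for f0 in reliable:  # ascending, so lower fundamentals are tried first
        subset = []
        for p in reliable:
            n = round(p / f0)
            if n >= 1 and abs(p - n * f0) <= tol * n * f0:
                subset.append(p)
        # Skip small groups and groups already contained in a lower-F0 subset
        # (e.g. the 2nd, 4th, ... harmonics of an F0 that was already found).
        if len(subset) >= min_size and not any(set(subset) <= set(s) for s in subsets):
            subsets.append(subset)
    return subsets


# Two overlapping voices (120 Hz and 210 Hz) plus one unreliable peak at 300 Hz.
peaks = [120.0, 210.0, 240.0, 300.0, 360.0, 420.0, 480.0, 630.0]
rel = [0.9, 0.8, 0.9, 0.1, 0.8, 0.9, 0.7, 0.6]
print(harmonic_subsets(peaks, rel))
# [[120.0, 240.0, 360.0, 480.0], [210.0, 420.0, 630.0]]
```

Trying only detected peaks as candidates keeps the sketch short but cannot recover a missing fundamental; a fuller version would also test sub-multiples of the detected peaks.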