Multiple Hypothesis Tracking for Overlapping Speaker Segmentation

Hogg, Aidan O. T.; Evers, Christine; Naylor, Patrick A.

doi:10.1109/waspaa.2019.8937185

Cited by 6 publications

(6 citation statements)

References 23 publications

“…The purpose of data association is to distinguish different targets and to solve the problem of overlapping sensor spatial coverage areas. The classic data association algorithms are nearest neighbor method [62], probabilistic data association algorithm (PDA) [63,64], multiple hypothesis method (MHT) [65,66], and probabilistic multiple hypothesis algorithm (PMHT) [67,68].…”

Section: Data Association (mentioning)

confidence: 99%

A New View of Multisensor Data Fusion: Research on Generalized Fusion

Chen, Liu, et al. 2021

Mathematical Problems in Engineering


Multisensor data generalized fusion algorithm is a kind of symbolic computing model with multiple application objects based on sensor generalized integration. It is the theoretical basis of numerical fusion. This paper aims to comprehensively review the generalized fusion algorithms of multisensor data. Firstly, the development and definition of multisensor data fusion are analyzed and the definition of multisensor data generalized fusion is given. Secondly, the classification of multisensor data fusion is discussed, and the generalized integration structure of multisensor and its data acquisition and representation are given, abandoning the research characteristics of object oriented. Then, the principle and architecture of multisensor data fusion are analyzed, and a generalized multisensor data fusion model is presented based on the JDL model. Finally, according to the multisensor data generalized fusion architecture, some related theories and methods are reviewed, and the tensor-based multisensor heterogeneous data generalized fusion algorithm is proposed, and the future work is prospected.
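The nearest neighbor method named in the citation statement above is the simplest of the listed data association algorithms. The following is a minimal sketch of a greedy variant, not taken from either paper: the function name `nearest_neighbor_associate`, the scalar track states, and the `gate` threshold are all assumptions made for illustration. Each track receives the closest unused observation that lies within the gate, or none at all.

```python
from typing import Dict, List, Optional


def nearest_neighbor_associate(
    tracks: List[float],
    observations: List[float],
    gate: float,
) -> Dict[int, Optional[int]]:
    """Greedily pair each track with the closest unused observation.

    Hypothetical helper: tracks and observations are scalar values
    (for example, frequency estimates), and `gate` is the largest
    distance at which a pairing is accepted.
    """
    # Collect every (distance, track index, observation index) pair
    # that falls inside the gate, then visit them closest first.
    pairs = sorted(
        (abs(t - z), i, j)
        for i, t in enumerate(tracks)
        for j, z in enumerate(observations)
        if abs(t - z) <= gate
    )

    assignment: Dict[int, Optional[int]] = {i: None for i in range(len(tracks))}
    used_observations = set()
    for _, i, j in pairs:
        # Accept a pair only if both the track and the observation are free.
        if assignment[i] is None and j not in used_observations:
            assignment[i] = j
            used_observations.add(j)
    return assignment


if __name__ == "__main__":
    print(nearest_neighbor_associate([100.0, 200.0], [205.0, 98.0, 400.0], gate=20.0))
```

Unlike MHT, this sketch commits to a single assignment per frame and never revisits it, which is why it tends to struggle when targets are close together.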

“…Exp-1 evaluates the performance of the proposed method as a complete segmentation system. The proposed method is compared against two baselines: baseline-1, previously presented by the authors in [49] and baseline-2, a state-of-the-art deep learning approach presented in [51].…”

Section: A Exp-1: Full Segmentation Using Proposed System (mentioning)

confidence: 99%

“…6. Baseline-1 system architecture presented in [49] with st: input signal, Φt: peak detections, Ψt: detection reliabilities, Zt: generated observations, Ti: selected track hypotheses, ot: overlapping speech onsets, Bt: strongest candidate track and ct: speaker change onsets. This process solves the problem of one speaker generating multiple tracks since all the tracks generated by the same speaker will have the same trajectory and will also be harmonically related.…”

Section: A Exp-1: Full Segmentation Using Proposed System (mentioning)

confidence: 99%

Overlapping Speaker Segmentation Using Multiple Hypothesis Tracking of Fundamental Frequency

Hogg, Evers, Moore, et al. 2021

IEEE/ACM Trans. Audio Speech Lang. Process.

Self Cite


This paper demonstrates how the harmonic structure of voiced speech can be exploited to segment multiple overlapping speakers in a speaker diarization task. We explore how a change in the speaker can be inferred from a change in pitch. We show that voiced harmonics can be useful in detecting when more than one speaker is talking, such as during overlapping speaker activity. A novel system is proposed to track multiple harmonics simultaneously, allowing for the determination of onsets and end-points of a speaker's utterance in the presence of an additional active speaker. This system is bench-marked against a segmentation system from the literature that employs a bidirectional long short term memory network (BLSTM) approach and requires training. Experimental results highlight that the proposed approach outperforms the BLSTM baseline approach by 12.9% in terms of HIT rate for speaker segmentation. We also show that the estimated pitch tracks of our system can be used as features to the BLSTM to achieve further improvements of 1.21% in terms of coverage and 2.45% in terms of purity.


“…The method in [16,25] is used to create subsets, ft,m, which are vectors containing reliable peak detections, φt, that are harmonically related. Multiple subsets are considered to allow tracking of more than one F0 track in the presence of overlapping speech.…”

Section: Harmonic Subset Generation (mentioning)

confidence: 99%

Multichannel Overlapping Speaker Segmentation Using Multiple Hypothesis Tracking Of Acoustic And Spatial Features

Hogg, Evers, Naylor 2021

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite


An essential part of any diarization system is the task of speaker segmentation which is important for many applications including speaker indexing and automatic speech recognition (ASR) in multi-speaker environments. Segmentation of overlapping speech has recently been a key focus of this work. In this paper we explore the use of a new multimodal approach for overlapping speaker segmentation that tracks both the fundamental frequency (F0) of the speaker and the speaker's direction of arrival (DOA) simultaneously. Our proposed multiple hypothesis tracking system, which simultaneously tracks both features, shows an improvement in segmentation performance when compared to tracking these features separately. An illustrative example of overlapping speech demonstrates the effectiveness of our proposed system. We also undertake a statistical analysis on 12 meetings from the AMI corpus and show an improvement in the HIT rate of 14.1% on average against a commonly used deep learning bidirectional long short term memory network (BLSTM) approach.
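The "Harmonic Subset Generation" citation statement for this paper describes grouping reliable spectral peaks that are harmonically related. The method in [16,25] is not reproduced in the excerpt, so the following is only an illustrative sketch of one way such a grouping could work: the function name `harmonic_subset`, the candidate fundamental `f0_hz`, and the relative tolerance `rel_tol` are assumptions, not the authors' definitions. A peak is kept when it lies close to an integer multiple of the candidate F0.

```python
from typing import List


def harmonic_subset(
    peaks_hz: List[float],
    f0_hz: float,
    rel_tol: float = 0.03,
) -> List[float]:
    """Return the peaks that lie near an integer multiple of f0_hz.

    Hypothetical helper: `rel_tol` is the allowed deviation expressed
    as a fraction of the harmonic frequency being matched.
    """
    subset = []
    for peak in peaks_hz:
        harmonic_number = round(peak / f0_hz)  # nearest integer multiple
        if harmonic_number < 1:
            continue  # peak lies well below the candidate fundamental
        target = harmonic_number * f0_hz
        if abs(peak - target) <= rel_tol * target:
            subset.append(peak)
    return subset


if __name__ == "__main__":
    peaks = [100.0, 150.0, 201.0, 299.0, 450.0]
    # Two candidate fundamentals give two subsets, one per hypothesised speaker.
    print(harmonic_subset(peaks, 100.0))
    print(harmonic_subset(peaks, 150.0))
```

In this toy input the peak at 299 Hz falls in both subsets, which shows why a later tracking stage is needed to resolve ambiguous peaks when two speakers overlap.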

