ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
DOI: 10.1109/icassp.2019.8682924

Speaker Change Detection Using Fundamental Frequency with Application to Multi-talker Segmentation

Abstract: This paper shows that time-varying pitch properties can be used advantageously within the segmentation step of a multi-talker diarization system. First, a study is conducted to verify that changes in pitch are strong indicators of changes in the speaker. It is then highlighted that an individual's pitch varies smoothly and can therefore be predicted by means of a Kalman filter. Subsequently, it is shown that if the pitch is not predictable, this is most likely due to a change in the speaker. Finally, a n…
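
The idea summarised in the abstract (predict the pitch with a Kalman filter and treat a poorly predicted frame as a likely speaker change) can be illustrated with a minimal sketch. This is not the authors' implementation: the scalar random-walk state model, the variances `process_var` and `meas_var`, the threshold `gate`, and the function name `detect_pitch_changes` are all assumptions chosen for illustration.

```python
def detect_pitch_changes(pitch_hz, process_var=4.0, meas_var=25.0, gate=9.0):
    """Return frame indices where the pitch is poorly predicted.

    Illustrative only: a scalar Kalman filter with an assumed random-walk
    model for pitch. All parameter values are hypothetical.
    """
    changes = []
    x = pitch_hz[0]   # current pitch estimate (Hz)
    p = meas_var      # variance of that estimate
    for t in range(1, len(pitch_hz)):
        # Predict: under a random walk the expected pitch is unchanged,
        # but the uncertainty grows by the process variance.
        x_pred = x
        p_pred = p + process_var
        # Innovation (prediction error) and its variance.
        innov = pitch_hz[t] - x_pred
        s = p_pred + meas_var
        if innov * innov / s > gate:
            # Pitch was not predictable: flag a candidate speaker change
            # and restart the track from the new observation.
            changes.append(t)
            x = pitch_hz[t]
            p = meas_var
            continue
        # Update: blend prediction and observation using the Kalman gain.
        k = p_pred / s
        x = x_pred + k * innov
        p = (1.0 - k) * p_pred
    return changes


# Synthetic example: a talker near 120 Hz followed by one near 210 Hz.
demo_track = [120.0 + (i % 3) for i in range(20)] + \
             [210.0 + (i % 3) for i in range(20)]
demo_changes = detect_pitch_changes(demo_track)
```

On this synthetic track the only flagged frame is the one where the pitch jumps between the two talkers. Real pitch estimates contain octave errors and unvoiced gaps, so a practical system would need additional handling that this sketch omits.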

Citations: cited by 12 publications (13 citation statements)
References: 22 publications
“…To explore this question on the AMI corpus [36], pitch estimates were calculated using the method of [21] applied to the IHM mixed-down stream of 16 AMI meetings. A Kalman filter [37] was used to track the pitch of the IHM mixed-down single channel stream as proposed in [38]. The Kalman track relies on the smooth variation of a speaker's pitch due to physiological constraints [39].…”
Section: B Temporal Variations In Fundamental Frequency
Citation type: mentioning
confidence: 99%
“…Speaker change detection is achieved by exploiting the temporal variations in the pitch. To accomplish this the method of [38], that operates using a single track, is utilised here. It was shown in Section III-C that multiple tracks can be generated for the same speaker.…”
Section: A Exp-1: Full Segmentation Using Proposed System
Citation type: mentioning
confidence: 99%
“…To obtain the full segmentation it is also necessary to detect speaker changes, ct, when the onset of a new speaker happens after the end-point of the previous speaker. To accomplish this the authors' previous method, that operates using a single track, is further developed here [21]. Speaker change detection is achieved by exploiting the temporal variations in the pitch.…”
Section: Speaker Change Detection
Citation type: mentioning
confidence: 99%
“…(d) Segments of overlapping speech are identified as the onsets and end-points of multiple, uncorrelated pitch tracks. (e) The complete segmentation is obtained from the union of overlapping speech onsets with the speaker changes detected based on a model of the temporal variation of pitch [21].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…A number of approaches were proposed to solve the problem of speaker segmentation. Most of these methods rely on features that fall into three separate categories: acoustic features [9,10], spatial features [11] and linguistic features [12]. More recently data driven, deep learning approaches have become popular [13][14][15].…”
Section: Introduction
Citation type: mentioning
confidence: 99%