Interspeech 2020
DOI: 10.21437/interspeech.2020-1738

Unsupervised Discovery of Recurring Speech Patterns Using Probabilistic Adaptive Metrics

Abstract: Unsupervised spoken term discovery (UTD) aims at finding recurring segments of speech from a corpus of acoustic speech data. One potential approach to this problem is to use dynamic time warping (DTW) to find well-aligning patterns from the speech data. However, automatic selection of initial candidate segments for the DTW-alignment and detection of "sufficiently good" alignments among those require some type of predefined criteria, often operationalized as threshold parameters for pair-wise distance metrics b…
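The abstract's starting point, DTW alignment judged against a predefined distance threshold, can be illustrated with a minimal sketch. This is not the paper's probabilistic adaptive metric; it is plain DTW over frame-level feature vectors (MFCC-like frames are assumed), using cosine local distance, a path cost normalised by `n + m`, and a hypothetical fixed `threshold`, the kind of hand-set criterion the abstract describes.

```python
import numpy as np


def dtw_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Length-normalised DTW cost between feature sequences x (n, d) and y (m, d)."""
    n, m = len(x), len(y)
    # Frame-wise cosine distance matrix (assumes no all-zero frames).
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    yn = y / np.linalg.norm(y, axis=1, keepdims=True)
    local = 1.0 - xn @ yn.T

    # Accumulated cost with the standard (insertion, deletion, match) steps.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    # Normalise by n + m so segments of different lengths are comparable.
    return float(acc[n, m] / (n + m))


def is_match(x: np.ndarray, y: np.ndarray, threshold: float = 0.3) -> bool:
    """Hypothetical fixed decision rule: accept the pair if the DTW cost is low enough."""
    return dtw_distance(x, y) < threshold


rng = np.random.default_rng(0)
segment = rng.normal(size=(20, 13))          # stand-in for 20 frames of 13-dim features
slower = np.repeat(segment, 2, axis=0)       # same "word" spoken twice as slowly
other = rng.normal(size=(25, 13))            # unrelated segment
print(dtw_distance(segment, slower), dtw_distance(segment, other))
```

The time-stretched copy aligns at near-zero cost, which is what makes DTW attractive for matching repeated words spoken at different rates. A fixed `threshold` value, however, has to be tuned per feature set and corpus, which is the dependency the paper sets out to reduce.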

Cited by 18 publications (12 citation statements)
References 21 publications

“…The qualitative analyses in this paper revealed a bias towards segmenting shorter (often filler) words. One approach would be to incorporate top-down information from a sparse term discovery system tailored towards discovering recurring but longer words [52], [53], [55], which could be used to bootstrap segmentation. Another approach would be to incorporate information from another modality; we know that infants have access to cross-situational cues from different modalities that can aid word learning [56].…”
Section: Conclusion Discussion and Future Work (mentioning)
confidence: 99%
“…There have also been recent proposals for more domain-specific perceptual space learning methods that rely on a noisy top-down signal provided by knowledge of some word-like units (Kamper et al., 2015; Renshaw et al., 2015; Riad et al., 2018; Thiollière et al., 2015). These units can be found by searching for stretches of speech that form similar pairs or clusters, without any knowledge of phones (Jansen & Van Durme, 2011; McInnes & Goldwater, 2011; Park & Glass, 2008; Räsänen & Blandon, 2020). Assuming that the clusters represent different instances of the same word, the learner can then adjust its current representation of the low-level speech features to make these instances even closer together in perceptual space.…”
Section: Computational Approaches To Perceptual Space Learning (mentioning)
confidence: 99%
“…Two teams, indicated in Figure 2 as B [23] (JHU) and R [24] (Tampere), submitted two systems each. The edge of the grey region in Figure 2e shows the empirical tradeoff previously observed between having high quality matching (low NED) and exhaustively analysing the corpus (high coverage).…”
Section: Spoken Term Discovery and Segmentation (mentioning)
confidence: 99%
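The tradeoff between matching quality (NED) and coverage mentioned in the last statement can be made concrete with a rough sketch of the two measures, loosely following the ZeroSpeech spoken term discovery metrics. Assumptions, not taken from the quoted text: each discovered fragment has already been mapped to its phone transcription; NED is averaged over all within-cluster fragment pairs; coverage is approximated as the fraction of corpus duration covered by the union of discovered fragments (the official toolkit counts phone tokens and differs in other details). The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance between two label sequences (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def ned(clusters: List[List[Sequence[str]]]) -> float:
    """Mean normalised edit distance over all fragment pairs within each cluster."""
    total, count = 0.0, 0
    for cluster in clusters:
        for a, b in combinations(cluster, 2):
            total += levenshtein(a, b) / max(len(a), len(b))
            count += 1
    return total / count if count else 0.0


def coverage(fragments: List[Tuple[float, float]], total_duration: float) -> float:
    """Fraction of total_duration covered by the union of (start, end) fragments."""
    covered = 0.0
    cur_start = cur_end = None
    for start, end in sorted(fragments):
        if cur_end is None or start > cur_end:
            # Close the previous merged interval and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping fragment: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / total_duration
```

A system that only reports near-identical pairs drives NED toward 0 but covers little of the corpus; loosening its matching criterion raises coverage at the cost of more mismatched pairs, which is the empirical frontier the quoted figure refers to.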