2018 IEEE Spoken Language Technology Workshop (SLT)
DOI: 10.1109/slt.2018.8639515

A K-Nearest Neighbours Approach To Unsupervised Spoken Term Discovery

Abstract: Unsupervised spoken term discovery is the task of finding recurrent acoustic patterns in speech without any annotations. Current approaches consist of two steps: (1) discovering similar patterns in speech, and (2) partitioning those pairs of acoustic tokens using graph clustering methods. We propose a new approach for the first step. Previous systems used various approximation algorithms to make the search tractable on large amounts of data. Our approach is based on an optimized k-nearest neighbours (KNN) sea…

Cited by 9 publications (12 citation statements)
References 20 publications
“…Finding positive pairs of speech sequences is an area of research called unsupervised term discovery (UTD) [16][17][18][28]. Such UTD systems can be DTW alignment based [16] or involve a k-Nearest-Neighbours search [28]. We opted for the latter, as it is both scalable and among the state-of-the-art methods.…”
Section: Finding and Choosing Pairs of Speech Embeddings
confidence: 99%
“…We opted for the latter, as it is both scalable and among the state-of-the-art methods. It exhaustively encodes all possible speech sequences with an embedding model and uses an optimised k-NN search [29] to retrieve acoustically similar pairs of speech sequences (see the details in [28]). In our experiments, we used the pairs retrieved by k-NN on GD-PLP encoded sequences to train our self-supervised models (CAE, Siamese, CAE-Siamese).…”
Section: Finding and Choosing Pairs of Speech Embeddings
confidence: 99%
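The retrieval step this citation describes can be sketched as a brute-force cosine k-NN over segment embeddings. This is a minimal illustration only: the paper uses an optimised k-NN search rather than an exhaustive similarity matrix, and the `knn_pairs` helper, array shapes, and toy embeddings below are assumptions, not the authors' code.

```python
import numpy as np

def knn_pairs(embeddings, k=5):
    """Retrieve the k nearest neighbours of each speech-segment embedding
    by cosine similarity, returning (query, neighbour) index pairs."""
    # L2-normalise rows so that a dot product equals cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)  # a segment must not match itself
    # Sort each row by descending similarity; keep the k best neighbours.
    neighbours = np.argsort(-sims, axis=1)[:, :k]
    return [(i, j) for i in range(len(embeddings)) for j in neighbours[i]]

# Toy example: four 2-D "embeddings" forming two acoustically similar pairs.
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
pairs = knn_pairs(emb, k=1)  # → [(0, 1), (1, 0), (2, 3), (3, 2)]
```

In practice the all-pairs similarity matrix does not fit in memory for large corpora, which is why optimised (often approximate) k-NN indexes are used instead of this exhaustive version.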
“…The model is a convolution- and transformer-based embedder trained with the NT-Xent contrastive loss [22]. Building on similar ideas in vision and speech, we select our positive examples through a mix of time-stretching data augmentation [23] and k-Nearest Neighbors search [24,25]. Figure 1 gives an overview of our method.…”
Section: Introduction
confidence: 99%
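The NT-Xent (normalised temperature-scaled cross-entropy) loss mentioned in this citation can be sketched as follows. This is a minimal NumPy version for a batch of positive embedding pairs; the function name and batch layout are illustrative assumptions, and actual systems compute this inside a deep-learning framework so it can be backpropagated.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss over a batch of N positive pairs (z1[i], z2[i]).
    Every other embedding in the 2N-row batch serves as a negative."""
    z = np.concatenate([z1, z2], axis=0)              # (2N, d) stacked views
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine via dot product
    sims = z @ z.T / temperature                      # scaled similarity matrix
    np.fill_diagonal(sims, -np.inf)                   # a view is not its own negative
    n = len(z1)
    # Row i (from z1) pairs with row i + n (from z2), and vice versa.
    pos_idx = np.concatenate([np.arange(n) + n, np.arange(n)])
    log_softmax = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -log_softmax[np.arange(2 * n), pos_idx].mean()
```

The loss is small when each embedding is closer to its positive (e.g. a time-stretched or k-NN-retrieved version of the same term) than to every other segment in the batch, and grows as negatives become more similar than positives.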