2017
DOI: 10.1016/j.csl.2017.04.008

A segmental framework for fully-unsupervised large-vocabulary speech recognition

Abstract: Zero-resource speech technology is a growing research area that aims to develop methods for speech processing in the absence of transcriptions, lexicons, or language modelling text. Early term discovery systems focused on identifying isolated recurring patterns in a corpus, while more recent full-coverage systems attempt to completely segment and cluster the audio into word-like units, effectively performing unsupervised speech recognition. This article presents the first attempt we are aware of to apply such a…

Cited by 90 publications (132 citation statements)
References 53 publications (162 reference statements)

“…A recently emerged area of speech technology research is the so-called zero-resource speech processing (ZS) initiative where the aim is to create systems capable of learning structural representations of speech input in the absence of any data labeling [1][2][3], providing both scalability towards under-resourced domains and illuminating how human infants may learn spoken languages. A number of the existing ZS systems, including the best performing system at the word-level [1] in the Interspeech-2015 Zerospeech challenge and the state-of-the-art system in [2] are based on clustering and temporal grouping of syllable-like rhythmic units.…”
Section: Introduction (mentioning)
confidence: 99%
“…A number of the existing ZS systems, including the best performing system at the word-level [1] in the Interspeech-2015 Zerospeech challenge and the state-of-the-art system in [2] are based on clustering and temporal grouping of syllable-like rhythmic units. The system in [1] first segments speech into syllable-like chunks, clusters the resulting tokens into categories using K-means, and decodes words as recurring n-grams over the syllabic clusters in the data.…”
Section: Introduction (mentioning)
confidence: 99%
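
The pipeline described in the statement above (syllable-like segmentation, K-means clustering of the resulting tokens, then words as recurring n-grams over cluster labels) can be illustrated with a toy sketch. This is not the implementation of the system in [1]. It assumes segmentation has already been done and that each syllable-like segment is represented by a fixed-dimensional vector, which the statement does not specify. The names `kmeans_labels`, `discover`, and `UTTERANCES`, and the toy data, are hypothetical. The K-means here uses a deterministic farthest-point initialisation in place of the usual random one, so the toy result is reproducible.

```python
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_labels(points, k, iters=10):
    """Plain Lloyd's K-means; returns one cluster label per point.

    Assumption: farthest-point initialisation (not random) for determinism.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point farthest from all centroids chosen so far.
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )

    def assign():
        return [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid unchanged
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assign()


def discover(utterances, k, n=2, min_count=2):
    """Cluster all segment vectors, then count n-grams of cluster labels.

    Returns (label sequence per utterance, {n-gram: count}) where only
    n-grams seen at least `min_count` times are kept as word candidates.
    """
    flat = [vec for utt in utterances for vec in utt]
    labels = kmeans_labels(flat, k)

    label_seqs, pos = [], 0
    for utt in utterances:
        label_seqs.append(labels[pos:pos + len(utt)])
        pos += len(utt)

    counts = Counter(
        tuple(seq[i:i + n])
        for seq in label_seqs
        for i in range(len(seq) - n + 1)
    )
    recurring = {gram: c for gram, c in counts.items() if c >= min_count}
    return label_seqs, recurring


# Hypothetical toy data: three utterances built from three syllable types
# (near (0, 0), (5, 5), and (10, 0)); the first two types co-occur in order
# in every utterance.
UTTERANCES = [
    [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)],
    [(0.1, -0.1), (5.1, 4.9)],
    [(10.1, 0.1), (-0.1, 0.1), (4.9, 5.1)],
]

label_seqs, recurring = discover(UTTERANCES, k=3, n=2)
print(label_seqs)
print(recurring)
```

Cluster IDs are arbitrary, so what matters in the output is which label pair recurs, not the particular integers. On real speech the number of clusters, the n-gram order, and the minimum count would all need tuning, and none of those values are given in the statement.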