2013 IEEE Workshop on Automatic Speech Recognition and Understanding
DOI: 10.1109/asru.2013.6707761
A hierarchical system for word discovery exploiting DTW-based initialization

Cited by 30 publications (60 citation statements)
References 12 publications
“…While useful, the discovered patterns are typically isolated segments spread out over the data, leaving much speech as background. This has prompted several studies on full-coverage approaches, where the entire speech input is segmented and clustered into word-like units [17][18][19][20][21].…”
Section: Introduction
confidence: 99%
“…Figure 6 shows an example of a two level hierarchical representation of a speech signal. On the first hierarchical level the aim is to discover the acoustic building blocks of speech, the phonemes, and to learn a statistical model for each of them, the acoustic model [11,56,53,47]. In speech recognition, the acoustic model usually consists of Hidden Markov Models (HMMs), where each HMM emits a time series of vectors of cepstral coefficients.…”
Section: Representation Learning From Sequential Data
confidence: 99%
“…In the related task of acoustic pattern discovery, DTW can be allowed to consider multiple local alignments between speech signals during the overall search [8]. In this way DTW can find similar segment pairs in speech audio, followed by a clustering step [9]. The resulting cluster labels are used to train hidden Markov models (HMMs).…”
Section: Introduction
confidence: 99%
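The excerpt above describes DTW aligning pairs of speech feature sequences before clustering. As a minimal sketch of the underlying idea, the following is a textbook single-path DTW between two feature sequences under a Euclidean local cost; it is an illustration only, not the multiple-local-alignment variant the cited work [8] uses, and the function name and shapes are assumptions for this example.

```python
import numpy as np

def dtw_distance(x, y):
    """Cumulative cost of the best monotonic alignment between two
    feature sequences x (n, d) and y (m, d), Euclidean local cost.
    Illustrative sketch, not the multi-alignment search in [8]."""
    n, m = len(x), len(y)
    # Pairwise Euclidean local costs between all frame pairs.
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    # Accumulated-cost matrix with an extra boundary row/column.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j],      # stretch x (insertion)
                acc[i, j - 1],      # stretch y (deletion)
                acc[i - 1, j - 1],  # match both frames
            )
    return acc[n, m]

# A sequence aligned against a time-stretched copy of itself
# incurs zero cost, which is the property DTW-based pattern
# discovery exploits when matching repeated spoken words.
a = np.array([[0.0], [1.0], [2.0]])
b = np.array([[0.0], [0.0], [1.0], [2.0], [2.0]])
print(dtw_distance(a, b))  # 0.0
```

In full-coverage pattern discovery, such pairwise alignment costs feed a clustering step, and the resulting cluster labels then serve as transcripts for training the HMMs mentioned in the excerpt.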