Abstract: An exploratory implementation of a syllable-based recognizer is described. Continuous speech is first divided into syllabic units, and the units are then matched against syllable templates using a dynamic programming algorithm. A hierarchical transition network is used to limit the syllable search to possible continuations of the current partial sentence hypotheses. Competing hypotheses are pruned by a 'beam search'. Experiments are reported on automatic recognition of English sentences with a 70-word vocabular…
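The abstract does not say which dynamic programming algorithm is used for template matching. As an illustration only, the sketch below assumes classic dynamic time warping (DTW) over sequences of feature vectors with a Euclidean frame distance. The function names `dtw_distance` and `best_template`, the distance measure, and the toy one-dimensional features are all hypothetical and are not taken from the paper.

```python
from math import inf


def dtw_distance(template, segment):
    """Accumulated DTW cost between two sequences of feature vectors.

    Assumption: Euclidean distance between frames and the basic
    (match / insert / delete) step pattern; the paper may differ.
    """
    def frame_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    n, m = len(template), len(segment)
    # cost[i][j] = best cost of aligning the first i template frames
    # with the first j segment frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(template[i - 1], segment[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # template frame repeated
                cost[i][j - 1],      # segment frame repeated
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def best_template(templates, segment):
    """Return the label of the template with the lowest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(templates[label], segment))


# Toy usage with made-up one-dimensional "features".
templates = {
    "ba": [[0.0], [1.0], [2.0]],
    "da": [[5.0], [5.0], [6.0]],
}
segment = [[0.0], [1.0], [1.0], [2.0]]
print(best_template(templates, segment))
```

In a full recognizer of the kind the abstract describes, the set of templates tried for each syllabic unit would be restricted by the transition network, and only the best-scoring sentence hypotheses would be kept at each step (the beam search); neither is modeled in this sketch.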
“…The experiments in Zwicker et al [1979] automatically segmented speech by half syllables by detecting both syllable boundaries and peaks of nuclei. Speech was automatically segmented into whole syllables using onset estimations in Hunt et al [1980]. All of these methods made hard decisions about syllable locations.…”
Section: Syllable Based Segmentations For ASR
mentioning
confidence: 99%
“…This work proposed that speech should be characterized in terms of syllable onset, nucleus, and coda along with sub-classes within each of these sub-syllable units. This was actually deployed in a dynamic template matching system in Zwicker et al [1979] where template matching was used to recognize half-syllable units, and in Hunt et al [1980] where a similar approach was used on whole-syllable units. Syllables were employed as the basic recognition unit in an HMM based system in Green et al [1993].…”
Section: Syllables As a Recognition Unit
mentioning
“…Note that since this is based only on a count, there is flexibility in the exact placement in time of the syllable centers. Thus, the location information is used, but it does not need to be as precise as methods that segment the utterance based on syllable onsets [6,8]. The motivation for this is that the exact placement of the onset may not be well defined due to coarticulation [19].…”
Section: Oracle Experiments
mentioning
confidence: 99%
“…The syllable was proposed as a basic unit of recognition as early as 1975 [5]. In [6], the utterances were segmented via syllable onset estimations as a precursor to template matching, and in [7] syllables were employed as the basic recognition unit in an HMM. The most closely related method to this paper was presented by Wu in 1997 [8,9].…”
This work presents the use of dynamic Bayesian networks (DBNs) to jointly estimate word position and word identity in an automatic speech recognition system. In particular, we have augmented a standard Hidden Markov Model (HMM) with counts and locations of syllable nuclei. Three experiments are presented here. The first uses oracle syllable counts, the second uses oracle syllable nuclei locations, and the third uses estimated (non-oracle) syllable nuclei locations. All results are presented on the 10 and 500 word tasks of the SVitchboard corpus. The oracle experiments give relative improvements ranging from 7.0% to 37.2%. When using estimated syllable nuclei a relative improvement of 3.1% is obtained on the 10 word task.
“…Examples of features extracted from the result of applying these methods include the features based on the STFT transform, described in [Wold et al, 1996, Tzanetakis, 2002], and the Mel Frequency Cepstral Coefficients (MFCC) [Hunt et al, 1980]. Features of this type can be used to capture aspects related to the timbre of an audio signal and can be employed both in speech analysis applications and in music analysis applications [Tzanetakis, 2002].…”