Automatic estimation of voice onset time for word-initial stops by applying random forest to onset detection

Lin, Chi-Yueh; Wang, Hsiao-Chuan

doi:10.1121/1.3592233

Cited by 21 publications

(19 citation statements)

References 35 publications


“…The algorithm's performance can then be reported either as the full (empirical) CDF of automatic/manual Lin and Wang, 2011). Reporting statistics about the CDF of automatic/manual differences is a standard evaluation method in ASR tasks, such as forced alignment of phoneme sequences, where the goal is to predict the location of boundaries in a speech segment (e.g., Brugnara et al., 1993; Keshet et al., 2007).…”

Section: Distribution Of Automatic/manual Difference (mentioning)

confidence: 99%
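
The evaluation described in the statement above, reporting the distribution of automatic/manual differences, can be sketched roughly as follows. This is a minimal illustration, not the procedure of any cited paper: the function name `tolerance_cdf`, the tolerance values, and the VOT measurements are all hypothetical and chosen only to show the computation.

```python
from typing import Dict, Sequence


def tolerance_cdf(
    auto_ms: Sequence[float],
    manual_ms: Sequence[float],
    tolerances_ms: Sequence[float],
) -> Dict[float, float]:
    """Empirical CDF of |automatic - manual| VOT, evaluated at given tolerances.

    For each tolerance t (in ms), returns the fraction of tokens whose
    automatic measurement lies within t ms of the manual one.
    """
    if len(auto_ms) != len(manual_ms):
        raise ValueError("automatic and manual lists must have equal length")
    if not auto_ms:
        raise ValueError("at least one measurement pair is required")

    # Absolute automatic/manual difference for every token.
    diffs = [abs(a - m) for a, m in zip(auto_ms, manual_ms)]

    # Fraction of tokens at or below each tolerance.
    return {t: sum(d <= t for d in diffs) / len(diffs) for t in tolerances_ms}


# Hypothetical measurements in milliseconds (illustrative only, not real data).
auto = [52.0, 61.0, 47.0, 80.0]
manual = [50.0, 58.0, 60.0, 79.0]

result = tolerance_cdf(auto, manual, [2, 5, 10, 15])
for tol, frac in result.items():
    print(f"within {tol} ms: {frac:.0%}")
```

Reporting the fraction at a few fixed tolerances, rather than a single mean error, shows how often an algorithm is close to the manual label and how often it fails badly.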

“…Previous work has used automatic measurements for speech recognition tasks Ramesh, 1998, 2003; Ali, 1999; Stouten and van Hamme, 2009), phonetic measurement (Fowler et al., 2008; Tauberer, 2010), and accented speech detection (Kazemzadeh et al., 2006; Hansen et al., 2010). Some studies, like ours, focus largely on the problem of VOT measurement itself, and evaluate the proposed algorithm by comparing automatic and manual measurements (Stouten and van Hamme, 2009; Yao, 2009; Hansen et al., 2010; Lin and Wang, 2011). Our approach differs from all previous studies except one (Lin and Wang, 2011) in an important aspect.…”

Section: Introduction (mentioning)

confidence: 98%

“…Some studies, like ours, focus largely on the problem of VOT measurement itself, and evaluate the proposed algorithm by comparing automatic and manual measurements (Stouten and van Hamme, 2009; Yao, 2009; Hansen et al., 2010; Lin and Wang, 2011). Our approach differs from all previous studies except one (Lin and Wang, 2011) in an important aspect. Instead of using a set of customized rules to estimate VOT, our system learns to estimate VOT from training data.…”

Section: Introduction (mentioning)

confidence: 98%


Automatic measurement of voice onset time using discriminative structured prediction

Sonderegger; Keshet

2012, The Journal of the Acoustical Society of America


A discriminative large-margin algorithm for automatic measurement of voice onset time (VOT) is described, considered as a case of predicting structured output from speech. Manually labeled data are used to train a function that takes as input a speech segment of an arbitrary length containing a voiceless stop, and outputs its VOT. The function is explicitly trained to minimize the difference between predicted and manually measured VOT; it operates on a set of acoustic feature functions designed based on spectral and temporal cues used by human VOT annotators. The algorithm is applied to initial voiceless stops from four corpora, representing different types of speech. Using several evaluation methods, the algorithm's performance is near human intertranscriber reliability, and compares favorably with previous work. Furthermore, the algorithm's performance is minimally affected by training and testing on different corpora, and remains essentially constant as the amount of training data is reduced to 50-250 manually labeled examples, demonstrating the method's practical applicability to new datasets.

“…and Van Hamme (RS) [5], the random-forest-based method by Lin and Wang (RF) [9] and structured-prediction-based method by Sonderegger and Keshet (SP) [8]. All of these studies report results on the TIMIT database using the same validation criterion.…”

Section: Results (mentioning)

confidence: 99%

“…Methods for the measurement of VOT fall into two categories: (a) those which explicitly identify the locations of the burst and voicing onsets through a set of customized acoustic-phonetic rules (knowledge-based) [4,6], and (b) those which train a learning machine (such as random forest, support vector machine) to estimate the VOT using some acoustic features corresponding to the stop-to-voiced-phone transition event [8,9]. Many of the high performing methods require phonetic transcription either to identify the segment of the speech signal containing the stop consonant through forced-alignment [4,9] or to focus the analysis on segments of the signal containing only one stop consonant [8]. Such methods are difficult to employ in a scenario where there is no transcription available.…”

Section: Motivation (mentioning)

confidence: 99%

Estimation of voice-onset time in continuous speech using temporal measures

Prathosh; Ramakrishnan; Ananthapadmanabha

2014, The Journal of the Acoustical Society of America


This paper proposes an automatic acoustic-phonetic method for estimating voice-onset time of stops. This method requires neither transcription of the utterance nor training of a classifier. It makes use of the plosion index for the automatic detection of burst onsets of stops. Having detected the burst onset, the onset of the voicing following the burst is detected using the epochal information and a temporal measure named the maximum weighted inner product. For validation, several experiments are carried out on the entire TIMIT database and two of the CMU Arctic corpora. The performance of the proposed method compares well with three state-of-the-art techniques.


Mechanisms for Profiling

Singh

2019, Profiling Humans From Their Voice
