Eliciting Meaningful Units from Speech

Kocharov, Daniil; Kachkovskaia, Tatiana; Skrelin, Pavel A.

doi:10.21437/interspeech.2017-855

Cited by 1 publication

(2 citation statements)

References 18 publications

“…Existing automated phrase boundary detection methods often utilize lexical and syntactic cues along with acoustic input (e.g., [38][39][40]). They usually involve extensive preparation steps such as manual tagging (e.g., [41,42]) and training a specific, designated model (e.g., [38,39,41,43,44]).…”

Section: Introduction (mentioning)

confidence: 99%

Automatic detection of prosodic boundaries in spontaneous speech

et al. 2021

Automatic speech recognition (ASR) and natural language processing (NLP) are expected to benefit from an effective, simple, and reliable method to automatically parse conversational speech. The ability to parse conversational speech depends crucially on the ability to identify boundaries between prosodic phrases. This is done naturally by the human ear, yet has proved surprisingly difficult to achieve reliably and simply in an automatic manner. Efforts to date have focused on detecting phrase boundaries using a variety of linguistic and acoustic cues. We propose a method which does not require model training and utilizes two prosodic cues that are based on ASR output. Boundaries are identified using discontinuities in speech rate (pre-boundary lengthening and phrase-initial acceleration) and silent pauses. The resulting phrases preserve syntactic validity, exhibit pitch reset, and compare well with manual tagging of prosodic boundaries. Collectively, our findings support the notion of prosodic phrases that represent coherent patterns across textual and acoustic parameters.
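
The abstract names two cues, silent pauses and speech-rate discontinuities, computed from ASR output without model training. The following is a minimal sketch of that idea, not the authors' implementation. It assumes word-level timestamps and syllable counts are already available, and the `Word` type, the `detect_boundaries` function, and all three thresholds are hypothetical values chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """One ASR token with timing (hypothetical structure, not from the paper)."""
    text: str
    start: float     # onset time in seconds
    end: float       # offset time in seconds
    syllables: int   # syllable count, assumed to be known


def word_rate(w: Word) -> float:
    """Local speech rate of a word in syllables per second."""
    return w.syllables / max(w.end - w.start, 1e-6)


def detect_boundaries(
    words: List[Word],
    min_pause: float = 0.2,   # assumed pause threshold in seconds
    slowdown: float = 0.7,    # assumed ratio marking pre-boundary lengthening
    speedup: float = 1.3,     # assumed ratio marking phrase-initial acceleration
) -> List[int]:
    """Return each index i such that a boundary is placed after words[i]."""
    boundaries = []
    for i in range(len(words) - 1):
        cur, nxt = words[i], words[i + 1]

        # Cue 1: a silent pause between consecutive words.
        if nxt.start - cur.end >= min_pause:
            boundaries.append(i)
            continue

        # Cue 2: the current word is slow relative to the previous one
        # (lengthening) and the next word is fast relative to the current
        # one (acceleration).
        if i > 0:
            prev_rate = word_rate(words[i - 1])
            cur_rate = word_rate(cur)
            if cur_rate <= slowdown * prev_rate and word_rate(nxt) >= speedup * cur_rate:
                boundaries.append(i)
    return boundaries


example = [
    Word("so", 0.0, 0.2, 1),
    Word("today", 0.2, 0.6, 2),
    Word("then", 0.6, 1.1, 1),    # lengthened word
    Word("we", 1.15, 1.3, 1),     # fast phrase-initial word
    Word("went", 1.3, 1.5, 1),
    Word("home", 1.5, 1.8, 1),
    Word("and", 2.3, 2.45, 1),    # preceded by a 0.5 s pause
]
print(detect_boundaries(example))
```

In this toy input a boundary is placed after "then" on the rate cue and after "home" on the pause cue. Comparing each word only to its immediate neighbours is a simplification; a real system would likely smooth the rate over a window.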

“…They usually involve extensive preparation steps such as manual tagging (e.g., [41,42]) and training a specific, designated model (e.g., [38,39,41,43,44]). Approaches to speech segmentation based on acoustic signals alone were proposed in [45,46,40,47]. These efforts have been commonly applied to scripted speech (e.g.…”

Section: Introduction (mentioning)

confidence: 99%