Interspeech 2017
DOI: 10.21437/interspeech.2017-855

Eliciting Meaningful Units from Speech

Abstract: Elicitation of information structure from speech is a crucial step in automatic speech understanding. In terms of both production and perception, we consider the intonational phrase to be the basic meaningful unit of information structure in speech. The current paper presents a method of detecting these units in speech by processing both the recorded speech and its textual representation. Using syntactic information, we split the text into small groups of closely connected words. Assuming that intonati…

Citations: cited by 1 publication (2 citation statements)
References: 18 publications
“…Existing automated phrase boundary detection methods often utilize lexical and syntactic cues along with acoustic input (e.g., [38][39][40]). They usually involve extensive preparation steps such as manual tagging (e.g., [41,42]) and training a specific, designated model (e.g., [38,39,41,43,44]).…”
Section: Introduction (mentioning)
confidence: 99%
“…They usually involve extensive preparation steps such as manual tagging (e.g., [41,42]) and training a specific, designated model (e.g., [38,39,41,43,44]). Approaches to speech segmentation based on acoustic signals alone were proposed in [45,46,40,47]. These efforts have been commonly applied to scripted speech (e.g.…”
Section: Introduction (mentioning)
confidence: 99%