Interspeech 2017
DOI: 10.21437/interspeech.2017-1237
Evaluation of Spectral Tilt Measures for Sentence Prominence Under Different Noise Conditions

Abstract: Spectral tilt has been suggested to be a correlate of prominence in speech, although several studies have not replicated this empirically. This may be partially due to the lack of a standard method for tilt estimation from speech, rendering interpretations and comparisons between studies difficult. In addition, little is known about the performance of tilt estimators for prominence detection in the presence of noise. In this work, we investigate and compare several standard tilt measures on quantifying promine…

Cited by 13 publications (12 citation statements)
References 27 publications

“…All features except duration were computed using windows of 25 ms with a frame shift of 10 ms. Signal energy was computed from the time-domain signal according to Eq. (1) (where x is the input signal, w the length of the analysis window, t the current sample index, and τ the window shift; see, e.g., Kakouros & Räsänen, 2016); F0 was computed using the YAAPT pitch tracking algorithm (Zahorian & Hu, 2008); spectral tilt by computing the mel frequency cepstral coefficients and taking the first (C1) mel frequency cepstral coefficients (e.g., Kakouros, Räsänen, & Alku, 2017); and word duration was obtained from manual segmentations. All raw feature values were subsequently normalized:…”
Section: Acoustic Measures For Stimulus Properties (mentioning)
confidence: 99%
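
The framing described in the statement above (25 ms windows, 10 ms shift) can be sketched as follows. This is a minimal illustration, not the cited authors' code: the function name `frame_energies` is hypothetical, and the per-frame sum of squared samples is a common short-time energy definition assumed here because Eq. (1) is not reproduced in the excerpt and may differ.

```python
def frame_energies(x, sample_rate, win_ms=25.0, shift_ms=10.0):
    """Short-time energy per frame (assumed definition: sum of squared samples)."""
    # Convert window length and shift from milliseconds to samples.
    win = int(round(sample_rate * win_ms / 1000.0))
    shift = int(round(sample_rate * shift_ms / 1000.0))
    energies = []
    start = 0
    # Only full windows are used; a trailing partial frame is dropped.
    while start + win <= len(x):
        frame = x[start:start + win]
        energies.append(sum(s * s for s in frame))
        start += shift
    return energies
```

At a 16 kHz sampling rate these defaults give 400-sample windows advanced by 160 samples. Dropping the trailing partial frame is a design choice of this sketch; zero-padding the last frame is an equally common alternative.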
“…For the computation, windows of 25 ms were used with a frame shift of 10 ms. Specifically, F0 was computed using YAAPT [25], spectral tilt by computing mel frequency cepstral coefficients (MFCCs) and by taking the first (C1) MFCC [26] [27], and word duration was obtained from manual segmentations. Following the computation of the raw feature values: (i) energy was logarithmically normalised, (ii) F0 was semitone normalised relative to the minimum F0 in each utterance, and (iii) tilt was exponentially normalised; in this case, the exponential function provides a near linear scaling of the tilt estimates to positive real numbers for ease of interpretation.…”
Section: Feature Extraction (mentioning)
confidence: 99%
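
The semitone normalisation of F0 relative to the utterance minimum, mentioned in the statement above, can be sketched with the standard conversion 12 · log2(f / f_min). The function name `semitone_normalise` is hypothetical, and treating non-positive F0 values as unvoiced frames is an assumption of this sketch rather than something the excerpt states. The logarithmic and exponential normalisations of energy and tilt are left out because their exact forms are not given.

```python
import math

def semitone_normalise(f0_hz):
    """Map F0 values (Hz) to semitones above the minimum voiced F0 in the utterance."""
    # Assumption: values <= 0 mark unvoiced frames and are passed through as None.
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        return [None] * len(f0_hz)
    f_min = min(voiced)
    return [12.0 * math.log2(f / f_min) if f > 0 else None for f in f0_hz]
```

With this mapping the lowest voiced frame sits at 0 semitones and each doubling of F0 adds 12, so values are comparable across speakers with different pitch ranges.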
“…And in both the cases, power spectral density will be more at low frequency when compared to high frequency. This nature is called as spectral tilt [12]. This parameter is of significance in the recognition system of ALT speech because most of the utterance is hyper in nature involving more power.…”
Section: Spectral Tilt Estimation (mentioning)
confidence: 99%