2016
DOI: 10.1016/j.specom.2016.06.004

3PRO – An unsupervised method for the automatic detection of sentence prominence in speech

Cited by 26 publications (22 citation statements)
References 50 publications
“…For the unsupervised system and for each fold, five orders of the acoustic n-gram models (n = 1, 2, 3, 4, and 5) were trained for energy, F0, and spectral tilt, on speech data from 28 speakers, always keeping 1 (out of 10) of the annotated speakers for evaluation. In order to evaluate performance for different threshold levels, hyperparameter λ was varied between [-2, 2] with steps of 0.05 for the lexical, acoustic, and combined models (see [10,35] for examples on the effect of λ). Table 1 presents the results for the independent features and the most relevant combinations, as well as the combined model (acoustic+lexical) performance.…”
Section: Results (mentioning)
confidence: 99%
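
The λ sweep described in this statement can be sketched as follows. The grid (from -2 to 2 in steps of 0.05) comes from the quoted text. The thresholding rule is an assumption made only for illustration: the statement does not say how λ enters the decision, so the hypothetical `detect` below marks a word as prominent when its score exceeds the mean plus λ standard deviations of all scores.

```python
import statistics


def lambda_grid(lo=-2.0, hi=2.0, step=0.05):
    """Return the evenly spaced lambda values from lo to hi, inclusive."""
    n = int(round((hi - lo) / step))
    # Build from integer indices to avoid accumulating floating-point drift.
    return [round(lo + i * step, 10) for i in range(n + 1)]


def detect(scores, lam):
    """Flag scores above mean + lam * SD (assumed rule, not from the source)."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    threshold = mean + lam * sd
    return [s > threshold for s in scores]


if __name__ == "__main__":
    scores = [0.0, 0.0, 0.0, 10.0]  # made-up per-word scores
    for lam in lambda_grid()[::20]:
        print(lam, detect(scores, lam))
```

Sweeping λ this way trades precision against recall: low values flag nearly every word, high values flag only the most extreme ones.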
“…For KNN, the number of nearest neighbors was set to k = 13 since this provided the most consistent performance in preliminary testing. SVMs used radial basis function with a scale factor of σ = 12.08 and box constraint C = 100, as these were previously optimized in the context of acoustic features with the same data [35].…”
Section: Supervised Classification (mentioning)
confidence: 99%
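
A minimal sketch of the two classifier settings quoted above, using only the Python standard library. The values k = 13 and σ = 12.08 are from the statement; everything else is an assumption. In particular, the kernel convention exp(-||x - y||² / (2σ²)) is assumed here, and toolboxes differ on how a "scale factor" enters the kernel, so the mapping should be checked against the software actually used. The box constraint C = 100 only matters during SVM training, which is not shown.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=13):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    ranked = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def rbf_kernel(x, y, sigma=12.08):
    """Gaussian kernel under the assumed convention exp(-d^2 / (2 * sigma^2))."""
    return math.exp(-math.dist(x, y) ** 2 / (2.0 * sigma ** 2))


if __name__ == "__main__":
    # Made-up one-dimensional data: class 0 near 0..12, class 1 near 100..112.
    train_x = [(float(i),) for i in range(13)] + [(100.0 + i,) for i in range(13)]
    train_y = [0] * 13 + [1] * 13
    print(knn_predict(train_x, train_y, (1.0,)))
    print(rbf_kernel((0.0,), (12.08,)))
```

An odd k such as 13 avoids tied votes in a two-class problem like prominent versus non-prominent.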
“…In support of the acoustic predictability hypothesis, we have earlier shown that unpredictability of acoustic prosodic trajectories is highly correlated with human judgments of prominence on the same input in several different languages (Kakouros & Räsänen, 2014, 2016a, 2016b) and that both prosodic and lexical predictability contribute to the prominence of words (Kakouros, Pelemans, Vervimp, Wambacq & Räsänen, 2016), effectively replacing the signal-based component of Cole et al. (2010) with a fully probabilistic model at multiple levels of representation. The hypothesis also receives support from electrophysiological studies.…”
Section: Introduction (mentioning)
confidence: 84%
“…Specifically, the manually labeled prominence markings were used to divide the data into two categories: prominent and non-prominent words. As the data have been labeled by two annotators, all words with at least one prominence marking were considered as prominent (see [32] for a similar approach). For the evaluation, five word-level statistical descriptors were computed for all measured features: (i) mean, (ii) max, (iii) min, (iv) standard deviation (SD), and (v) the feature range during the word.…”
Section: Discussion (mentioning)
confidence: 99%
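
The labeling rule and the five word-level descriptors in this statement can be sketched as below. The function names and the sample values are hypothetical. The statement does not say whether the standard deviation is the population or the sample form; the population form is assumed here so that single-frame words are handled without error.

```python
import statistics


def is_prominent(annotator_marks):
    """A word counts as prominent if at least one annotator marked it."""
    return any(annotator_marks)


def word_descriptors(values):
    """Mean, max, min, SD, and range of one feature over the frames of a word."""
    lo, hi = min(values), max(values)
    return {
        "mean": statistics.fmean(values),
        "max": hi,
        "min": lo,
        "sd": statistics.pstdev(values),  # population SD (assumption)
        "range": hi - lo,
    }


if __name__ == "__main__":
    f0_frames = [110.0, 125.0, 140.0]  # made-up F0 values (Hz) for one word
    print(is_prominent([False, True]))
    print(word_descriptors(f0_frames))
```

Taking the union of the two annotators' markings yields more prominent words than requiring agreement would, which is worth keeping in mind when comparing class balance across studies.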