Evaluation of Spectral Tilt Measures for Sentence Prominence Under Different Noise Conditions

Kakouros, Sofoklis; Räsänen, Okko; Alku, Paavo

doi:10.21437/interspeech.2017-1237

Cited by 13 publications

(12 citation statements)

References 27 publications


“…All features except duration were computed using windows of 25 ms with a frame shift of 10 ms. Signal energy was computed from the time-domain signal according to Eq. (1) (where x is the input signal, w the length of the analysis window, t the current sample index, and τ the window shift; see, e.g., Kakouros & Räsänen, 2016); F0 was computed using the YAAPT pitch tracking algorithm (Zahorian & Hu, 2008); spectral tilt by computing the mel frequency cepstral coefficients and taking the first (C1) mel frequency cepstral coefficients (e.g., Kakouros, Räsänen, & Alku, 2017); and word duration was obtained from manual segmentations. All raw feature values were subsequently normalized:…”

Section: Acoustic Measures For Stimulus Properties (mentioning)

confidence: 99%

Cross-linguistic Influences on Sentence Accent Detection in Background Noise

et al. 2019

Self Cite


This paper investigates whether sentence accent detection in a non-native language is dependent on (relative) similarity between prosodic cues to accent between the non-native and the native language, and whether cross-linguistic differences in the use of local and more widely distributed (i.e., non-local) cues to sentence accent detection lead to differential effects of the presence of background noise on sentence accent detection in a non-native language. We compared Dutch, Finnish, and French non-native listeners of English, whose cueing and use of prosodic prominence is gradually further removed from English, and compared their results on a phoneme monitoring task in different levels of noise and a quiet condition to those of native listeners. Overall phoneme detection performance was high for the native and the non-native listeners, but deteriorated to the same extent in the presence of background noise. Crucially, relative similarity between the prosodic cues to sentence accent of one’s native language compared to that of a non-native language does not determine the ability to perceive and use sentence accent for speech perception in that non-native language. Moreover, proficiency in the non-native language is not a straightforward predictor of sentence accent perception performance, although high proficiency in a non-native language can seemingly overcome certain differences at the prosodic level between the native and non-native language. Instead, performance is determined by the extent to which listeners rely on local cues (English and Dutch) versus cues that are more distributed (Finnish and French), as more distributed cues survive the presence of background noise better.
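The framing described in the citation statement above (25 ms analysis windows with a 10 ms frame shift) can be illustrated with a minimal sketch. The excerpt refers to an Eq. (1) that is not reproduced here, so the energy definition below (sum of squared samples per window) is an assumption, and the function name `frame_energy` and its parameters are hypothetical.

```python
def frame_energy(x, sample_rate, win_s=0.025, shift_s=0.010):
    """Short-time energy per frame.

    Assumption: energy is the sum of squared samples in each window;
    the cited Eq. (1) is not shown in the excerpt and may differ.
    """
    win = int(round(win_s * sample_rate))      # window length in samples
    shift = int(round(shift_s * sample_rate))  # frame shift in samples
    energies = []
    # Only full windows are used; a trailing partial window is dropped.
    for start in range(0, len(x) - win + 1, shift):
        frame = x[start:start + win]
        energies.append(sum(s * s for s in frame))
    return energies


if __name__ == "__main__":
    # One second of a constant signal sampled at 16 kHz.
    signal = [1.0] * 16000
    e = frame_energy(signal, 16000)
    print(len(e), e[0])
```

At a 16 kHz sampling rate the window is 400 samples and the shift is 160 samples, so consecutive frames overlap by 240 samples.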


“…For the computation, windows of 25 ms were used with a frame shift of 10 ms. Specifically, F0 was computed using YAAPT [25], spectral tilt by computing mel frequency cepstral coefficients (MFCCs) and by taking the first (C1) MFCC [26] [27], and word duration was obtained from manual segmentations. Following the computation of the raw feature values: (i) energy was logarithmically normalised, (ii) F0 was semitone normalised relative to the minimum F0 in each utterance, and (iii) tilt was exponentially normalised; in this case, the exponential function provides a near linear scaling of the tilt estimates to positive real numbers for ease of interpretation.…”

Section: Feature Extraction (mentioning)

confidence: 99%

Sentence Accent Perception in Noise by French Non-Native Listeners of English

Scharenborg, Meunier, Kakouros et al. 2018

Speech Prosody 2018

Self Cite


This paper investigates the use of prosodic information signalling sentence accent and the role of different acoustic features on sentence accent perception during native and nonnative speech perception in the presence of background noise. A phoneme detection experiment was carried out in which English native listeners and French highly proficient non-native listeners of English were presented with target phonemes in English sentences. Sentences were presented in different levels of speech-shaped noise and in two prosodic contexts in which the target-bearing word was either deaccented or accented. Acoustic analyses of the two prosodic conditions showed that the target-bearing words in the accented condition carried more energy, had a higher F0, and more spectral tilt than those in the deaccented condition. Results of the behavioural data showed that the native listeners outperformed the French listeners in the clean condition but not in the noise conditions and that the effect of noise was smaller for the non-native compared to the native listeners. Possibly, the non-native listeners use more and different acoustic cues than the native listeners who primarily relied on more local cues for sentence accent detection.
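Two of the normalisation steps named in the Feature Extraction statement above, logarithmic energy and F0 in semitones relative to the utterance minimum, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the use of the natural logarithm, the small floor value, and the treatment of unvoiced frames (F0 of 0 mapped to `None`) are all hypothetical. The exponential normalisation of tilt is not shown because the excerpt does not give its exact form.

```python
import math


def log_energy(energies, floor=1e-12):
    """Natural-log energy; `floor` (an assumption) avoids log(0) on silence."""
    return [math.log(max(e, floor)) for e in energies]


def semitones_re_min(f0_hz):
    """F0 in semitones relative to the minimum voiced F0 of the utterance.

    Assumption: unvoiced frames are marked by 0 (or None) and stay None.
    """
    voiced = [f for f in f0_hz if f and f > 0]
    if not voiced:
        return [None] * len(f0_hz)
    f_min = min(voiced)
    return [
        12.0 * math.log2(f / f_min) if f and f > 0 else None
        for f in f0_hz
    ]


if __name__ == "__main__":
    # A doubling of F0 corresponds to 12 semitones (one octave).
    print(semitones_re_min([0, 100, 200, 0, 150]))
    print(log_energy([1.0, 0.0]))
```

Expressing F0 relative to each utterance's own minimum removes much of the between-speaker difference in absolute pitch, which is presumably why the normalisation is done per utterance.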


“…And in both the cases, power spectral density will be more at low frequency when compared to high frequency. This nature is called as spectral tilt [12]. This parameter is of significance in the recognition system of ALT speech because most of the utterance is hyper in nature involving more power.…”

Section: Spectral Tilt Estimation (mentioning)

confidence: 99%

ALT Speech Recognition System using F0 Improvement and Spectral Tilt Method

Inbanila, E 2019

IJEAT


Human beings use voice as the medium for communication. Human speech is a very complex signal with multiple frequencies, amplitudes, and intensities that mix to convey specific information. In international terminology, voice disorders are described as dysphonia. Various dysphonias are clearly of organic origin, caused by nervous, muscular, neuro- or cellular-degenerative disease affecting the body, or by local laryngeal changes. Other dysphonias with no visible laryngeal cause are grouped as non-organic; these include habitual dysphonias that arise from faulty speaking habits and psychogenic dysphonias that stem from emotional causes. This paper looks at a speech recognition system for disordered speech generated by physically disabled people using an Artificial Larynx Transducer (ALT) device, from the perspective of speech signal processing. From the ALT speech, features such as formants, pitch, and spectral tilt are estimated. An RNN technique is used for formant frequency estimation. Pitch frequency improvement is performed before the system is trained. The features and homomorphic-based coefficients are then used to train the system. The same operation is performed during the test phase and compared with the training set. Comparison and decision making are accomplished using a distance estimator.
