Guus de Krom

A Cepstrum-Based Technique for Determining a Harmonics-to-Noise Ratio in Speech Signals

A new method to calculate a spectral harmonics-to-noise ratio (HNR) in speech signals is presented. The method involves discrimination between harmonic and noise energy in the magnitude spectrum by means of a comb-liftering operation in the cepstrum domain. Sensitivity of HNR to (a) additive noise and (b) jitter was tested with synthetic vowel-like signals, generated at 10 fundamental frequencies. All jitter and noise signals were analyzed at three window lengths in order to investigate the effect of the length of the analysis frame on the estimated HNR values. Results of a multiple linear regression analysis with noise or jitter, F0, and window length as predictors for HNR indicate a major effect of both noise and jitter on HNR, in that HNR decreases almost linearly with increasing noise levels or increasing jitter. The influence of F0 and window length on HNR is small for the jittered signals, but HNR increases considerably with increasing F0 or window length for the noise signals. We conclude that the method seems to be a valid technique for determining the amount of spectral noise, because it is almost linearly sensitive to both noise and jitter for a large part of the noise or jitter continuum. The strong negative relation between HNR and jitter illustrates that spectral noise measures cannot simply be taken as indicators of the actual amount of noise in the time signal. Instead, HNR integrates several aspects of the acoustic stability of the signal. As such, HNR may be a useful parameter in the analysis of voice quality, although it cannot be directly interpreted in terms of underlying glottal events or perceptual characteristics.
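The comb-liftering idea can be illustrated with a short NumPy sketch. It is a simplified illustration under stated assumptions, not the published implementation: it assumes F0 is already known, analyzes a single Hann-windowed frame, zeroes a hypothetical `half_width` of ±3 quefrency samples around each rahmonic peak, and treats the energy of the resulting ripple-free spectrum as the noise estimate. The function name `cepstral_hnr` is hypothetical.

```python
import numpy as np


def cepstral_hnr(frame, fs, f0, half_width=3):
    """Rough HNR estimate in dB for one voiced frame (hypothetical helper).

    Simplified sketch of cepstral comb liftering; assumes f0 (Hz) is known.
    """
    n = len(frame)
    spectrum = np.fft.fft(frame * np.hanning(n))
    power = np.abs(spectrum) ** 2 + 1e-12          # small floor avoids log(0)
    log_mag = 0.5 * np.log(power)                   # natural-log magnitude spectrum
    cepstrum = np.real(np.fft.ifft(log_mag))        # real cepstrum

    # Comb lifter: zero the rahmonic peaks at multiples of the period T0 = fs / f0
    # (in samples), on both the positive and the mirrored negative quefrency side.
    period = fs / f0
    liftered = cepstrum.copy()
    k = 1
    while k * period + half_width < n / 2:
        centre = int(round(k * period))
        lo, hi = centre - half_width, centre + half_width + 1
        liftered[lo:hi] = 0.0
        liftered[n - hi + 1:n - lo + 1] = 0.0
        k += 1

    # Back to the log-spectral domain: the ripple-free spectrum is the noise estimate.
    noise_log_mag = np.real(np.fft.fft(liftered))
    noise_energy = np.sum(np.exp(2.0 * noise_log_mag))
    total_energy = np.sum(power)
    harmonic_energy = max(total_energy - noise_energy, 1e-12)  # clip to stay positive
    return 10.0 * np.log10(harmonic_energy / noise_energy)
```

With a synthetic 200 Hz harmonic signal sampled at 16 kHz, raising the level of added white noise should lower this estimate, in line with the noise sensitivity reported in the abstract. The absolute dB values of the sketch are not comparable with published HNR figures.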


Some Spectral Correlates of Pathological Breathy and Rough Voice Quality for Different Types of Vowel Fragments

Krom¹

1995

J Speech Lang Hear Res

182

113


This study deals with the relation between listeners' ratings of pathological breathiness and roughness and certain characteristics of the voice spectrum. Two general research questions were addressed: First, which spectral parameters may serve as useful predictors of breathiness and roughness? Second, does the type of speech fragment used for analysis have an effect on the obtained regression model? Listener ratings of breathiness and roughness were obtained for three types of vowel fragments: a vowel onset segment, a mid-vowel (post-onset) segment, and a vowel segment covering the onset and the acoustically more stable post-onset parts. Results indicated that the harmonics-to-noise ratio was the best single predictor of both rated breathiness and roughness, explaining up to 54% of the true rating variance. By combining different predictors, between 75% and 80% of the breathiness variance could be explained for all three types of fragments. For roughness, a strong effect of fragment type was observed, with most variance explained in vowel onset fragments (71%), and least in post-onset fragments (52%). The effect of fragment type was also observed when regression analyses were performed with six predictors based on a factor analysis of the acoustic data.
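The percentages of explained variance above correspond to proportions of variance accounted for (R²) by regression models. A minimal sketch of how R² can be computed for one or several acoustic predictors with ordinary least squares follows. The function name and any data passed to it are hypothetical, and the sketch does not reproduce the study's analysis; for instance, it applies no correction for rating reliability of the kind implied by "true rating variance".

```python
import numpy as np


def variance_explained(predictors, ratings):
    """Proportion of rating variance (R^2) explained by an OLS fit.

    predictors: shape (n,) for one predictor or (n, p) for several.
    ratings: shape (n,) listener ratings (hypothetical data).
    """
    ratings = np.asarray(ratings, dtype=float)
    # Design matrix with an intercept column followed by the predictor column(s).
    X = np.column_stack([np.ones(len(ratings)), predictors])
    coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)
    residuals = ratings - X @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((ratings - ratings.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

Because the models are nested, adding a predictor to an OLS model with an intercept never lowers R², which is why combining predictors can only raise the explained variance relative to the best single predictor.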


Consistency and Reliability of Voice Quality Ratings for Different Types of Speech Fragments

Krom¹

1994

J Speech Lang Hear Res

110


This study describes a perception experiment in which listeners were asked to rate voice fragments obtained from a variety of speakers on grade, breathiness, and roughness. Four different types of stimuli were presented to each listener. One type of stimulus was based on connected speech fragments; the other three were based on different segments of a sustained vowel, yielding a 200 msec vowel onset stimulus, a 200 msec post-onset stimulus, and a 1000 msec whole vowel stimulus. Analyses focused on the consistency and reliability of grade, roughness, and breathiness ratings. Results indicated that stimulus type had virtually no effect on either within- or between-listener consistency of the grade, breathiness, or roughness ratings. Rating reliability too was hardly influenced by stimulus type. When determined as a function of the overall degree of deviance of a voice, the reliability of breathiness and roughness ratings was slightly higher for whole vowel and vowel onset stimuli than for connected speech and post-onset stimuli. It is concluded that connected speech stimuli are not necessarily to be preferred over vowel-type stimuli for a perceptual evaluation of grade, roughness, or breathiness. The somewhat higher reliability of ratings on vowel onset and whole vowel stimuli as compared to the post-onset stimuli is taken as an indication that the onset part of a vowel may contain voice quality cues that are less salient in the most stable part of a vowel.


“Pitch” Accent in Alaryngeal Speech

Rossum

Krom

Nooteboom

et al.

2002

J Speech Lang Hear Res


Highly proficient alaryngeal speakers are known to convey prosody successfully. The present study investigated whether alaryngeal speakers not selected on grounds of proficiency were able to convey pitch accent (a pitch accent is realized on the word that is in focus, cf. Bolinger, 1958). The participating speakers (10 tracheoesophageal, 9 esophageal, and 10 laryngeal [control] speakers) produced sentences in which accent was cued by the preceding context. For each utterance, a group of listeners identified which word conveyed accent. All speakers were able to convey accent. Acoustic analyses showed that some alaryngeal speakers had little or no control over fundamental frequency. Contrary to expectation, these speakers did not compensate by using nonmelodic cues, whereas speakers using F0 did use nonmelodic cues. Thus, temporal and intensity cues are concomitant with the use of F0; if F0 is affected, these nonmelodic cues will be as well. A pitch perception experiment confirmed that alaryngeal speakers who had no control over F0 and who did not use nonmelodic cues were nevertheless able to produce pitch movements. Speakers with no control over F0 apparently relied on an alternative pitch system to convey accents and other pitch movements.


Perception and acoustics of emotions in singing

Jansens¹

Bloothooft²

Krom³

1997
