Rating scales are commonly used to study voice quality. However, recent research has demonstrated that perceptual measures of voice quality obtained using rating scales suffer from poor interjudge agreement and reliability, especially in the mid-range of the scale. These findings, along with those obtained using multidimensional scaling (MDS), have been interpreted to show that listeners perceive voice quality in an idiosyncratic manner. Based on psychometric theory, the present research explored an alternative explanation for the poor interlistener agreement observed in previous research. This approach suggests that poor agreement between listeners may result, in part, from measurement errors related to a variety of factors rather than true differences in the perception of voice quality. In this study, 10 listeners rated breathiness for 27 vowel stimuli using a 5-point rating scale. Each stimulus was presented to the listeners 10 times in random order. Interlistener agreement and reliability were calculated from these ratings. Agreement and reliability were observed to improve when multiple ratings of each stimulus from each listener were averaged and when standardized scores were used instead of absolute ratings. The probability of exact agreement was found to be approximately .9 when using averaged ratings and standardized scores. In contrast, the probability of exact agreement was only .4 when a single rating from each listener was used to measure agreement. These findings support the hypothesis that poor agreement reported in past research partly arises from errors in measurement rather than individual differences in the perception of voice quality.
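The two steps this abstract credits with improving agreement, averaging repeated ratings and standardizing scores within each listener, can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the pairwise definition of agreement, the `tol` parameter, and the toy ratings are all assumptions introduced here.

```python
import itertools

import numpy as np


def average_repetitions(ratings):
    """Average repeated ratings; (listeners, stimuli, repetitions) -> (listeners, stimuli)."""
    return np.asarray(ratings, dtype=float).mean(axis=2)


def standardize_per_listener(scores):
    """Convert each listener's scores to z-scores using that listener's own mean and SD."""
    scores = np.asarray(scores, dtype=float)
    mean = scores.mean(axis=1, keepdims=True)
    std = scores.std(axis=1, ddof=1, keepdims=True)
    return (scores - mean) / std


def pairwise_agreement(scores, tol=0.0):
    """Proportion of (listener pair, stimulus) cases whose scores differ by at most tol.

    With integer ratings and tol=0 this is the proportion of exact agreement.
    Treating agreement pairwise is an assumption of this sketch.
    """
    scores = np.asarray(scores, dtype=float)
    pairs = itertools.combinations(range(scores.shape[0]), 2)
    hits = [np.abs(scores[a] - scores[b]) <= tol for a, b in pairs]
    return float(np.mean(hits))


# Hypothetical ratings: 2 listeners x 4 stimuli x 2 repetitions.
# Listener 2 ranks the stimuli identically but uses the scale one step higher.
toy = np.array([
    [[1, 1], [2, 2], [3, 3], [4, 4]],
    [[2, 2], [3, 3], [4, 4], [5, 5]],
])
averaged = average_repetitions(toy)
raw_agreement = pairwise_agreement(averaged)
z_agreement = pairwise_agreement(standardize_per_listener(averaged), tol=1e-9)
```

In this toy case the raw ratings never match, yet the standardized scores match on every stimulus, because the two listeners differ only in how they use the scale and not in how they order the stimuli.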
While several acoustic measures have been proposed to quantify listener ratings of breathy voice quality, most have failed to yield consistently high correlations with perceptual ratings of breathiness. One reason for this limitation is that most acoustic measures do not account for the nonlinear processing that occurs in the peripheral auditory system during perception. It was hypothesized that modeling such nonlinear events during signal processing may provide objective parameters that better correspond to perceptual ratings of breathy voice quality. Ten listeners rated 27 voice stimuli using a five-point rating scale. Acoustic measures were computed from these stimuli; the measures were selected because they have previously shown moderate to strong correlations with perceptual ratings of breathiness. The stimuli were also analyzed using an auditory model proposed by Moore, Glasberg, and Baer [J. Audio Eng. Soc. 45(4), 224-239 (1997)], and new measures were calculated from the output of this model. These measures included the partial loudness of the signal and the loudness of the aspiration noise. Measures obtained from the output of the auditory model were found to account for a large proportion of the variance in the perceptual ratings of breathiness.
Increased collaboration with neuroscientists working in clinical research centers addressing human communication disorders might foster research in this area. It is hoped that this article will encourage future research on speech motor control disorders to address the principles of neural plasticity and their application for rehabilitation.
The GRBAS scale is a widely used method for perceptual evaluation of voice quality. Two linguistically diverse groups of listeners (Japanese and American) rated 35 voice samples using the GRBAS scale. The ratings obtained from the two groups were compared to determine whether differences in linguistic background affected the use of the GRBAS scale. Results show no significant differences between the Japanese and American listeners in the use of the Grade, Roughness, and Breathiness scales. Ratings on the Asthenia and Strain scales, however, differed between the two groups of listeners. Despite these discrepancies, the GRBAS scale may be an excellent tool for perceptual evaluation of voice quality by linguistically diverse groups.
Objective
Experiments to study voice quality have typically used rating scales or direct magnitude estimation to obtain listener judgments. Unfortunately, the data obtained using these tasks are context-dependent, which makes it difficult to compare perceptual judgments of voice quality across experiments. The present experiment describes a simple matching task to quantify voice quality. The data obtained through this task were compared to perceptual judgments obtained using rating scale and direct magnitude estimation tasks to evaluate whether the three tasks provide equivalent perceptual distances across stimuli.
Methods
Ten synthetic vowel continua that varied in their aspiration noise were evaluated for breathiness using each of the three tasks. Linear and nonlinear regression were used to compare the perceptual distances between stimuli obtained through each technique.
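As a rough illustration of comparing a linear and a nonlinear fit between two sets of perceptual distances, the sketch below fits a straight line and a power function and reports the variance each explains. The abstract does not state which nonlinear form was used, so the power function, the function names, and the toy values are assumptions made for this example only.

```python
import numpy as np


def r_squared(y, y_hat):
    """Proportion of variance in y explained by the predictions y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def compare_fits(x, y):
    """Fit y = a*x + b and y = k*x**p (x and y must be positive); return fit summaries."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Linear fit by ordinary least squares.
    slope, intercept = np.polyfit(x, y, 1)
    linear_r2 = r_squared(y, slope * x + intercept)

    # Power-function fit via a straight line in log-log coordinates.
    exponent, log_scale = np.polyfit(np.log(x), np.log(y), 1)
    power_r2 = r_squared(y, np.exp(log_scale) * x ** exponent)

    return {"linear_r2": linear_r2, "power_r2": power_r2, "exponent": exponent}


# Hypothetical scores from two tasks, related by a compressive-free power law.
task_a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
task_b = 2.0 * task_a ** 1.5
fits = compare_fits(task_a, task_b)
```

If the two tasks yielded proportional perceptual distances, the fitted exponent would be close to 1 and the linear fit would explain the data as well as the power fit; a clearly better power fit with an exponent far from 1 would point to a nonlinear relationship between the tasks.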
Results
Results show that the perceptual distances estimated from the matching and direct magnitude estimation tasks are similar, but both differ from those obtained with the rating scale task, suggesting that the matching task provides perceptual distances with ratio-level measurement properties.
Conclusions
The matching task is advantageous for measurement of vocal quality because it provides reliable measurement with ratio-level scale properties. It allows the use of a fixed reference signal for all comparisons, thus allowing researchers to directly compare findings across different experiments.