The generalized power spectrum model [GPSM; Biberger and Ewert (2016). J. Acoust. Soc. Am. 140, 1023-1038], combining the "classical" concept of the power-spectrum model (PSM) and the envelope power spectrum-model (EPSM), was demonstrated to account for several psychoacoustic and speech intelligibility (SI) experiments. The PSM path of the model uses long-time power signal-to-noise ratios (SNRs), while the EPSM path uses short-time envelope power SNRs. A systematic comparison of existing SI models for several spectro-temporal manipulations of speech maskers and gender combinations of target and masker speakers [Schubotz et al. (2016). J. Acoust. Soc. Am. 140, 524-540] showed the importance of short-time power features. Conversely, Jørgensen et al. [(2013). J. Acoust. Soc. Am. 134, 436-446] demonstrated a higher predictive power of short-time envelope power SNRs than power SNRs using reverberation and spectral subtraction. Here the GPSM was extended to utilize short-time power SNRs and was shown to account for all psychoacoustic and SI data of the three mentioned studies. The best processing strategy was to exclusively use either power or envelope-power SNRs, depending on the experimental task. By analyzing both domains, the suggested model might provide a useful tool for clarifying the contribution of amplitude modulation masking and energetic masking.
Human auditory perception and speech intelligibility have been successfully described based on the two concepts of spectral masking and amplitude modulation (AM) masking. The power-spectrum model (PSM) [Patterson and Moore (1986). Frequency Selectivity in Hearing, pp. 123-177] accounts for effects of spectral masking and critical bandwidth, while the envelope power-spectrum model (EPSM) [Ewert and Dau (2000). J. Acoust. Soc. Am. 108, 1181-1196] has been successfully applied to AM masking and discrimination. Both models extract the long-term (envelope) power to calculate signal-to-noise ratios (SNR). Recently, the EPSM has been applied to speech intelligibility (SI) considering the short-term envelope SNR on various time scales (multi-resolution speech-based envelope power-spectrum model; mr-sEPSM) to account for SI in fluctuating noise [Jørgensen, Ewert, and Dau (2013). J. Acoust. Soc. Am. 134, 436-446]. Here, a generalized auditory model is suggested combining the classical PSM and the mr-sEPSM to jointly account for psychoacoustics and speech intelligibility. The model was extended to consider the local AM depth in conditions with slowly varying signal levels, and the relative role of long-term and short-term SNR was assessed. The suggested generalized power-spectrum model is shown to account for a large variety of psychoacoustic data and to predict speech intelligibility in various types of background noise.
Smart headphones or hearables use different types of algorithms such as noise cancelation, feedback suppression, and sound pressure equalization to eliminate undesired sound sources or to achieve acoustical transparency. Such signal processing strategies might alter the spectral composition or interaural differences of the original sound, which might be perceived by listeners as monaural or binaural distortions and thus degrade audio quality. To evaluate the perceptual impact of these distortions, subjective quality ratings can be used, but these are time consuming and costly. Auditory-inspired instrumental quality measures can be applied with less effort and may also be helpful in identifying whether the distortions impair the auditory representation of monaural or binaural cues. Therefore, the goals of this study were (a) to assess the applicability of various monaural and binaural audio quality models to distortions typically occurring in hearables and (b) to examine the effect of those distortions on the auditory representation of spectral, temporal, and binaural cues. Results showed that the signal processing algorithms considered in this study mainly impaired (monaural) spectral cues. Consequently, monaural audio quality models that capture spectral distortions achieved the best prediction performance. A recent audio quality model that predicts monaural and binaural aspects of quality was revised based on parts of the current data involving binaural audio quality aspects, leading to improved overall performance indicated by a mean Pearson linear correlation of 0.89 between obtained and predicted ratings.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.