Abstract: Speech intelligibility prediction of noisy and processed noisy speech is important in a number of application domains such as hearing instruments and forensics. Most available objective intelligibility measures employ either a signal-to-noise ratio (SNR)-based or correlation-based comparison between frequency bands of the clean and the processed speech. In this paper, we approach the speech intelligibility prediction from the angle of information theory and show that an information theoretic concept provides a…
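The two families of comparison named in this abstract can be illustrated for a single frequency band. This is a minimal sketch under stated assumptions:
- the helper names `band_snr_db` and `band_correlation` are hypothetical;
- the inputs are taken to be already-extracted band envelopes;
- the steps that real measures add (band decomposition, segmentation, normalization, clipping) are omitted.

```python
import math

def band_snr_db(clean, processed):
    """SNR in dB for one band, treating (processed - clean) as the distortion."""
    signal_energy = sum(c * c for c in clean)
    error_energy = sum((p - c) ** 2 for c, p in zip(clean, processed))
    return 10.0 * math.log10(signal_energy / error_energy)

def band_correlation(clean, processed):
    """Pearson correlation between the clean and processed band envelopes."""
    n = len(clean)
    mean_c = sum(clean) / n
    mean_p = sum(processed) / n
    cov = sum((c - mean_c) * (p - mean_p) for c, p in zip(clean, processed))
    var_c = sum((c - mean_c) ** 2 for c in clean)
    var_p = sum((p - mean_p) ** 2 for p in processed)
    return cov / math.sqrt(var_c * var_p)

# Hypothetical envelope samples for a single band (illustrative values only)
clean = [1.0, 2.0, 3.0, 2.0, 1.0]
processed = [1.1, 1.9, 3.2, 2.1, 0.8]
print(band_snr_db(clean, processed), band_correlation(clean, processed))
```

The SNR form penalizes any sample-wise deviation. The correlation form ignores a global offset or gain between the two envelopes and responds only to how well their shapes co-vary.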
“…The metric proposed in [23], which estimates mutual information using a k-nearest neighbor estimator, achieved results comparable to those of STOI on one of the tested performance measures and marginally worse results on the other. The metric in [14], which is computed from the lower bounds of mutual information, achieved a performance approximately equal to that of STOI.…”
Section: Introduction (mentioning)
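A k-nearest-neighbour mutual information estimator of the kind mentioned in this excerpt can be sketched as follows. Three assumptions apply:
- The sketch implements the widely used Kraskov-Stögbauer-Grassberger (KSG) estimator, algorithm 1. The excerpt does not say which kNN variant [23] actually uses.
- The function names are hypothetical.
- The brute-force neighbour search is O(N²), so it is only meant for a few hundred samples.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def digamma_int(n):
    """Digamma for positive integers: psi(n) = -gamma + sum_{m=1}^{n-1} 1/m."""
    return -EULER_GAMMA + sum(1.0 / m for m in range(1, n))

def ksg_mutual_information(x, y, k=3):
    """KSG (algorithm 1) k-nearest-neighbour estimate of I(X;Y) in nats.

    Brute-force O(N^2) neighbour search; suitable only for small samples.
    """
    n = len(x)
    acc = 0.0
    for i in range(n):
        # Max-norm distance in the joint (x, y) space to every other sample
        dists = sorted(max(abs(x[i] - x[j]), abs(y[i] - y[j]))
                       for j in range(n) if j != i)
        eps = dists[k - 1]  # distance to the k-th nearest neighbour
        # Marginal neighbour counts strictly inside eps
        nx = sum(1 for j in range(n) if j != i and abs(x[i] - x[j]) < eps)
        ny = sum(1 for j in range(n) if j != i and abs(y[i] - y[j]) < eps)
        acc += digamma_int(nx + 1) + digamma_int(ny + 1)
    return digamma_int(k) + digamma_int(n) - acc / n

# Correlated Gaussian pair with correlation coefficient rho
rng = random.Random(0)
rho = 0.8
xs, ys = [], []
for _ in range(500):
    a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    xs.append(a)
    ys.append(rho * a + math.sqrt(1.0 - rho ** 2) * b)
print(ksg_mutual_information(xs, ys), -0.5 * math.log(1.0 - rho ** 2))
```

For Gaussian data the estimate should land near the closed-form value -0.5 ln(1 - rho^2), about 0.511 nats here. The estimator itself makes no Gaussian assumption, which is what makes it attractive for envelope statistics of unknown shape.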
“…More recently, metrics based on the mutual information between the spectral envelopes of the clean and degraded signal have been proposed [14,23]. The metric proposed in [23], which estimates mutual information using a k-nearest neighbor estimator, achieved results comparable to those of STOI on one of the tested performance measures and marginally worse results on the other.…”
Section: Introduction (mentioning)
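For jointly Gaussian variables, mutual information is a closed-form, monotone function of the correlation coefficient: I(X;Y) = -0.5 ln(1 - rho^2) nats. This helps show how MI-based envelope measures relate to correlation-based ones. The helper below is a hypothetical illustration of that textbook identity. It is not the estimator or the lower bound used in [14] or [23], whose exact forms are not given in this excerpt.

```python
import math

def gaussian_mi_nats(rho):
    """I(X;Y) in nats for jointly Gaussian X, Y with correlation coefficient rho."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return -0.5 * math.log(1.0 - rho * rho)

# MI grows monotonically with |rho| and diverges as |rho| -> 1
for rho in (0.0, 0.5, 0.8, 0.95):
    print(rho, gaussian_mi_nats(rho))
```

Because this mapping is monotone, a correlation score and an MI score for a single pair of Gaussian envelope segments carry the same ordering information. The two approaches can only disagree where the envelope statistics depart from Gaussianity.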
“…To obtain the final metric in [22,14,23], the intermediate intelligibility measures computed in different Time-Frequency (TF) regions are averaged using uniform weights. However, it is known that not all portions of a speech signal contain equal quantities of the information required for intelligibility.…”
It is known that the information required for the intelligibility of a speech signal is distributed non-uniformly in time. In this paper, we propose WSTOI, a modified version of the speech intelligibility metric STOI. In WSTOI, the contribution of each time-frequency cell is weighted by an estimate of its intelligibility content. This estimate equals the mutual information between two hypothetical signals at either end of a simplified model of human communication. Listening tests show that the modification improves the prediction accuracy of STOI at all performance levels on both long and short utterances. An improvement was observed across all tested noise types and suppression algorithms.
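The difference between the uniform averaging used by STOI-type metrics and the per-cell weighting proposed for WSTOI can be sketched as two pooling functions over a grid of intermediate measures d[j][m] (band j, segment m). Everything in the sketch is illustrative:
- the weight values are made up;
- the function names are hypothetical;
- the abstract says only that each weight estimates a cell's intelligibility content via mutual information in a simplified communication model, without giving the formula.

```python
def uniform_pooling(d):
    """Average the intermediate measures d[j][m] over all TF cells with equal weights."""
    cells = [value for row in d for value in row]
    return sum(cells) / len(cells)

def weighted_pooling(d, w):
    """Weighted average sum(w * d) / sum(w); w[j][m] is a non-negative per-cell weight."""
    num = sum(wv * dv for w_row, d_row in zip(w, d) for wv, dv in zip(w_row, d_row))
    den = sum(wv for w_row in w for wv in w_row)
    return num / den

# Rows are frequency bands j, columns are time segments m (hypothetical values).
d = [[0.9, 0.2],
     [0.8, 0.1]]
# Hypothetical weights: the first segment is assumed to carry most of the information.
w = [[1.0, 0.1],
     [1.0, 0.1]]
print(uniform_pooling(d), weighted_pooling(d, w))
```

With equal weights the weighted form reduces to the uniform average, so the modification can be read as a generalization of STOI's final pooling step. Only the choice of w changes.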
“…Date of publication September 16, 2013; date of current version November 13, 2013. This paper is an extended version of [1] presented at ICASSP 2012, and [2] presented at the ITG Speech Communication Conference 2012. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Wai-Yip Geoffrey Chan.…”
We propose a novel method for objective speech intelligibility prediction that can be useful in many application domains, such as hearing instruments and forensics. Most objective intelligibility measures available in the literature employ some kind of signal-to-noise ratio (SNR) or correlation-based comparison between the spectro-temporal representations of clean and processed speech. In this paper, we investigate speech intelligibility prediction from the viewpoint of information theory and introduce novel objective intelligibility measures based on the estimated mutual information between the temporal envelopes of clean speech and processed speech in the subband domain. Mutual information accounts for higher-order statistics and hence captures dependencies beyond conventional second-order statistics. Using data from three different listening tests, it is shown that the proposed objective intelligibility measures provide promising results for speech intelligibility prediction in different speech enhancement scenarios where speech is processed by non-linear modification strategies.
Index Terms: Mutual information, objective measures, speech intelligibility prediction.
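The claim that mutual information captures dependencies beyond second-order statistics can be made concrete. For y = x² with x symmetric around zero, the correlation coefficient is zero even though y is fully determined by x. The sketch compares sample correlation with a simple plug-in (histogram) MI estimate. The histogram estimator is swapped in here for brevity; it is not the estimator used in the paper, whose details this abstract does not give. The bin count and data grid are arbitrary illustrative choices.

```python
import math
from collections import Counter

def pearson(x, y):
    """Sample correlation coefficient (captures linear, second-order dependence only)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def histogram_mi_bits(x, y, bins=8):
    """Plug-in MI estimate in bits from an equal-width 2-D histogram."""
    def to_bins(v):
        lo, hi = min(v), max(v)
        width = (hi - lo) / bins
        # Clamp so the maximum value falls in the last bin
        return [min(int((a - lo) / width), bins - 1) for a in v]
    bx, by = to_bins(x), to_bins(y)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # I = sum p(i,j) log2( p(i,j) / (p(i) p(j)) ), with p = count / n
    return sum((c / n) * math.log2(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())

# y depends on x deterministically but non-linearly
x = [-1.0 + 2.0 * k / 400 for k in range(401)]
y = [v * v for v in x]
print(pearson(x, y), histogram_mi_bits(x, y))
```

Histogram estimates are biased and sensitive to the bin count, which is one reason kNN estimators are often preferred in practice. The qualitative gap (zero correlation, clearly positive MI) does not depend on that choice.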
“…Other metrics that have recently been used to optimize the intelligibility of speech in noise are [13, 21–23]. Another approach to quantifying speech intelligibility is to use information theory to describe the amount of information that can be transmitted through a speech communication channel. Examples of speech intelligibility predictors based on mutual information (MI) can be found in [24–27]. In [27], an effective model of human communication based on MI was derived.…”
Section: Examples Of Classical Measures That Have Been Developed To… (mentioning)
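A minimal illustration of "information transmitted through a channel" is sketched below. It assumes each frequency band behaves as an independent additive-Gaussian-noise channel, for which I(X;Y) = 0.5 log2(1 + SNR) bits per real sample, and sums this quantity over bands. The model, the band SNR values, and the function names are assumptions for illustration only. This is not the communication model derived in [27].

```python
import math

def gaussian_channel_bits(snr):
    """I(X;Y) in bits per real sample for Y = X + N with independent Gaussian X and N."""
    return 0.5 * math.log2(1.0 + snr)

def total_information_bits(band_snrs_db):
    """Sum per-band information, assuming bands act as independent Gaussian channels."""
    return sum(gaussian_channel_bits(10.0 ** (s / 10.0)) for s in band_snrs_db)

# Hypothetical per-band SNRs in dB
print(total_information_bits([-5.0, 0.0, 5.0, 10.0]))
```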
The processing required for the global maximization of the intelligibility of speech acquired by multiple microphones and rendered by a single loudspeaker is considered in this paper. The intelligibility is quantified based on the mutual information rate between the message spoken by the talker and the message as interpreted by the listener. We prove that, in each of a set of narrow-band channels, the processing can then be decomposed into a minimum variance distortionless response (MVDR) beamforming operation that reduces the noise in the talker environment, followed by a gain operation that, given the far-end noise and beamforming operation, accounts for the noise at the listener end. Our experiments confirm that both processing steps are necessary for the effective conveyance of a message and, importantly, that the second step must be aware of the first step.
Index Terms: Speech intelligibility enhancement, mutual information, minimum variance distortionless response (MVDR) beamformer, multi-microphone.
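The MVDR step named in this abstract has a standard closed form for a single narrow-band channel: w = R^{-1} d / (d^H R^{-1} d). Here R is the noise covariance across microphones and d is the steering vector. The sketch below (using NumPy) computes these weights and checks the distortionless constraint. It rests on several assumptions:
- the microphone count, covariance, and steering phase pattern are made-up values;
- the function name is hypothetical;
- the subsequent intelligibility-driven gain stage is not reproduced, because its form is not given here.

```python
import numpy as np

def mvdr_weights(noise_cov, steering):
    """MVDR weights for one narrow-band channel: w = R^{-1} d / (d^H R^{-1} d)."""
    r_inv_d = np.linalg.solve(noise_cov, steering)
    return r_inv_d / (steering.conj() @ r_inv_d)

# Hypothetical 3-microphone setup for a single frequency bin
rng = np.random.default_rng(0)
a = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
noise_cov = a @ a.conj().T + 0.1 * np.eye(3)  # Hermitian, positive definite
steering = np.exp(-1j * np.pi * 0.3 * np.arange(3))  # assumed phase pattern
w = mvdr_weights(noise_cov, steering)

# Distortionless constraint: w^H d should equal 1 up to rounding
print(np.vdot(w, steering))
# Output noise power w^H R w, compared with a delay-and-sum beamformer d / M
w_ds = steering / len(steering)
print(np.vdot(w, noise_cov @ w).real, np.vdot(w_ds, noise_cov @ w_ds).real)
```

Delay-and-sum also satisfies the distortionless constraint here, because every steering entry has unit magnitude. MVDR, by construction, yields the smaller (or equal) output noise power of the two.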