This paper compares two methods for extracting room acoustic parameters from reverberated speech and music. An approach based on statistical machine learning, previously developed for speech, is extended to work with music. For speech, reverberation time estimates are within a perceptual difference limen of the true value. For music, virtually all early decay time estimates are within a difference limen of the true value. In other cases the estimation accuracy is insufficient, owing to differences between the simulated data set used to develop the empirical model and real rooms. The second method carries out a maximum likelihood estimation on the decay phases at the ends of notes or speech utterances. This paper extends the method to estimate parameters relating to the balance of early and late energies in the impulse response. For reverberation time and speech, the method provides estimates within the perceptual difference limen of the true value. For other parameters such as clarity, the estimates are not sufficiently accurate because of the natural reverberance of the excitation signals. Speech is a better test signal than music because of its longer periods of silence, although music is needed for low-frequency measurement.
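The maximum likelihood approach on decay phases can be sketched as follows. This is a minimal illustration, assuming the standard model of a free-decay tail as exponentially damped Gaussian noise (as in Ratnam-style blind reverberation time estimation); the function name, grid of candidate decay rates, and signal lengths are illustrative, and the paper's extension to early/late energy balance parameters is not shown.

```python
import numpy as np

def ml_rt60(x, fs, candidates=None):
    """Blind RT60 estimate from one free-decay segment (note/utterance tail).

    Assumed model: x[n] = a**n * w[n], with w[n] ~ N(0, sigma^2).
    sigma is profiled out in closed form for each candidate decay rate a,
    leaving a one-dimensional maximum-likelihood search over a.
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    if candidates is None:
        candidates = np.linspace(0.99, 0.99999, 2000)  # per-sample decay rates
    best_a, best_ll = candidates[0], -np.inf
    for a in candidates:
        sigma2 = np.mean(x**2 * a ** (-2.0 * n))  # closed-form MLE of sigma^2
        # Profile log-likelihood (constants dropped)
        ll = -0.5 * len(x) * np.log(sigma2) - np.log(a) * n.sum()
        if ll > best_ll:
            best_a, best_ll = a, ll
    # The amplitude envelope a**n has fallen 60 dB after -3/log10(a) samples
    return -3.0 / (np.log10(best_a) * fs)
```

In use, decay segments would first be located at the ends of notes or utterances; the sketch above handles only the estimation step on one such segment.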
Technology used in Digital TV has the potential to enhance the viewing experience for millions of hard-of-hearing people. The Clean Audio project, commissioned by the Independent Television Commission (ITC) and continued by Ofcom, examines how the extra information in 5.1 surround sound broadcasts can be used to improve the intelligibility and enjoyment of television audio for hard-of-hearing viewers. It shows that audio processing can effectively turn a digital TV set-top box into an assistive device, making digital TV more accessible. Listening tests showed benefits in clarity and in perceived overall sound quality for hard-of-hearing participants when the levels of the centre and the left and right channels were altered. Further testing showed average improvements in intelligibility of up to 9.4% when surround sound equipment with a discrete central loudspeaker was used instead of stereophonic reproduction.
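The level-alteration idea can be illustrated with a simple front-stage downmix that raises the centre (dialogue) channel relative to left and right. This is a hedged sketch only: the gain values, function name, and equal-power centre split are illustrative assumptions, not the project's prescribed processing, and the surround and LFE channels are omitted.

```python
import numpy as np

def rebalance_downmix(left, right, centre, centre_gain_db=6.0, lr_gain_db=-6.0):
    """Fold the front stage of a 5.1 mix to stereo with the dialogue
    (centre) channel boosted relative to left/right.

    Gain defaults are illustrative, not prescribed settings.
    """
    g_c = 10.0 ** (centre_gain_db / 20.0)
    g_lr = 10.0 ** (lr_gain_db / 20.0)
    # Split the boosted centre equally between the two outputs (-3 dB each)
    c = g_c * np.asarray(centre, dtype=float) / np.sqrt(2.0)
    out_l = g_lr * np.asarray(left, dtype=float) + c
    out_r = g_lr * np.asarray(right, dtype=float) + c
    return out_l, out_r
```

With a discrete centre loudspeaker no downmix is needed; the same per-channel gains would simply be applied before reproduction.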
For field recordings and user-generated content captured on phones, tablets, and other mobile devices, nonlinear distortions caused by clipping and limiting at pre-amplification stages and by dynamic range control (DRC) are common causes of poor audio quality. A single-ended method has been developed to detect these distortions and predict the perceived degradation in speech, music, and soundscapes. It was built by training an ensemble of decision trees. During training, both clean and distorted audio were available, so the perceived quality could be gauged using the Hearing Aid Sound Quality Index (HASQI). The new single-ended method can predict HASQI from distorted samples to an accuracy of ±0.19 (95% confidence interval) on a quality scale from 0.0 to 1.0. The method also has potential for estimating HASQI when other types of degradation are present. Subsequent perceptual tests validated the method for music and soundscapes: the single-ended method estimated the average mean opinion score for perceived audio quality, on a scale from 0 to 1, to within ±0.33.
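The training idea can be sketched with a minimal tree ensemble. This is an illustration only, assuming bagged depth-1 regression trees (stumps) in plain NumPy rather than the full decision-tree ensemble of the paper; the single "clipping fraction" feature and the mock quality target standing in for HASQI are invented for the example.

```python
import numpy as np

def fit_stump(X, y):
    """Depth-1 regression tree: best (feature, threshold) split under
    squared error, predicting the mean target on each side."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], np.linspace(0.1, 0.9, 9)):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue
            yl, yr = y[left].mean(), y[~left].mean()
            err = ((y[left] - yl) ** 2).sum() + ((y[~left] - yr) ** 2).sum()
            if err < best_err:
                best, best_err = (j, t, yl, yr), err
    return best

def fit_ensemble(X, y, n_trees=50, seed=0):
    """Bag stumps trained on bootstrap resamples of the training set."""
    rng = np.random.default_rng(seed)
    stumps = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), len(X))  # bootstrap resample
        stumps.append(fit_stump(X[idx], y[idx]))
    return stumps

def predict(stumps, X):
    """Average the individual stump predictions."""
    preds = [np.where(X[:, j] <= t, yl, yr) for j, t, yl, yr in stumps]
    return np.mean(preds, axis=0)
```

In the single-ended setting, the features would be computed from the distorted signal alone, while the target (HASQI) is available only during training because it requires the clean reference.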