We explore the quality impact when audiovisual content is delivered to different mobile devices. Subjects were shown the same sequences on five different mobile devices and a broadcast-quality television. Factors influencing quality ratings include video resolution, viewing distance, and monitor size. Analysis shows how subjects' perception of multimedia quality differs when content is viewed on different mobile devices. In addition, quality ratings from laboratory and simulated living room sessions were statistically equivalent.
<p>Wideband Audio Waveform Evaluation Networks (WAWEnets) are convolutional neural networks that operate directly on wideband audio waveforms in order to produce evaluations of those waveforms. In the present work these evaluations give qualities of telecommunications speech (e.g., noisiness, intelligibility, overall speech quality). WAWEnets are no-reference networks because they do not require “reference” (original or undistorted) versions of the waveforms they evaluate. Our initial WAWEnet publication introduced four WAWEnets and each emulated the output of an established full-reference speech quality or intelligibility estimation algorithm.</p> <p>We have updated the WAWEnet architecture to be more efficient and effective. Here we present a single WAWEnet that closely tracks seven different quality and intelligibility values. We create a second network that additionally tracks four subjective speech quality dimensions. We offer a third network that focuses on just subjective quality scores and achieves very high levels of agreement. This work has leveraged 334 hours of speech in 13 languages, over two million full-reference target values and over 93,000 subjective mean opinion scores.</p> <p>We also interpret the operation of WAWEnets and identify the key to their operation using the language of signal processing: ReLUs strategically move spectral information from non-DC components into the DC component. The DC values of 96 output signals define a vector in a 96-D latent space and this vector is then mapped to a quality or intelligibility value for the input waveform.</p>
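The signal-processing interpretation above (ReLUs moving spectral information from non-DC components into the DC component) can be illustrated with a toy example. The following is a minimal sketch, not the WAWEnet architecture: it applies a ReLU to a zero-mean sinusoid and compares the DC component (the sample mean) before and after. The function names, tone frequency, and sample count are all assumptions chosen for illustration.

```python
import math


def relu(x):
    """Rectified linear unit: pass positive values, zero out the rest."""
    return x if x > 0.0 else 0.0


def dc_component(signal):
    """The DC component of a finite signal is its sample mean."""
    return sum(signal) / len(signal)


def sinusoid(amplitude, cycles, num_samples):
    """Zero-mean tone with a whole number of cycles (hypothetical test input)."""
    return [
        amplitude * math.sin(2.0 * math.pi * cycles * n / num_samples)
        for n in range(num_samples)
    ]


NUM_SAMPLES = 4800  # illustrative length only

# A pure tone has essentially no DC content before the nonlinearity.
tone = sinusoid(1.0, 50, NUM_SAMPLES)
dc_before = dc_component(tone)

# After the ReLU (half-wave rectification), the mean is about amplitude / pi,
# so the DC value now reflects the strength of the original non-DC component.
dc_after = dc_component([relu(s) for s in tone])

# Halving the tone amplitude halves the DC value after the ReLU.
half_tone = sinusoid(0.5, 50, NUM_SAMPLES)
dc_after_half = dc_component([relu(s) for s in half_tone])

print("DC before ReLU:", dc_before)
print("DC after ReLU:", dc_after)
print("DC after ReLU, half amplitude:", dc_after_half)
```

In this toy case the DC value after rectification scales with the tone's amplitude, which is one simple way a mean taken after a ReLU can summarize energy that originally sat entirely in non-DC components.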
The value or harm associated with an increase in speech coding quality depends on the type of the increase as well as the temporal location of the increase in an utterance. For example, some increases in speech coding bandwidth can be perceived as impairments. The higher quality associated with the wider bandwidth can offset the impairment, but only if the increase happens early enough in an utterance. We present a subjective speech-quality experiment that quantifies these relationships at the talk-spurt time-scale for six different combinations of AMR and SILK speech coders. If a quality increase does not include a bandwidth increase, then, on average, it is beneficial only if it occurs in the first 2.8 seconds of a talk-spurt. If a quality increase includes a bandwidth increase, then it is beneficial only if it occurs in the first 1.8 seconds of a talk-spurt.
Index Terms: AMR, SILK, speech bandwidth, speech coding, speech quality, subjective testing, time-varying speech quality
BACKGROUND AND MOTIVATION
Available resources on modern voice networks vary with time. This, along with the mobility of many voice network users, results in dynamic resource availability for any given call. Service providers strive to provide a graceful degradation of speech quality when network resources become scarce during a call. When additional network resources become available during a call, it may be possible to increase the speech coding rate and deliver higher speech quality.
But the effect of the quality transition must be considered. For example, wideband (WB) speech (50 to 7000 Hz nominal passband) has a documented higher perceived quality than narrowband (NB) speech (300 to 3400 Hz nominal passband) [1]-[3], but a transition from NB to WB speech coding is perceived as an impairment [4]-[6].
If the transition happens early enough in a speech recording, the value of the WB portion can exceed the harm of the transition, for a net improvement (relative to NB only) in overall speech quality. This was the case for NB-to-WB transitions at the 15 or 30 second point in a 60 second recording [4], [5]. But if the transition happens later in a speech recording, the shorter duration of the WB portion means that its value does not overcome the harm of the transition. This was the case for NB-to-WB transitions at the 45 second point in a 60 second recording [4], [5] or at the three-second point of a six-second recording [6]. In [6] we also experimented with gradual transitions (up to 2.5 seconds long) but found they did not mitigate the harm of the transition.
Even quality transitions within a fixed bandwidth can be perceived as impairments. In [7], [8] short NB recordings with distinct quality levels were concatenated to form longer recordings and subjective scores were provided for both the short and long recordings. Analysis of these scores shows that when average quality is held constant, increases in quality variation lead to reductions in long-term speech quality. In [9] subjects evaluated three-second NB speech recordings with a low-high-l...
Subjective testing is the most direct means of assessing audio, video, and multimedia quality as experienced by users. Maximizing the information gathered while minimizing the number of trials is therefore an important goal. We propose gradient ascent subjective testing (GAST) as an efficient way to locate optimizing sets of coding or transmission parameter values. GAST combines gradient ascent optimization techniques with paired-comparison subjective test trials to efficiently locate parameter values that maximize perceived quality. As a proof of concept, we used GAST to search a two-dimensional parameter space for the known region of maximal audio quality. That point was accurately located and we estimate that conventional testing would have required at least 27 times as many trials to generate the same results.
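The general idea of steering a search with paired-comparison trials can be sketched in a few lines. This is a simplified sign-based ascent under stated assumptions, not the GAST procedure from the paper: the listener is simulated by a hypothetical noiseless quality function with a single peak, each trial compares two parameter settings that differ in one dimension, and the step size for a dimension is halved whenever the preferred direction reverses. All names, the peak location, and the step sizes are illustrative.

```python
def hidden_quality(x, y):
    """Hypothetical stand-in for perceived quality; single peak at (0.6, 0.3)."""
    return -((x - 0.6) ** 2 + (y - 0.3) ** 2)


def paired_comparison(a, b):
    """Simulated trial: +1 if setting a is preferred, -1 if b is, 0 for a tie."""
    qa, qb = hidden_quality(*a), hidden_quality(*b)
    return (qa > qb) - (qa < qb)


def ascent_by_paired_comparison(start, step=0.2, min_step=0.01, max_trials=200):
    """Climb toward higher quality using only the outcomes of paired trials."""
    point = list(start)
    steps = [step] * len(point)
    last_sign = [0] * len(point)
    trials = 0
    while max(steps) > min_step and trials < max_trials:
        for d in range(len(point)):
            if steps[d] <= min_step:
                continue  # this dimension has already been refined enough
            up = list(point)
            down = list(point)
            up[d] += steps[d]
            down[d] -= steps[d]
            # One trial tells us which direction the simulated listener prefers.
            sign = paired_comparison(tuple(up), tuple(down))
            trials += 1
            # A tie or a reversal suggests we are near the peak: refine the step.
            if sign == 0 or sign == -last_sign[d]:
                steps[d] /= 2.0
            point[d] += sign * steps[d]
            last_sign[d] = sign
    return tuple(point), trials


best, trials = ascent_by_paired_comparison((0.0, 0.0))
print("estimated optimum:", best)
print("paired-comparison trials used:", trials)
```

Real listeners give noisy votes, so a practical procedure would need repeated trials per comparison and a stopping rule that accounts for that noise; this sketch omits both to keep the mechanism visible.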