Multi-channel Acoustic Modeling using Mixed Bitrate OPUS Compression

Khare, Aparna; Sundaram, Shiva; Wu, Minhua

doi:10.48550/arxiv.2002.00122

Cited by 1 publication

(3 citation statements)

References 13 publications

“…While Opus has been found to be effective for single- and multi-channel ASR applications [11], it has been optimized for subjective quality, not for ASR accuracy. Moreover, its channel coupling has been optimized for human spatial perception, where the inter-aural phase difference (IPD) [20] is only noticeable at lower frequencies.…”

Section: Opus Compression and Proposed Optimization (mentioning)

confidence: 99%

“…We here focus on Opus compression [9] due to its widespread use in voice over IP (VoIP) and ASR applications [10]. While Opus has been found to be effective for single- and multi-channel ASR applications [11], it is in some sense mismatched to the task at hand. First, like all lossy speech and audio compression methods, Opus has been optimized for subjective quality, not for ASR or subsequent spatial filtering.…”

Section: Introduction (mentioning)

confidence: 99%

“…With focus on ASR applications, [14] analyzes the impact of different compression algorithms on a single-channel acoustic model (AM) and concludes that using a compression algorithm as a data augmentation step in the training phase of an AM can lead to a more robust model. Closely related to the document at hand, Khare et al. analyzed the impact of Opus compression on multi-channel AMs [11] with the main findings that multi-channel data is preferable over single-channel transmission with a higher bitrate. They further report gains when retraining the AM, which we do not repeat here.…”

Section: Introduction (mentioning)

confidence: 99%

Multi-Channel Opus Compression for Far-Field Automatic Speech Recognition with a Fixed Bitrate Budget

Drude, Heymann, Schwarz et al. 2021

Interspeech 2021

Automatic speech recognition (ASR) in the cloud allows the use of larger models and more powerful multi-channel signal processing front-ends compared to on-device processing. However, it also adds an inherent latency due to the transmission of the audio signal, especially when transmitting multiple channels of a microphone array. One way to reduce the network bandwidth requirements is client-side compression with a lossy codec such as Opus. However, this compression can have a detrimental effect, especially on multi-channel ASR front-ends, due to the distortion and loss of spatial information introduced by the codec. In this publication, we propose an improved approach for the compression of microphone array signals based on Opus, using a modified joint channel coding approach and additionally introducing a multi-channel spatial decorrelating transform to reduce redundancy in the transmission. We illustrate the effect of the proposed approach on the spatial information retained in multi-channel signals after compression, and evaluate the performance on far-field ASR with a multi-channel beamforming front-end. We demonstrate that our approach can lead to a 37.5 % bitrate reduction or a 5.1 % relative word error rate (WER) reduction for a fixed bitrate budget in a seven-channel setup.
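
The two headline figures in the abstract are both relative reductions, which can be read as 100 * (baseline - improved) / baseline. The following is a minimal sketch of that arithmetic only. The function name `relative_reduction` and all input values are hypothetical, chosen so that the percentages match the abstract; they are not measurements from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the relative reduction from baseline to improved, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - improved) / baseline


# Hypothetical values (assumptions, not results reported in the paper):
# a total bitrate dropping from 64 kbit/s to 40 kbit/s ...
bitrate_gain = relative_reduction(64.0, 40.0)
# ... and a WER dropping from 10.0 % to 9.49 % at the same bitrate.
wer_gain = relative_reduction(10.0, 9.49)

print(f"bitrate reduction: {bitrate_gain:.1f} %")
print(f"relative WER reduction: {wer_gain:.1f} %")
```

Note that a relative WER reduction is measured against the baseline WER, so a 5.1 % relative reduction is a much smaller change in absolute percentage points.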
