Seungkwon Beack scite author profile

Seungkwon Beack

5 Publications

69 Citation Statements Received

110 Citation Statements Given

Affiliations

Electronics and Telecommunications Research Institute, Daejeon Institute of Science and Technology

Publications

Cascaded Cross-Module Residual Learning Towards Lightweight End-to-End Speech Coding

Zhen, Sung, Lee et al. 2019

Speech codecs learn compact representations of speech signals to facilitate data transmission. Many recent deep neural network (DNN) based end-to-end speech codecs achieve low bitrates and high perceptual quality at the cost of model complexity. We propose a cross-module residual learning (CMRL) pipeline as a module carrier, with each module reconstructing the residual from its preceding modules. CMRL differs from other DNN-based speech codecs in that, rather than modeling the speech compression problem in a single large neural network, it optimizes a series of less complicated modules in a two-phase training scheme. The proposed method shows better objective performance than AMR-WB and the state-of-the-art DNN-based speech codec with a similar network architecture. As an end-to-end model, it takes raw PCM signals as input, but is also compatible with linear predictive coding (LPC), showing better subjective quality at high bitrates than AMR-WB and OPUS. The gain is achieved with only 0.9 million trainable parameters, a significantly less complex architecture than the other DNN-based codecs in the literature.

Index Terms: speech coding, deep neural network, entropy coding, residual learning

Model description

Before introducing CMRL as a module carrier, we describe the component module to be hosted by CMRL.

The component module

Recently, an end-to-end DNN speech codec (referred to as Kankanahalli-Net) has shown performance comparable to one of the standards (AMR-WB) [14]. We describe our component model derived from Kankanahalli-Net, which consists of bottleneck residual learning [24], soft-to-hard quantization [25], and sub-pixel convolutional neural networks for upsampling [26]. Figure 1 depicts the component module.
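The cascading idea in this abstract, where each module reconstructs the residual left by the modules before it, can be illustrated with a small sketch. This is not the CMRL model: the neural component modules are replaced here by plain uniform scalar quantizers, and the names `make_quantizer` and `cascaded_residual_decode` and the step sizes are hypothetical choices made only for illustration.

```python
import numpy as np

def make_quantizer(step):
    """Hypothetical stand-in for one codec module: uniform scalar quantization."""
    def codec(x):
        return np.round(x / step) * step
    return codec

def cascaded_residual_decode(x, modules):
    """Each module codes the residual left by the modules before it."""
    residual = x.copy()
    reconstruction = np.zeros_like(x)
    for codec in modules:
        decoded = codec(residual)       # this module's share of the signal
        reconstruction += decoded       # the decoder sums all module outputs
        residual = residual - decoded   # what is left for the next module
    return reconstruction

rng = np.random.default_rng(0)
signal = rng.uniform(-1.0, 1.0, size=1024)

# Assumed step sizes: a coarse first module, then a finer second one.
modules = [make_quantizer(0.5), make_quantizer(0.05)]
one_stage = cascaded_residual_decode(signal, modules[:1])
two_stage = cascaded_residual_decode(signal, modules)

err1 = float(np.mean((signal - one_stage) ** 2))
err2 = float(np.mean((signal - two_stage) ** 2))
print(err1, err2)
```

Adding the second stage lowers the reconstruction error because it only has to describe what the first stage missed. The sketch leaves out everything that makes CMRL a learned codec, including the two-phase training scheme and entropy coding.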

Spatial Audio Object Coding With Two-Step Coding Structure for Interactive Audio Service

Kim, Seo, Beack et al. 2011

IEEE Trans. Multimedia

Psychoacoustic Calibration of Loss Functions for Efficient End-to-End Neural Audio Coding

Zhen, Lee, Sung et al. 2020

IEEE Signal Process. Lett.

Conventional audio coding technologies commonly leverage human perception of sound, or psychoacoustics, to reduce the bitrate while preserving the perceptual quality of the decoded audio signals. For neural audio codecs, however, the objective nature of the loss function usually leads to suboptimal sound quality as well as high run-time complexity due to the large model size. In this work, we present a psychoacoustic calibration scheme to redefine the loss functions of neural audio coding systems so that they can decode signals that are perceptually more similar to the reference, yet with much lower model complexity. The proposed loss function incorporates the global masking threshold, tolerating reconstruction error that corresponds to inaudible artifacts. Experimental results show that the proposed model outperforms a baseline neural codec that is twice as large and consumes 23.4% more bits per second. With the proposed method, a lightweight neural codec with only 0.9 million parameters performs near-transparent audio coding comparable to the commercial MPEG-1 Audio Layer III codec at 112 kbps.
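As a rough illustration of a loss that tolerates error below a masking threshold, the sketch below penalizes only the part of the per-bin noise power that exceeds a given mask. It is an assumption-laden toy, not the loss defined in the paper: the function name `audible_noise_loss` is hypothetical, and the mask values are made up, since computing a real global masking threshold requires a psychoacoustic model that is not shown here.

```python
import numpy as np

def audible_noise_loss(reference, decoded, mask_power):
    """Mean noise power above a per-bin masking threshold (illustrative only)."""
    noise_power = np.abs(reference - decoded) ** 2
    # Noise below the mask is treated as inaudible and costs nothing.
    audible = np.maximum(noise_power - mask_power, 0.0)
    return float(np.mean(audible))

# Toy spectra: four frequency bins with the same error in every bin.
reference = np.array([1.0, 0.5, 0.2, 0.1])
decoded = reference + 0.1

# Made-up mask: the two loud bins mask noise, the two quiet bins do not.
mask_power = np.array([0.04, 0.04, 0.0, 0.0])

plain = audible_noise_loss(reference, decoded, np.zeros(4))
masked = audible_noise_loss(reference, decoded, mask_power)
print(plain, masked)
```

With the mask applied, the error in the two loud bins no longer contributes, so the loss is lower for the same decoded signal. A model trained against such a loss would be free to spend its capacity on the bins where the error is audible.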

Efficient and Scalable Neural Residual Waveform Coding with Collaborative Quantization

Zhen, Lee, Sung et al. 2020

Scalability and efficiency are desired in neural speech codecs, which must support a wide range of bitrates for applications on various devices. We propose a collaborative quantization (CQ) scheme to jointly learn the codebook of LPC coefficients and the corresponding residuals. CQ does not simply shoehorn LPC into a neural network, but bridges the computational capacity of advanced neural network models and traditional, yet efficient and domain-specific, digital signal processing methods in an integrated manner. We demonstrate that CQ achieves much higher quality than its predecessor at 9 kbps with even lower model complexity. We also show that CQ can scale up to 24 kbps, where it outperforms AMR-WB and Opus. As a neural waveform codec, CQ models have fewer than 1 million parameters, significantly fewer than many other generative models.

Angle-Based Virtual Source Location Representation for Spatial Audio Coding

Beack, Seo, Moon et al. 2006

ETRI J
