2021
DOI: 10.1109/taslp.2021.3078640
Group Communication With Context Codec for Lightweight Source Separation

Abstract: Despite the recent progress on neural network architectures for speech separation, the balance between the model size, model complexity and model performance is still an important and challenging problem for the deployment of such models to low-resource platforms. In this paper, we propose two simple modules, group communication and context codec, that can be easily applied to a wide range of architectures to jointly decrease the model size and complexity without sacrificing the performance. A group communicat…

Cited by 26 publications (18 citation statements) · References 57 publications
“…It reduces the model size and complexity by sharing weights across all groups (group communication), and further decreases the number of multiply-accumulate operations using an encoder-decoder-based temporal compression method (context codec). In the encoder part of the context codec, the temporal context of the local features is summarized into a single feature representing the global characteristics of that context [37]. After passing through the group-communication-equipped separation module, the compressed feature is transformed back into the context feature through the decoder part of the context codec and reconstructed into the estimated waveforms through a decoding transformation.…”
Section: Noise Suppression Model (mentioning)
confidence: 99%
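The weight-sharing idea in the statement above can be illustrated with a minimal numpy sketch. The group split, the averaging-based inter-group mixing, and the matrix names `w_intra`/`w_inter` are illustrative assumptions, not the paper's exact design; the point is only that one small shared weight matrix serves every group, so parameters scale with the group size rather than the full feature size.

```python
import numpy as np

def group_communication(features, num_groups, w_intra, w_inter):
    """Toy group-communication step (illustrative, not the paper's design):
    split an N-dim feature into K groups of size N/K, apply ONE shared
    weight matrix to every group, then let groups exchange information
    through a shared transform of their mean."""
    n = features.shape[-1]
    g = n // num_groups
    groups = features.reshape(num_groups, g)        # (K, N/K)
    intra = groups @ w_intra                        # shared weights per group
    context = intra.mean(axis=0, keepdims=True)     # cross-group summary
    inter = intra + context @ w_inter               # broadcast communication
    return inter.reshape(n)

rng = np.random.default_rng(0)
N, K = 16, 4
x = rng.standard_normal(N)
w1 = rng.standard_normal((N // K, N // K))  # hypothetical intra-group weights
w2 = rng.standard_normal((N // K, N // K))  # hypothetical inter-group weights
y = group_communication(x, K, w1, w2)
```

Here the two shared matrices hold 2·(N/K)² parameters instead of the N² a full-width layer would need, which is the size reduction the citing work refers to.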
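The context-codec idea, summarizing a window of frames into one feature before the separation module and expanding it back afterwards, can be sketched as follows. The real model uses learned encoder/decoder transforms; mean pooling and frame repetition here are stand-in assumptions to show the sequence-length (and hence multiply-accumulate) reduction.

```python
import numpy as np

def context_encode(frames, context_size):
    """Toy context-codec encoder: collapse each non-overlapping block of
    `context_size` frames into one summary frame via mean pooling
    (a stand-in for the paper's learned encoder)."""
    T, n = frames.shape
    assert T % context_size == 0, "assumes T divisible by context_size"
    return frames.reshape(T // context_size, context_size, n).mean(axis=1)

def context_decode(summary, context_size):
    """Toy decoder: expand each summary frame back to `context_size`
    frames (a stand-in for the paper's learned decoder)."""
    return np.repeat(summary, context_size, axis=0)

T, n, C = 12, 8, 4
x = np.arange(T * n, dtype=float).reshape(T, n)
z = context_encode(x, C)      # sequence shortened from 12 to 3 frames
y = context_decode(z, C)      # expanded back to 12 frames
```

The separation module then runs on the 3-frame compressed sequence instead of all 12 frames, which is where the multiply-accumulate savings come from.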
“…Considering the model size of the joint model consisting of NS and SED, we used the GC3-TCN for the NS model. More details are described in [37].…”
Section: Noise Suppression Model (mentioning)
confidence: 99%