Xiangdong Su scite author profile

This paper proposes a full-band and sub-band fusion model, named FullSubNet, for single-channel real-time speech enhancement. Full-band and sub-band refer to models that take full-band and sub-band noisy spectral features as input and output full-band and sub-band speech targets, respectively. The sub-band model processes each frequency independently. Its input consists of one frequency and several context frequencies. The output is the prediction of the clean speech target for the corresponding frequency. These two types of models have distinct characteristics. The full-band model can capture the global spectral context and long-distance cross-band dependencies. However, it lacks the ability to model signal stationarity and attend to local spectral patterns. The sub-band model is just the opposite. In our proposed FullSubNet, we connect a pure full-band model and a pure sub-band model sequentially and use practical joint training to integrate the advantages of both types of models. We conducted experiments on the DNS Challenge (INTERSPEECH 2020) dataset to evaluate the proposed method. Experimental results show that full-band and sub-band information are complementary, and that FullSubNet can effectively integrate them. In addition, the performance of FullSubNet exceeds that of the top-ranked methods in the DNS Challenge (INTERSPEECH 2020).
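The sub-band input described in the abstract (one frequency plus several context frequencies) can be illustrated with a minimal NumPy sketch. The function name `subband_inputs`, the parameter `n_neighbors`, and the use of reflect padding at the band edges are assumptions made for this illustration; the abstract does not specify how edge frequencies are handled or how many context frequencies are used.

```python
import numpy as np

def subband_inputs(mag, n_neighbors=2):
    """Build one sub-band unit per frequency bin.

    mag: magnitude spectrogram of shape (F, T).
    Returns an array of shape (F, 2 * n_neighbors + 1, T), where unit f
    holds bin f together with n_neighbors context bins on each side.
    """
    n_freqs = mag.shape[0]
    # Pad the frequency axis so edge bins also get a full context window.
    # Reflect padding is an assumption of this sketch.
    padded = np.pad(mag, ((n_neighbors, n_neighbors), (0, 0)), mode="reflect")
    width = 2 * n_neighbors + 1
    # Stack shifted copies: slot k of unit f is original bin f + k - n_neighbors.
    return np.stack([padded[k:k + n_freqs] for k in range(width)], axis=1)
```

Each unit can then be passed through the same sequence model independently, which is what processing each frequency independently amounts to; the centre slot (`units[:, n_neighbors, :]`) is the frequency whose clean target is predicted.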


Masking and Inpainting: A Two-Stage Speech Enhancement Approach for Low SNR and Non-Stationary Noise

Hao, Wen et al. 2020


Script-Level Word Sample Augmentation for Few-Shot Handwritten Text Recognition

Chen, Zhang 2022


UNetGAN: A Robust Speech Enhancement Approach in Time Domain for Extremely Low Signal-to-Noise Ratio Condition

Hao, Su, Wang et al. 2019


Speech enhancement under extremely low signal-to-noise ratio (SNR) conditions is a very challenging problem that has rarely been investigated in previous work. This paper proposes a robust speech enhancement approach (UNetGAN) based on U-Net and generative adversarial learning to deal with this problem. The approach consists of a generator network and a discriminator network, both of which operate directly in the time domain. The generator network adopts a U-Net-like structure and employs dilated convolution in its bottleneck. We evaluate the performance of UNetGAN at low SNR conditions (down to -20 dB) on the public benchmark. The results demonstrate that it significantly improves speech quality and substantially outperforms representative deep learning models, including SEGAN, cGAN for SE, Bidirectional LSTM using a phase-sensitive spectrum approximation cost function (PSA-BLSTM), and Wave-U-Net, in terms of Short-Time Objective Intelligibility (STOI) and Perceptual Evaluation of Speech Quality (PESQ).


A benchmark dataset and case study for Chinese medical question intent classification

Chen, Liu et al. 2020

BMC Med Inform Decis Mak


Background: To provide satisfying answers, a medical QA system has to understand the intent of users' questions precisely. Medical intent classification requires high-quality datasets to train deep learning models in a supervised way. Currently, there is no public dataset for Chinese medical intent classification, and datasets from other fields are not applicable to medical QA systems. To solve this problem, we construct a Chinese medical intent dataset (CMID) using questions from medical QA websites. On this basis, we compare four intent classification models on CMID in a case study. Methods: The questions in CMID are obtained from several medical QA websites. The intent annotation standard, developed by medical experts, includes four types and 36 subtypes of user intent. Besides the intent label, CMID also provides two types of additional information: word segmentation and named entities. We use crowdsourcing to annotate the intent of each Chinese medical question. Word segmentation and named entities are obtained using Jieba and a well-trained Lattice-LSTM model. We loaded a Chinese medical dictionary of 530,000 entries for word segmentation to obtain more accurate results. We also select four popular deep learning-based models and compare their intent classification performance on CMID. Results: The final CMID contains 12,000 Chinese medical questions and is organized in JSON format. Each question is labeled with intent, word segmentation, and named entity information. Question length and number of entities, among other properties, are also analyzed in detail. Among FastText, TextCNN, TextRNN, and TextGCN, the FastText and TextCNN models achieved the best results in four-type and 36-subtype intent classification, respectively. Conclusions: In this work, we provide a dataset for Chinese medical intent classification, which can be used in medical QA and related fields.
We performed an intent classification task on CMID. In addition, we analyzed the content of the dataset.


Design and Integration of the Single-Lens Curved Multi-Focusing Compound Eye Camera

et al. 2021


Compared with a traditional optical system, the single-lens curved compound eye imaging system has superior optical performance, such as a large field of view (FOV), small size, and high portability. However, defocus and low resolution hinder the further development of single-lens curved compound eye imaging systems. In this study, the design of a nonuniform curved compound eye with multiple focal lengths was used to solve the defocus problem. A two-step gas-assisted process, which was combined with photolithography, soft photolithography, and ultraviolet curing, was proposed for fabricating the ommatidia with a large numerical aperture precisely. Ommatidia with high resolution were fabricated and arranged in five rings. Based on the imaging experimental results, it was demonstrated that the high-resolution and small-volume single-lens curved compound eye imaging system has significant advantages in large-field imaging and rapid recognition.


SNR-Based Teachers-Student Technique for Speech Enhancement

Hao, Wang et al. 2020


Integrating Topic Information into VAE for Text Semantic Similarity

Yan, Gong et al. 2018



FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement
