Integrating modalities, such as video signals with speech, has been shown to improve quality and intelligibility for speech enhancement (SE). However, video clips usually contain large amounts of data and incur high computational costs, which may complicate the respective SE. By contrast, a bone-conducted speech signal has a moderate data size while it preserves speech-phoneme structures, and thus complements its air-conducted counterpart, benefiting the enhancement. In this study, we propose a novel multi-modal SE structure that leverages bone- and air-conducted signals. In addition, we examine two strategies, early fusion (EF) and late fusion (LF), to process the two types of speech signals, and adopt a deep learning-based fully convolutional network (FCN) to conduct the enhancement. The experimental results indicate that this newly presented multi-modal structure significantly outperforms the single-source SE counterparts (with a bone- or air-conducted signal only) on various speech evaluation metrics. In addition, adopting the LF strategy rather than the EF strategy in this novel multi-modal SE structure achieves better results.
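To make the EF/LF distinction concrete, the following is a minimal sketch, assuming PyTorch and log-magnitude spectrogram inputs; the layer widths, kernel sizes, and the 257-bin feature dimension are illustrative assumptions, not the paper's configuration. EF stacks the two modalities at the network input, whereas LF processes each modality in its own convolutional branch and merges the feature maps near the output.

```python
import torch
import torch.nn as nn

class EarlyFusionFCN(nn.Module):
    """Early fusion: stack the air- and bone-conducted spectrograms
    before a single shared fully convolutional enhancement network."""
    def __init__(self, freq_bins=257):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(2 * freq_bins, 512, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(512, freq_bins, kernel_size=5, padding=2),
        )

    def forward(self, air, bone):
        return self.net(torch.cat([air, bone], dim=1))  # fuse at the input

class LateFusionFCN(nn.Module):
    """Late fusion: a separate convolutional branch per modality, with the
    branch outputs merged just before the final enhancement layer."""
    def __init__(self, freq_bins=257):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv1d(freq_bins, 256, kernel_size=5, padding=2), nn.ReLU(),
                nn.Conv1d(256, 256, kernel_size=5, padding=2), nn.ReLU(),
            )
        self.air_branch, self.bone_branch = branch(), branch()
        self.head = nn.Conv1d(512, freq_bins, kernel_size=5, padding=2)

    def forward(self, air, bone):
        fused = torch.cat([self.air_branch(air), self.bone_branch(bone)], dim=1)
        return self.head(fused)  # fuse after per-modality processing

air = torch.randn(4, 257, 100)   # air-conducted log spectrograms (batch)
bone = torch.randn(4, 257, 100)  # bone-conducted log spectrograms (batch)
print(EarlyFusionFCN()(air, bone).shape)  # torch.Size([4, 257, 100])
print(LateFusionFCN()(air, bone).shape)   # torch.Size([4, 257, 100])
```

In a real system either network would be trained to map the noisy inputs to clean air-conducted spectrograms; the random tensors here only verify that the shapes line up.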
For most state-of-the-art speech enhancement techniques, a spectrogram is usually preferred over the respective time-domain raw data, since it provides a more compact representation together with conspicuous temporal information over a long time span. However, the short-time Fourier transform (STFT) that creates the spectrogram in general distorts the original signal and thereby limits the capability of the associated speech enhancement techniques. In this study, we propose a novel speech enhancement method that adopts the algorithms of the discrete wavelet packet transform (DWPT) and nonnegative matrix factorization (NMF) to overcome the aforementioned limitation. In brief, the DWPT is first applied to split a time-domain speech signal into a series of subband signals without introducing any distortion. Then we exploit NMF to highlight the speech component in each subband. Finally, the enhanced subband signals are joined together via the inverse DWPT to reconstruct a noise-reduced signal in the time domain. We evaluate the proposed DWPT-NMF-based speech enhancement method on the MHINT task. Experimental results show that this new method performs very well in promoting speech quality and intelligibility, and it outperforms the conventional STFT-NMF-based method.
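The analyze-enhance-resynthesize pipeline can be sketched as follows; this is only an illustrative skeleton, assuming PyWavelets for the DWPT and a small hand-rolled multiplicative-update NMF. The wavelet ('db4'), decomposition depth, frame length, and especially the randomly generated speech/noise bases are placeholders: the abstract does not specify how NMF is applied per subband, so this sketch frames each subband signal, factorizes the frame magnitudes over fixed speech-plus-noise bases, and keeps the speech share via a soft mask.

```python
import numpy as np
import pywt  # PyWavelets provides the (inverse) wavelet packet transform

def nmf_denoise_subband(x, w_speech, w_noise, frame_len=64, n_iter=50):
    """Factorize |frames| of one subband over fixed speech+noise bases,
    keep the speech share with a soft mask, and restore the signs."""
    eps = 1e-12
    n = len(x) - len(x) % frame_len
    frames = x[:n].reshape(-1, frame_len).T          # (frame_len, n_frames)
    v, sign = np.abs(frames) + eps, np.sign(frames)
    w = np.hstack([w_speech, w_noise])               # fixed bases as columns
    h = np.abs(np.random.default_rng(0).random((w.shape[1], v.shape[1])))
    for _ in range(n_iter):                          # multiplicative updates of H
        h *= (w.T @ v) / (w.T @ w @ h + eps)
    k = w_speech.shape[1]
    mask = (w[:, :k] @ h[:k]) / (w @ h + eps)        # speech share per entry
    return (mask * v * sign).T.reshape(-1)

rng = np.random.default_rng(0)
w_s = rng.random((64, 8))   # placeholder "speech" bases (would be learned)
w_n = rng.random((64, 8))   # placeholder "noise" bases (would be learned)
noisy = rng.standard_normal(16000)                   # stand-in noisy waveform

wp = pywt.WaveletPacket(noisy, 'db4', maxlevel=3)    # distortion-free analysis
for node in wp.get_level(3, order='natural'):        # enhance every subband
    node.data = nmf_denoise_subband(node.data, w_s, w_n)
enhanced = wp.reconstruct(update=False)              # inverse DWPT synthesis
```

In practice the speech and noise bases would be learned offline from clean speech and noise recordings with standard NMF; the random matrices here only keep the example self-contained.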
Previous studies indicate that noise and speaker variations can degrade the performance of deep-learning-based speech enhancement systems. To increase system performance over environmental variations, we propose a novel speaker-aware system that integrates a deep denoising autoencoder (DDAE) with an embedded speaker identity. The overall system first extracts embedded speaker-identity features using a neural network model; the DDAE then takes the augmented features as input to generate enhanced spectra. With the additional embedded features, the speech enhancement system can be guided to generate the optimal output corresponding to the speaker identity. We tested the proposed speech enhancement system on the TIMIT dataset. Experimental results showed that the proposed system could improve the sound quality and intelligibility of speech signals from additive-noise-corrupted utterances. In addition, the results suggested that combining the speaker features makes the system robust to unseen speakers.
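A minimal sketch of the speaker-aware idea, assuming PyTorch: the noisy log-power spectrum of each frame is concatenated with a speaker embedding before entering the DDAE. The feature dimension, embedding size, network depth, and the external speaker encoder implied in the comments are illustrative assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class SpeakerAwareDDAE(nn.Module):
    """Deep denoising autoencoder whose input is the noisy spectral frame
    augmented (concatenated) with an embedded speaker-identity vector."""
    def __init__(self, feat_dim=257, spk_dim=128, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + spk_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),   # enhanced spectral frame
        )

    def forward(self, noisy_frame, spk_embedding):
        return self.net(torch.cat([noisy_frame, spk_embedding], dim=-1))

# Training-step sketch: the embeddings would come from a separately trained
# speaker-identification network (hypothetical here, so random stand-ins).
model = SpeakerAwareDDAE()
noisy = torch.randn(32, 257)   # noisy log-power spectra (one frame each)
clean = torch.randn(32, 257)   # corresponding clean targets
spk = torch.randn(32, 128)     # speaker embeddings from the ID network
loss = nn.functional.mse_loss(model(noisy, spk), clean)
loss.backward()
```

At run time the embedding would be produced once per utterance by the speaker-identification network, letting the same DDAE weights adapt their output to different speakers.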