Kuo-Hsuan Hung scite author profile

Noise Reduction in ECG Signals Using Fully Convolutional Denoising Autoencoders

The electrocardiogram (ECG) is an efficient and noninvasive indicator for arrhythmia detection and prevention. In real-world scenarios, ECG signals are prone to contamination by various types of noise, which may lead to misinterpretation. Therefore, significant attention has been paid to ECG denoising for accurate diagnosis and analysis. A denoising autoencoder (DAE) can be applied to reconstruct clean data from its noisy version. In this paper, a DAE using a fully convolutional network (FCN) is proposed for ECG signal denoising. In addition, the proposed FCN-based DAE can perform compression by virtue of the DAE architecture. The proposed approach is applied to ECG signals from the MIT-BIH Arrhythmia database, and the added noise signals are obtained from the MIT-BIH Noise Stress Test database. The denoising performance is evaluated using the root-mean-square error (RMSE), percentage-root-mean-square difference (PRD), and improvement in signal-to-noise ratio (SNR_imp). The results of experiments conducted on noisy ECG signals at different input SNR levels show that the FCN performs better than denoising models based on deep fully connected neural networks and convolutional neural networks. Moreover, the proposed FCN-based DAE reduces the size of the input ECG signals: the compressed data is 32 times smaller than the original. The results demonstrate the superiority of the FCN in denoising, with lower RMSE and PRD as well as higher SNR_imp. According to the results, we believe that the proposed FCN-based DAE has good application prospects in clinical practice.

Index Terms: Electrocardiography, signal denoising, artificial neural networks, denoising autoencoders, fully convolutional network.
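The three evaluation metrics named in the abstract can be illustrated with a short sketch. The formulas below are the definitions commonly used in the ECG denoising literature; the abstract does not spell them out, so treat them as an assumption rather than the authors' exact implementation. The function names and the toy signals are hypothetical.

```python
import math

def rmse(clean, denoised):
    """Root-mean-square error between the clean and denoised signals."""
    err = sum((c - d) ** 2 for c, d in zip(clean, denoised))
    return math.sqrt(err / len(clean))

def prd(clean, denoised):
    """Percentage root-mean-square difference, in percent."""
    err = sum((c - d) ** 2 for c, d in zip(clean, denoised))
    ref = sum(c ** 2 for c in clean)
    return 100.0 * math.sqrt(err / ref)

def snr_db(clean, observed):
    """SNR in dB, treating (observed - clean) as the noise component."""
    signal_power = sum(c ** 2 for c in clean)
    noise_power = sum((o - c) ** 2 for c, o in zip(clean, observed))
    return 10.0 * math.log10(signal_power / noise_power)

def snr_improvement(clean, noisy, denoised):
    """SNR_imp: output SNR minus input SNR, in dB."""
    return snr_db(clean, denoised) - snr_db(clean, noisy)

# Toy data (hypothetical): a sine wave stands in for a clean ECG segment.
clean = [math.sin(2 * math.pi * k / 50) for k in range(200)]
noisy = [c + 0.2 * (-1) ** k for k, c in enumerate(clean)]      # input with noise
denoised = [c + 0.02 * (-1) ** k for k, c in enumerate(clean)]  # pretend model output

print(rmse(clean, denoised), prd(clean, denoised),
      snr_improvement(clean, noisy, denoised))
```

Lower RMSE and PRD and a higher SNR_imp all indicate that the denoised signal is closer to the clean reference.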

Time-Domain Multi-Modal Bone/Air Conducted Speech Enhancement

Hung

Wang

et al. 2020

IEEE Signal Process. Lett.

Integrating modalities, such as video signals with speech, has been shown to provide a standard quality and intelligibility for speech enhancement (SE). However, video clips usually contain large amounts of data and incur a high computational cost, which may complicate the respective SE. By contrast, a bone-conducted speech signal has a moderate data size while still exhibiting speech-phoneme structures, and it thus complements its air-conducted counterpart, benefiting the enhancement. In this study, we propose a novel multi-modal SE structure that leverages bone- and air-conducted signals. In addition, we examine two strategies, early fusion (EF) and late fusion (LF), to process the two types of speech signals, and adopt a deep-learning-based fully convolutional network to conduct the enhancement. The experimental results indicate that this newly presented multi-modal structure significantly outperforms its single-source SE counterparts (with a bone- or air-conducted signal only) in various speech evaluation metrics. In addition, adopting an LF strategy rather than an EF strategy in this multi-modal SE structure achieves better results.
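The structural difference between early and late fusion can be sketched with a toy example. This is only an illustration under stated assumptions: the abstract does not describe the network layers, so the single-layer "encoders", the kernels, and the function names here are all hypothetical. The one idea shown is where the two modalities are merged, either at the input (EF) or after each has passed through its own branch (LF).

```python
def conv1d(x, kernel):
    """Same-length 1-D convolution (correlation form) with zero padding."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out

def relu(x):
    return [max(0.0, v) for v in x]

def early_fusion(air, bone, k_air, k_bone, k_out):
    # EF: the two signals are stacked as input channels of ONE network.
    # A two-channel convolution equals the sum of per-channel convolutions,
    # so the modalities are mixed before any nonlinearity is applied.
    mixed = [a + b for a, b in zip(conv1d(air, k_air), conv1d(bone, k_bone))]
    return conv1d(relu(mixed), k_out)

def late_fusion(air, bone, k_air, k_bone, k_out_air, k_out_bone):
    # LF: each modality is processed by its own branch first,
    # and the branch outputs are merged afterwards.
    h_air = relu(conv1d(air, k_air))
    h_bone = relu(conv1d(bone, k_bone))
    return [a + b for a, b in zip(conv1d(h_air, k_out_air),
                                  conv1d(h_bone, k_out_bone))]

air = [1.0, -1.0]    # hypothetical air-conducted samples
bone = [-1.0, 1.0]   # hypothetical bone-conducted samples
identity = [1.0]     # trivial one-tap kernel, for illustration only

ef_out = early_fusion(air, bone, identity, identity, identity)
lf_out = late_fusion(air, bone, identity, identity, identity, identity)
print(ef_out, lf_out)
```

With these toy inputs the two signals cancel when mixed at the input, whereas the separate branches keep information from each modality, which shows why the point of fusion can change the result.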

Waveform-based Voice Activity Detection Exploiting Fully Convolutional networks with Multi-Branched Encoders

Yu

Hung

Lin

et al. 2020

Preprint

MetricGAN-U: Unsupervised Speech Enhancement/ Dereverberation Based Only on Noisy/ Reverberated Speech

Hung

et al. 2022

Speech quality estimation has recently undergone a paradigm shift from human-hearing expert designs to machine-learning models. However, current models rely mainly on supervised learning, which requires time-consuming and expensive label collection. To solve this problem, we propose VQScore, a self-supervised metric for evaluating speech based on the quantization error of a vector-quantized variational autoencoder (VQ-VAE). The training of the VQ-VAE relies on clean speech; hence, large quantization errors can be expected when the speech is distorted. To further improve correlation with real quality scores, domain knowledge of speech processing is incorporated into the model design. We found that the vector quantization mechanism could also be used for self-supervised speech enhancement (SE) model training. To improve the robustness of the encoder for SE, a novel self-distillation mechanism combined with adversarial training is introduced. In summary, the proposed speech quality estimation method and enhancement models require only clean speech for training, without any label requirements. Experimental results show that the proposed VQScore and enhancement model are competitive with supervised baselines. The code will be released after publication.
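The idea of scoring speech by its quantization error can be sketched as follows. This is a minimal illustration, not the published VQScore: it assumes a squared Euclidean distance (the actual metric may use a different similarity measure), it omits the encoder entirely, and the codebook, frames, and function names are hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest_code(frame, codebook):
    """Vector quantization step: pick the closest codebook entry."""
    return min(codebook, key=lambda code: sq_dist(frame, code))

def quantization_error(frames, codebook):
    """Mean squared distance between each frame and its nearest code."""
    total = sum(sq_dist(f, nearest_code(f, codebook)) for f in frames)
    return total / len(frames)

def quality_score(frames, codebook):
    """Higher is better: features close to the clean-speech codebook."""
    return -quantization_error(frames, codebook)

# Hypothetical codebook, imagined as learned from clean speech only.
codebook = [[0.0, 0.0], [1.0, 1.0]]
clean_like = [[0.1, 0.0], [0.9, 1.0]]   # features near the codebook
distorted = [[0.5, 0.5], [2.0, 2.0]]    # features far from the codebook

print(quality_score(clean_like, codebook), quality_score(distorted, codebook))
```

Because the codebook is fitted to clean speech, distorted input lands farther from every code vector and receives a lower score, and no quality labels are needed at any point.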

Deep-Learning-Based Signal Enhancement of Low-Resolution Accelerometer for Fall Detection Systems

Liu

Hung

Hsieh

et al. 2022

IEEE Trans. Cogn. Dev. Syst.

In the last two decades, fall detection (FD) systems have been developed as a popular assistive technology. To support long-term FD services, various power-saving strategies have been implemented. Among them, a reduced sampling rate is a common approach for an energy-efficient system in the real world. However, the performance of FD systems is diminished owing to low-resolution (LR) accelerometer signals. To improve the detection accuracy with LR accelerometer signals, several technical challenges must be considered, including the mismatch of effective features and the degradation effects. In this work, a deep-learning-based accelerometer signal enhancement (ASE) model is proposed as a front-end processor to help typical LR-FD systems achieve better detection performance. The proposed ASE model, based on a deep denoising convolutional autoencoder architecture, reconstructs high-resolution (HR) signals from the LR signals by learning the relationship between the LR and HR signals. The results show that the FD system using a support vector machine and the proposed ASE model at an extremely low sampling rate (sampling rate < 2 Hz) achieved accuracies of 97.34% and 90.52% on the SisFall and FallAllD datasets, respectively, while the systems without ASE models achieved only 95.92% and 87.47% on the same datasets. The results also demonstrate that the proposed ASE model can be suitably combined with deep-learning-based FD systems.
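The degradation caused by a reduced sampling rate can be illustrated with a small sketch. It builds an LR signal by decimation and then restores the original length with plain linear interpolation. This interpolation is only a naive baseline chosen for illustration, not the ASE model from the abstract, and the function names, decimation factor, and toy signals are hypothetical.

```python
def downsample(signal, factor):
    """Keep every `factor`-th sample to simulate a low sampling rate."""
    return signal[::factor]

def upsample_linear(lr, factor, length):
    """Linear interpolation back to `length` samples.

    Positions past the last LR sample hold its value.
    """
    out = []
    for n in range(length):
        pos = n / factor
        i = int(pos)
        if i >= len(lr) - 1:
            out.append(float(lr[-1]))
        else:
            frac = pos - i
            out.append(lr[i] * (1 - frac) + lr[i + 1] * frac)
    return out

# A slow ramp survives decimation by 4 and interpolation unchanged.
ramp = [float(v) for v in range(9)]
ramp_rec = upsample_linear(downsample(ramp, 4), 4, len(ramp))

# A short spike between the retained samples is lost entirely.
spike = [0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
spike_rec = upsample_linear(downsample(spike, 4), 4, len(spike))

print(ramp_rec, spike_rec)
```

Slow trends are recoverable by interpolation alone, but brief impact-like events can vanish from the LR signal, which is the kind of information a learned LR-to-HR mapping would have to infer from context.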

Kuo-Hsuan Hung

Noise Reduction in ECG Signals Using Fully Convolutional Denoising Autoencoders

Time-Domain Multi-Modal Bone/Air Conducted Speech Enhancement

Waveform-based Voice Activity Detection Exploiting Fully Convolutional networks with Multi-Branched Encoders

MetricGAN-U: Unsupervised Speech Enhancement/ Dereverberation Based Only on Noisy/ Reverberated Speech

Deep-Learning-Based Signal Enhancement of Low-Resolution Accelerometer for Fall Detection Systems
