Hong Kook Kim scite author profile

A Long Short-Term Memory (LSTM) Network for Hourly Estimation of PM2.5 Concentration in Two Cities of South Korea

Air pollution not only damages the environment but also contributes to various illnesses, such as respiratory and cardiovascular diseases. Estimating air pollutant concentrations has therefore become very important, so that people can prepare in advance for the hazardous effects of air pollution. Various deterministic models have been used to forecast air pollution. In this study, along with various pollutant and meteorological parameters, we also use the pollutant concentrations predicted by the Community Multiscale Air Quality (CMAQ) model that are strongly related to PM2.5 concentration. Combining these parameters, we implement various machine learning models to forecast hourly PM2.5 concentrations in two large cities of South Korea and compare their results. The long short-term memory (LSTM) network is shown to outperform well-known gradient tree boosting models as well as other recurrent and convolutional neural networks.
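Hourly forecasting with a sequence model such as an LSTM usually starts by cutting the combined hourly records into fixed-length input windows paired with a future target. The sketch below shows that generic windowing step only; it is not the authors' preprocessing, and the function name `make_windows`, the `window` and `horizon` parameters, and the per-hour feature layout are all hypothetical.

```python
from typing import List, Sequence, Tuple


def make_windows(
    features: Sequence[Sequence[float]],
    target: Sequence[float],
    window: int,
    horizon: int = 1,
) -> Tuple[List[List[List[float]]], List[float]]:
    """Build (input window, future PM2.5) training pairs from hourly records.

    features[t]: feature vector for hour t (hypothetical layout, e.g. observed
    pollutants, meteorological values and CMAQ-predicted concentrations).
    target[t]: observed PM2.5 concentration at hour t.
    Each sample uses hours t - window + 1 .. t as input and target[t + horizon]
    as the label.
    """
    if len(features) != len(target):
        raise ValueError("features and target must have the same length")
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")
    xs: List[List[List[float]]] = []
    ys: List[float] = []
    # The last usable end hour t must leave room for the label at t + horizon.
    for t in range(window - 1, len(target) - horizon):
        xs.append([list(features[i]) for i in range(t - window + 1, t + 1)])
        ys.append(target[t + horizon])
    return xs, ys


if __name__ == "__main__":
    # Five hours of toy data with two features per hour (made-up numbers).
    feats = [[float(h), 10.0 * h] for h in range(5)]
    pm25 = [20.0, 22.0, 25.0, 30.0, 28.0]
    X, y = make_windows(feats, pm25, window=3)
    print(len(X), y)  # 2 [30.0, 28.0]
```

Each resulting window has shape (window, number of features), which is the usual per-sample input shape for a recurrent model; the toy numbers above are illustrative only.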


Personalized HRTF Modeling Based on Deep Neural Network Using Anthropometric Measurements and Images of the Ear

Lee

Kim

2018

Applied Sciences


This paper proposes a method for estimating personalized head-related transfer functions (HRTFs) with deep neural networks, using anthropometric measurements and ear images. The proposed method consists of three sub-networks that represent personalized features and estimate the HRTF. The anthropometric measurements of the head and torso are fed to a feedforward deep neural network (DNN), and the ear images are fed to a convolutional neural network (CNN). The outputs of these two sub-networks are then merged in another DNN that estimates the personalized HRTF. The performance of the proposed method is assessed through both objective and subjective evaluations. For the objective evaluation, the root mean square error (RMSE) and the log spectral distance (LSD) between the reference and estimated HRTFs are measured. The proposed method yields an RMSE of −18.40 dB and an LSD of 4.47 dB, which are, respectively, 0.02 dB lower and 0.85 dB higher than those of a DNN-based method that uses anthropometric data without pinna measurements. For the subjective evaluation, a sound localization test is performed. The proposed method localizes sound sources with accuracy about 11% and 6% higher than the average HRTF method and the DNN-based method, respectively. It also reduces the front/back confusion rate by 12.5% and 2.5% compared to the average HRTF method and the DNN-based method, respectively.
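The two objective measures can be illustrated on a single pair of magnitude responses. The sketch below assumes common textbook definitions, not necessarily the exact ones used in the paper: LSD as the root mean square over frequency bins of 20·log10(|H|/|Ĥ|), and RMSE computed on linear magnitudes and then expressed in dB as 20·log10(RMSE). The helper names `rmse_db` and `lsd_db` are hypothetical.

```python
import math
from typing import Sequence


def _check(ref: Sequence[float], est: Sequence[float]) -> None:
    if len(ref) != len(est) or not ref:
        raise ValueError("ref and est must be non-empty and the same length")


def rmse_db(ref: Sequence[float], est: Sequence[float]) -> float:
    """RMSE between linear HRTF magnitudes, converted to dB as 20*log10(RMSE)."""
    _check(ref, est)
    mse = sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref)
    if mse == 0.0:
        return float("-inf")  # identical responses: zero error
    return 20.0 * math.log10(math.sqrt(mse))


def lsd_db(ref: Sequence[float], est: Sequence[float]) -> float:
    """Log spectral distance in dB: RMS over bins of 20*log10(|H| / |H_hat|)."""
    _check(ref, est)
    if any(r <= 0.0 for r in ref) or any(e <= 0.0 for e in est):
        raise ValueError("magnitudes must be strictly positive")
    total = sum((20.0 * math.log10(r / e)) ** 2 for r, e in zip(ref, est))
    return math.sqrt(total / len(ref))
```

For example, `lsd_db([1.0, 10.0], [1.0, 1.0])` is the square root of 200, about 14.14 dB. Under the assumed dB convention, an RMSE below 1 in linear magnitude gives a negative value, which is consistent with a figure such as −18.40 dB, where lower is better.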


Deep-Learning-Based Detection of Infants with Autism Spectrum Disorder Using Auto-Encoder Feature Representation

Lee

Bong et al.

2020

Sensors


Autism spectrum disorder (ASD) is a developmental disorder that causes lifelong disability. While diagnostic instruments have been developed and validated according to how accurately they discriminate children with ASD from typically developing (TD) children, the stability of such procedures can be compromised by the time they require and by the subjectivity of clinicians. Consequently, automated diagnostic methods have been developed to obtain objective measures of autism. In various fields of research, clinicians have reported vocal characteristics as distinctive, and several studies using deep learning models to automatically discriminate children with ASD from TD children have shown promising performance based on these characteristics. However, challenges remain in the characteristics of the data, the complexity of the analysis, and the shortage of well-organized data, which results from limited access to diagnosis and the need to preserve anonymity. To address these issues, we introduce a pre-trained auto-encoder for feature extraction and a joint optimization scheme, which make a deep-learning-based autism detection method that employs various models robust to widely distributed and unrefined data. By applying this auto-encoder-based feature extraction and joint optimization to the speech features of the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS), we achieve better ASD detection performance in infants than with the raw feature set.
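The abstract does not specify how the joint optimization is formulated. One common way to jointly optimize an auto-encoder and a downstream detector is to minimize a weighted sum of the reconstruction loss and the classification loss; the sketch below shows only that generic formulation. The weight `alpha`, the function names, and the use of mean squared error plus binary cross-entropy are assumptions, not the authors' scheme.

```python
import math
from typing import Sequence


def reconstruction_mse(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between an input feature vector and its reconstruction."""
    if len(x) != len(x_hat) or not x:
        raise ValueError("x and x_hat must be non-empty and the same length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def binary_cross_entropy(probs: Sequence[float], labels: Sequence[int],
                         eps: float = 1e-12) -> float:
    """Mean BCE for ASD (1) vs. TD (0) predictions; probabilities are clamped."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and the same length")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def joint_loss(x: Sequence[float], x_hat: Sequence[float],
               probs: Sequence[float], labels: Sequence[int],
               alpha: float = 0.5) -> float:
    """Weighted sum of auto-encoder reconstruction loss and detector loss.

    alpha is a hypothetical trade-off weight: alpha = 1 trains only the
    auto-encoder, alpha = 0 trains only the detector.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (alpha * reconstruction_mse(x, x_hat)
            + (1.0 - alpha) * binary_cross_entropy(probs, labels))
```

Minimizing a single combined objective lets gradients from the detection task also shape the encoder, instead of freezing features learned purely for reconstruction.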


Speech recognition using quantized LSP parameters and their transformations in digital communication

Choi

Kim

Lee

2000

Speech Communication


Dysarthric Speech Recognition Error Correction Using Weighted Finite State Transducers Based on Context–Dependent Pronunciation Variation

Seong

Park

Kim

2012


Multi-Task Learning U-Net for Single-Channel Speech Enhancement and Mask-Based Voice Activity Detection

Lee

Kim

2020

Applied Sciences


In this paper, a multi-task learning U-shaped neural network (MTU-Net) is proposed and applied to single-channel speech enhancement (SE). The proposed MTU-Net-based SE method estimates an ideal binary mask (IBM) or an ideal ratio mask (IRM) by extending the decoding network of a conventional U-Net to model the speech and noise spectra simultaneously as targets. The effectiveness of the proposed SE method was evaluated under both matched and mismatched noise conditions between training and testing, using the perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) measures. The proposed SE method with IRM achieved substantial improvements, with average PESQ scores 0.17, 0.52, and 0.40 higher than those of other state-of-the-art deep-learning-based methods, namely the deep recurrent neural network (DRNN), the SE generative adversarial network (SEGAN), and the conventional U-Net, respectively. Its STOI scores are likewise 0.07, 0.05, and 0.05 higher than those of the DRNN, SEGAN, and U-Net, respectively. Next, a voice activity detection (VAD) method is proposed that uses the IRM estimated by the MTU-Net-based SE method; it is fundamentally unsupervised and requires no model training of its own. Its performance was compared with that of supervised learning-based methods using a deep neural network (DNN), a boosted DNN, and a long short-term memory (LSTM) network. Under mismatched noise conditions, the proposed VAD methods perform slightly better than all three neural-network-based methods.
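The two training targets and the idea of reusing a mask for VAD can be sketched on per-bin speech and noise power values. The sketch assumes the widely used definitions IBM = 1 where the local SNR exceeds a local criterion `lc_db`, and IRM = (S / (S + N))^β with β = 0.5. The VAD rule (average the mask over frequency and compare against a threshold) is a hypothetical illustration, not the paper's exact decision rule, and all function names are invented for this example.

```python
from typing import List, Sequence


def ideal_binary_mask(speech_pow: Sequence[float], noise_pow: Sequence[float],
                      lc_db: float = 0.0) -> List[int]:
    """IBM for one frame: 1 where the local SNR exceeds lc_db, else 0.

    Compares s > n * 10**(lc_db / 10), which is equivalent to SNR_dB > lc_db
    but avoids taking the log of a zero power.
    """
    scale = 10.0 ** (lc_db / 10.0)
    return [1 if s > n * scale else 0 for s, n in zip(speech_pow, noise_pow)]


def ideal_ratio_mask(speech_pow: Sequence[float], noise_pow: Sequence[float],
                     beta: float = 0.5) -> List[float]:
    """IRM for one frame: (S / (S + N)) ** beta for each frequency bin."""
    return [(s / (s + n)) ** beta if s + n > 0.0 else 0.0
            for s, n in zip(speech_pow, noise_pow)]


def mask_vad(mask_frames: Sequence[Sequence[float]],
             threshold: float = 0.5) -> List[bool]:
    """Hypothetical VAD rule: a frame is speech if its mean mask value exceeds threshold."""
    return [bool(frame) and sum(frame) / len(frame) > threshold
            for frame in mask_frames]


if __name__ == "__main__":
    speech = [4.0, 0.0, 1.0]  # toy per-bin speech power for one frame
    noise = [1.0, 1.0, 1.0]   # toy per-bin noise power for the same frame
    print(ideal_binary_mask(speech, noise))  # [1, 0, 0]
    print(mask_vad([[0.9, 0.8], [0.1, 0.2]]))  # [True, False]
```

Because the mask already encodes how much of each time-frequency bin is dominated by speech, a frame-level decision can be read off it without training a separate classifier, which is the property the abstract highlights.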


Artificial Bandwidth Extension of Narrowband Speech Signals for the Improvement of Perceptual Speech Communication Quality

Park

Lee

Kim

2011



Hong Kook Kim

Acoustic model adaptation based on pronunciation variability analysis for non-native speech recognition

A Long Short-Term Memory (LSTM) Network for Hourly Estimation of PM2.5 Concentration in Two Cities of South Korea

