This paper investigates several aspects of training an RNN (recurrent neural network) that impact the objective and subjective quality of enhanced speech for real-time single-channel speech enhancement. Specifically, we focus on an RNN that enhances short-time speech spectra on a single-frame-in, single-frame-out basis, a framework adopted by most classical signal processing methods. We propose two novel mean-squared-error-based learning objectives that enable separate control over the importance of speech distortion versus noise reduction. The proposed loss functions are evaluated by widely accepted objective quality and intelligibility measures and compared to other competitive online methods. In addition, we study the impact of feature normalization and varying batch sequence lengths on the objective quality of enhanced speech. Finally, we show subjective ratings for the proposed approach and a state-of-the-art real-time RNN-based method.
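To illustrate what "separate control over speech distortion versus noise reduction" can mean in a mean-squared-error objective, here is a minimal sketch. It is an assumption for illustration, not the loss defined in the paper: the function name `weighted_sd_nr_loss`, the per-bin `gain` representation, and the weighting parameter `alpha` are all hypothetical.

```python
def weighted_sd_nr_loss(gain, speech_mag, noise_mag, alpha=0.5):
    """Illustrative weighted MSE loss for one spectral frame.

    gain       -- per-bin suppression gains in [0, 1] (hypothetical network output)
    speech_mag -- clean speech magnitudes per frequency bin
    noise_mag  -- noise magnitudes per frequency bin
    alpha      -- assumed trade-off weight: 1.0 penalizes only speech
                  distortion, 0.0 penalizes only residual noise
    """
    if not (len(gain) == len(speech_mag) == len(noise_mag)):
        raise ValueError("all inputs must have the same number of bins")
    n = len(gain)
    # Speech distortion: how much of the clean speech the gain removes.
    speech_distortion = sum((s - g * s) ** 2 for g, s in zip(gain, speech_mag)) / n
    # Residual noise: how much of the noise the gain lets through.
    residual_noise = sum((g * v) ** 2 for g, v in zip(gain, noise_mag)) / n
    return alpha * speech_distortion + (1.0 - alpha) * residual_noise
```

With a gain of 1 in every bin the speech is untouched and only the residual-noise term contributes; with a gain of 0 the noise is removed entirely and only the speech-distortion term contributes. Moving `alpha` shifts the optimum between those two extremes.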
There is an increasing demand for smart fog-computing gateways as the size of cloud data is growing. This paper presents a Fog computing interface (FIT) for processing clinical speech data. FIT builds upon our previous work on EchoWear, a wearable technology that validated the use of smartwatches for collecting clinical speech data from patients with Parkinson's disease (PD). The fog interface is a low-power embedded system that acts as a smart interface between the smartwatch and the cloud. It collects, stores, and processes the speech data before sending speech features to secure cloud storage. We developed and validated a working prototype of FIT that enabled remote processing of clinical speech data to extract clinical speech features such as loudness, short-time energy, zero-crossing rate, and spectral centroid. We used speech data from six patients with PD in their homes for validating FIT. Our results showed the efficacy of FIT as a Fog interface to translate the clinical speech processing chain (CLIP) from a cloud-based backend to a fog-based smart gateway.
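Three of the features named above have standard textbook definitions that can be sketched in a few lines. The code below is a minimal illustration using only the Python standard library; it is not the FIT implementation, the function names are hypothetical, and loudness is omitted because the abstract does not say how it is computed.

```python
import cmath
import math


def short_time_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of one frame.

    Uses a naive O(n^2) DFT over bins 0..n/2 so that the sketch needs
    no third-party FFT library; a real system would use an FFT.
    """
    n = len(frame)
    weighted_sum = 0.0
    magnitude_sum = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(frame)
        )
        magnitude = abs(coeff)
        weighted_sum += (k * sample_rate / n) * magnitude
        magnitude_sum += magnitude
    # A silent frame has no defined centroid; return 0.0 by convention here.
    return weighted_sum / magnitude_sum if magnitude_sum > 0 else 0.0
```

For example, the alternating frame `[1, -1, 1, -1]` has a short-time energy of 1.0, changes sign at every step, and concentrates its spectrum at the Nyquist frequency, so its centroid is half the sample rate.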
The Deep Noise Suppression (DNS) challenge is designed to foster innovation in the area of noise suppression to achieve superior perceptual speech quality. We recently organized a DNS challenge special session at INTERSPEECH and ICASSP 2020. We open-sourced training and test datasets for the wideband scenario. We also open-sourced a subjective evaluation framework based on ITU-T standard P.808, which was also used to evaluate participants of the challenge. Many researchers from academia and industry made significant contributions to push the field forward, yet even the best noise suppressor was far from achieving superior speech quality in challenging scenarios. In this version of the challenge organized at INTERSPEECH 2021, we are expanding both our training and test datasets to accommodate full band scenarios. The two tracks in this challenge will focus on real-time denoising for (i) wide band, and (ii) full band scenarios. We are also making available a reliable non-intrusive objective speech quality metric for wide band called DNS-MOS for the participants to use during their development phase.
In today’s digital world, healthcare is one core area of the medical domain. A healthcare system is required to analyze a large amount of patient data, which helps to derive insights and assist the prediction of diseases. This system should be intelligent in order to predict a health condition by analyzing a patient’s lifestyle, physical health records and social activities. The health recommender system (HRS) is becoming an important platform for healthcare services. In this context, health intelligent systems have become indispensable tools in decision-making processes in the healthcare sector. Their main objective is to ensure the availability of valuable information at the right time while addressing information quality, trustworthiness, authentication and privacy concerns. As people use social networks to understand their health condition, the health recommender system is very important for deriving outcomes such as recommended diagnoses, health insurance, clinical pathway-based treatment methods and alternative medicines based on the patient’s health profile. We discuss recent research that targets the utilization of large volumes of medical data while combining multimodal data from disparate sources, which reduces the workload and cost in health care. In the healthcare sector, big data analytics using recommender systems has an important role in decision-making processes with respect to a patient’s health. This paper proposes an intelligent HRS using a Restricted Boltzmann Machine (RBM)-Convolutional Neural Network (CNN) deep learning method, which provides an insight into how big data analytics can be used for the implementation of an effective health recommender engine, and illustrates an opportunity for the health care industry to transition from a traditional scenario to a more personalized paradigm in a tele-health environment. Measured by Root Mean Square Error (RMSE) and Mean Absolute Error (MAE), the proposed deep learning method (RBM-CNN) yields lower errors than other approaches.
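The two error measures named above can be written out directly. This is a minimal sketch of the standard definitions, not code from the paper; the function names and the sample ratings in the usage note are made up for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    if len(actual) != len(predicted):
        raise ValueError("inputs must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute differences."""
    if len(actual) != len(predicted):
        raise ValueError("inputs must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

For the hypothetical ratings `[3, 4, 5]` and predictions `[2, 4, 7]`, the errors are 1, 0 and 2, so `mae` gives 1.0 while `rmse` gives the square root of 5/3 (about 1.29). RMSE is the larger of the two because squaring weights the single large error more heavily.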
The increasing use of wearables in smart telehealth generates heterogeneous medical big data. Cloud and fog services process these data to assist clinical procedures. IoT-based e-healthcare has greatly benefited from efficient data processing. This paper proposed and evaluated the use of low-resource machine learning on Fog devices kept close to the wearables for smart healthcare. In state-of-the-art telecare systems, the signal processing and machine learning modules are deployed in the cloud for processing physiological data. We developed a prototype of Fog-based unsupervised machine learning big data analysis for discovering patterns in physiological data. We employed Intel Edison and Raspberry Pi as Fog computers in the proposed architecture. We performed validation studies on real-world pathological speech data from in-home monitoring of patients with Parkinson's disease (PD). The proposed architecture employed machine learning for the analysis of pathological speech data obtained from smartwatches worn by the patients with PD. Results showed that the proposed architecture is promising for low-resource clinical machine learning. It could be useful for other applications within wearable IoT for smart telehealth scenarios by translating machine learning approaches from the cloud backend to edge computing devices such as Fog.
In an era when the Internet of Things (IoT) market segment tops the chart in various business reports, it is envisioned that the field of medicine will gain a large benefit from the explosion of wearables and internet-connected sensors that surround us to acquire and communicate unprecedented data on symptoms, medication, food intake, and daily-life activities impacting one's health and wellness. However, IoT-driven healthcare would have to overcome many barriers, such as: 1) There is an increasing demand for data storage on cloud servers, where the analysis of the medical big data becomes increasingly complex; 2) The data, when communicated, are vulnerable to security and privacy issues; 3) The communication of the continuously collected data is not only costly but also energy hungry; 4) Operating and maintaining the sensors directly from the cloud servers are non-trivial tasks. This book chapter defined Fog Computing in the context of medical IoT. Conceptually, Fog Computing is a service-oriented intermediate layer in IoT, providing the interfaces between the sensors and cloud servers for facilitating connectivity, data transfer, and a queryable local database. The centerpiece of Fog computing is a low-power, intelligent, wireless, embedded computing node that carries out signal conditioning and data analytics on raw data collected from wearables or other medical sensors and offers efficient means to serve telehealth interventions. We implemented and tested a fog computing system using the Intel Edison and Raspberry Pi that allows acquisition, computing, storage and communication of various medical data such as pathological speech data of individuals with speech disorders, Phonocardiogram (PCG) signals for heart rate estimation, and Electrocardiogram (ECG)-based Q, R, S detection. The book chapter ends with experiments and results showing how fog computing could lessen the obstacles of existing cloud-driven medical IoT solutions and enhance the overall performance of the system in terms of computing intelligence, transmission, storage, configurability, and security. The case studies on various types of physiological data show that the proposed Fog architecture could be used for signal enhancement, processing and analysis of various types of bio-signals.