Speech is a signal that carries the speaker's emotion, individual characteristics, phoneme information, etc. Various methods have been proposed for speaker recognition by extracting features from a given utterance. Among them, short-term cepstral features are used extensively in speech and speaker recognition because of their low complexity and high performance in controlled environments. On the other hand, their performance decreases dramatically under degraded conditions such as channel mismatch, additive noise, and emotional variability. In this paper, a literature review on speaker-specific information extraction from speech is presented, covering the latest studies that offer solutions to the aforementioned problems. The studies are categorized into three groups according to their robustness against channel mismatch, additive noise, and other degradations such as vocal effort and emotion mismatch. For a clearer presentation, they are also organized into two tables according to their classification methods and the datasets used.
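The short-term cepstral features mentioned above usually mean mel-frequency cepstral coefficients (MFCCs). As a point of reference, the standard extraction pipeline (framing, windowing, power spectrum, mel filterbank, log compression, DCT) can be sketched in plain NumPy; the frame length, hop size, and filter counts below are common defaults, not values taken from any of the reviewed studies.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_ceps=13):
    # Frame the signal (25 ms frames, 10 ms hop at 16 kHz) and window it.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum and log mel filterbank energies.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    energies = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # DCT-II decorrelates the log energies into cepstral coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_filters)))
    return energies @ dct.T  # shape: (n_frames, n_ceps)
```

The channel-mismatch sensitivity discussed in the review enters precisely through this pipeline: any linear channel distortion becomes an additive offset after the log step, which is why cepstral mean normalization is a common first defense.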
In this study, a new and fast hidden resource decomposition method based on extreme learning machines (ELMs) is proposed to detect noisy pixels. The goal of this method is not only to detect noisy pixels, but also to preserve critical structural information that can be used for disease diagnosis. To support the diagnosis and treatment of patients, two-dimensional (2-D) computed tomography (CT) images obtained with medical imaging techniques were used. Experiments on a large number of CT images yielded promising results: the proposed method shows a significant improvement in mean squared error and peak signal-to-noise ratio. The experimental results indicate that the proposed method is statistically efficient and performs well with a high learning speed, achieving remarkable success rates with the ELM approach.
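The "high learning speed" claimed for ELMs comes from their training procedure: hidden-layer weights are drawn at random and never updated, and only the output weights are solved in closed form. A minimal ELM regressor illustrating this (not the paper's exact pixel-level formulation, whose features and targets are not given in the abstract) looks like:

```python
import numpy as np

class ELM:
    """Minimal extreme learning machine regressor: a random, untrained
    hidden layer followed by output weights solved in closed form via
    the Moore-Penrose pseudoinverse."""

    def __init__(self, n_hidden=64, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_features = X.shape[1]
        # Hidden weights and biases are sampled once; no gradient descent.
        self.W = self.rng.standard_normal((n_features, self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = np.tanh(X @ self.W + self.b)       # hidden activations
        self.beta = np.linalg.pinv(H) @ y      # closed-form output weights
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta
```

For a denoising task one would, for example, feed local image patches as `X` and clean center-pixel intensities as `y`; that patch setup is an assumption here, since the abstract does not specify the feature representation.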
Brain tumors are among the most common life-threatening diseases. Considerable effort has been dedicated to developing medical imaging techniques and radiomics to diagnose tumor patients quickly and efficiently, and one of the main aims is accurate pre-operative prediction of overall survival (OS) time. Recently, deep learning (DL) algorithms, and particularly convolutional neural networks (CNNs), have achieved promising performance in almost all computer vision fields. CNNs, however, demand large training datasets and high computational resources, while curating large annotated medical datasets is difficult and resource-intensive, and the performance of single learners on small datasets is unsatisfactory. Thus, this study was conducted to improve the performance of CNN models on small volumetric datasets by developing a DL-based ensemble method for OS classification of brain tumor patients using multi-modal magnetic resonance images (MRI). First, we proposed Multi-View CNNs: OS classifiers based on representing the 3D MRI data as a set of 2D slices along all three planes (axial, sagittal, and coronal) and processing them with 2D CNNs. Subsequently, the probabilities predicted by the Multi-View CNN models were fused using standard machine learning algorithms. The proposed approach was experimentally evaluated on 163 patients from the BraTS'17 training dataset. Our best model achieved AUC and accuracy values of 0.93 and 92.9%, respectively, on classifying patients with brain tumors into two OS groups, outperforming current state-of-the-art results. In addition, the FLAIR MRI modality yielded the best classification accuracy compared to the other MRI modalities, and axial projections performed best compared to coronal and sagittal projections.
Our findings may provide valuable insights for physicians in advancing treatment planning via noninvasive and accurate prediction of survival using only MRIs at the time of diagnosis.
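The two structural ingredients of the approach above, decomposing a 3D volume into per-plane 2D slices and late-fusing the per-view class probabilities, can be sketched as follows. The axis-to-plane mapping and the averaging fusion are illustrative assumptions; the paper fuses with learned meta-classifiers rather than a plain mean.

```python
import numpy as np

def multiview_slices(volume):
    """Decompose a 3D MRI volume into lists of 2D slices along the
    axial, coronal, and sagittal planes (axis ordering assumed)."""
    return {
        "axial":    [volume[i, :, :] for i in range(volume.shape[0])],
        "coronal":  [volume[:, j, :] for j in range(volume.shape[1])],
        "sagittal": [volume[:, :, k] for k in range(volume.shape[2])],
    }

def fuse_probabilities(view_probs):
    """Late fusion of per-view OS-class probability vectors; a simple
    average stands in for the paper's learned fusion step."""
    return np.mean(np.stack(list(view_probs.values())), axis=0)
```

Each slice list would be fed to a 2D CNN for that view; the dictionary of per-view probability vectors is then collapsed into a single OS-group prediction.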
Robustness against background noise is a major research area for speech-related applications such as speech recognition and speaker recognition. One of the many solutions to this problem is to detect speech-dominant regions using a voice activity detector (VAD). In this paper, a second-order polynomial regression-based algorithm is proposed with a function similar to a VAD for text-independent speaker verification systems. The proposed method aims to separate steady noise/silence regions, steady speech regions, and speech onset/offset regions. The regression is applied independently to each filter band of a mel spectrum, which makes the algorithm fit seamlessly into the conventional extraction process of the mel-frequency cepstral coefficients (MFCCs). The k-means algorithm is also applied to estimate the average noise energy in each band for spectral subtraction. A pseudo-SNR-dependent linear thresholding for the final VAD output decision is introduced based on the k-means energy centers; this thresholding accounts for the speech presence in each band. Conventional VADs usually neglect the deteriorative effects of additive noise within speech regions. In contrast, the proposed method decides not only whether speech is present, but also whether a frame is dominated by the speech or by the noise. The performance of the proposed algorithm is compared with a continuous noise tracking method and another VAD method in speaker verification experiments covering five noise types at five SNR levels. The proposed algorithm showed superior verification performance both with the conventional GMM-UBM method and with the state-of-the-art i-vector method.
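To make the three ingredients concrete (per-band second-order regression, k-means noise-energy centers, and an SNR-dependent threshold), here is a single-band sketch in the spirit of the abstract. The window length, the two-center 1-D k-means, and the slope threshold are illustrative assumptions, not the paper's actual parameters.

```python
import numpy as np

def band_vad(log_energies, win=11, slope_thresh=0.05):
    """Per-band VAD sketch: fit a second-order polynomial to a sliding
    window of one mel band's log energies; use the fitted trend to flag
    onset/offset regions and an energy threshold, placed between two
    k-means energy centers, to separate steady speech from steady noise."""
    # 1-D 2-means separates low-energy (noise) and high-energy
    # (speech-dominant) frames in this band.
    lo, hi = log_energies.min(), log_energies.max()
    for _ in range(20):
        split = (lo + hi) / 2.0
        low_pts = log_energies[log_energies <= split]
        high_pts = log_energies[log_energies > split]
        if len(low_pts) == 0 or len(high_pts) == 0:
            break
        lo, hi = low_pts.mean(), high_pts.mean()
    # Pseudo-SNR-dependent threshold between the two energy centers.
    thresh = lo + 0.5 * (hi - lo)
    labels, half = [], win // 2
    for t in range(len(log_energies)):
        seg = log_energies[max(0, t - half): t + half + 1]
        a, b, c = np.polyfit(np.arange(len(seg)), seg, 2)
        slope = 2 * a * (len(seg) / 2) + b   # fitted trend at window centre
        if abs(slope) > slope_thresh:
            labels.append("onset/offset")
        elif log_energies[t] > thresh:
            labels.append("speech")
        else:
            labels.append("noise")
    return labels, lo  # lo approximates the band's noise-energy centre
```

The returned noise-energy center is what the abstract's spectral-subtraction step would consume; in the full method this runs independently in every mel filter band before the MFCC log/DCT stages.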