Deep Learning for Audio Signal Processing

Purwins, Hendrik; Li, Bo; Virtanen, Tuomas; Schlüter, Jan; Chang, Shuo-Yiin; Sainath, Tara N.

doi:10.1109/jstsp.2019.2908700

Cited by 572 publications

(322 citation statements)

References 118 publications

Supporting

Mentioning

304

Contrasting

Unclassified


“…It has made a remarkable impact in computer vision performance previously unattainable on many tasks such as image classification and object detection. Deep learning is applied in research concerning graphical modeling, pattern recognition, signal processing [1], computer vision [2], speech recognition [3], language recognition [4,5], audio recognition [6], and face recognition (FR) [7]. In biometrics, deep learning can be used to represent the unique biometric data and make improvements in the performance of many authentication and recognition systems.…”

Section: Introduction (mentioning)

confidence: 99%

Deep Convolutional Neural Network-Based Approaches for Face Recognition

2019


Face recognition (FR) is defined as the process through which people are identified using facial images. This technology is applied broadly in biometrics, security information, accessing controlled areas, keeping of the law by different enforcement bodies, smart cards, and surveillance technology. The facial recognition system is built using two steps. The first step is a process through which the facial features are picked up or extracted, and the second step is pattern classification. Deep learning, specifically the convolutional neural network (CNN), has recently made commendable progress in FR technology. This paper investigates the performance of the pre-trained CNN with multi-class support vector machine (SVM) classifier and the performance of transfer learning using the AlexNet model to perform classification. The study considers CNN architecture, which has so far recorded the best outcome in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) in the past years, more specifically, AlexNet and ResNet-50. In order to determine performance optimization of the CNN algorithm, recognition accuracy was used as a determinant. Improved classification rates were seen in the comprehensive experiments that were completed on the various datasets of ORL, GTAV face, Georgia Tech face, labelled faces in the wild (LFW), frontalized labeled faces in the wild (F_LFW), YouTube face, and FEI faces. The result showed that our model achieved a higher accuracy compared to most of the state-of-the-art models. An accuracy range of 94% to 100% for models with all databases was obtained. Also, this was obtained with an improvement in recognition accuracy up to 39%.


“…Hidden Markov Models) at the end of the 1980s [2,42] and the application of deep neural networks (DNNs), starting around 2005 [41]. Initially, these networks were used to classify individual signal frames in terms of sub-phonemes, leaving the HMMs the task of recognizing sequences of observations, which took the form of a hybrid DNN-HMM solution.…”

Section: Speech Recognition (unclassified)

“…In the second method, a further factor decomposition is carried out first, this time of the i-vector space, modeling the disturbances, followed by classification using advanced stochastic distance measures. − Deep neural networks have also been applied to speaker recognition [8,41]. Research concerns the use of these networks for speaker modeling (finding, during training, a nonlinear feature transformation that replaces the Gaussian mixture model) and also at the stage of matching observations to the model [28].…”

Section: Speaker Recognition (unclassified)

Agent Structure of Multimodal User Interface to the National Cybersecurity Platform – Part 2

Kasprzak, Szynkiewicz, Stefańczyk, et al. 2019, PAR


Use of this article is permitted under the terms of the Creative Commons Attribution 3.0 license. 1. Introduction 1.1. Multimodal human-computer interface Research on multimodal human-computer interfaces has been conducted for over 40 years [37]. The goal of this research is to develop methods and techniques for human-computer interaction that make full use of the ways humans naturally communicate and interact with their surroundings. Multimodal interfaces have two basic characteristics: they combine multiple types of data, and they process these data in real time under specified time constraints [10]. The "Put-That-There" system [3], developed at MIT (USA), is widely regarded as the first practical demonstration of the capabilities of a multimodal interface. This system combined two kinds of input, voice and gestures, which gave a user seated in a chair natural


“…Results are encouraging due to during the learning phase, an accuracy greater than 77% is achieved. In [25], the authors provide a review of the state-of-the-art deep learning techniques for audio signal processing. Analyzed works range from variants of the long short-term memory architecture, audio-specific neural network models, and also it includes convolution neural networks.…”

Section: Introduction (mentioning)

confidence: 99%

An Optimized Brain-Based Algorithm for Classifying Parkinson’s Disease

et al. 2020


During the last years, highly-recognized computational intelligence techniques have been proposed to treat classification problems. These automatic learning approaches lead to the most recent researches because they exhibit outstanding results. Nevertheless, to achieve this performance, artificial learning methods firstly require fine tuning of their parameters and then they need to work with the best-generated model. This process usually needs an expert user for supervising the algorithm’s performance. In this paper, we propose an optimized Extreme Learning Machine by using the Bat Algorithm, which boosts the training phase of the machine learning method to increase the accuracy, and decreasing or keeping the loss in the learning phase. To evaluate our proposal, we use the Parkinson’s Disease audio dataset taken from UCI Machine Learning Repository. Parkinson’s disease is a neurodegenerative disorder that affects over 10 million people. Although its diagnosis is through motor symptoms, it is possible to evidence the disorder through variations in the speech using machine learning techniques. Results suggest that using the bio-inspired optimization algorithm for adjusting the parameters of the Extreme Learning Machine is a real alternative for improving its performance. During the validation phase, the classification process for Parkinson’s Disease achieves a maximum accuracy of 96.74% and a minimum loss of 3.27%.
