Incorporating Noise Robustness in Speech Command Recognition by Noise Augmentation of Training Data

Pervaiz, Ayesha; Hussain, Fawad; Israr, Huma; Tahir, Muhammad; Raja, Fawad Riasat; Baloch, Naveed Khan; Ishmanov, Farruh; Zikria, Yousaf Bin

doi:10.3390/s20082326

Cited by 44 publications

(22 citation statements)

References 41 publications

“…KWS systems aim to automatically identify keywords, operating either online on streaming audio or offline on audio files [3], [4]. Preferably, both modes of operation should provide high recognition accuracy, with robust performance in practical applications subject to low-SNR acoustic scenarios (for details, see [7]). Typically, state-of-the-art KWS systems can be divided into two main blocks: front-end and back-end [6].…”

Section: Keyword Spotting Systems (unclassified)

“…In short, a ResNet can be viewed as a set of sequentially stacked CNNs, in which each set consists of two convolutional layers in series with a shortcut connection that directly links the input to the output of that set. Such sets are commonly called residual blocks [6], [7]. In particular, in this research work, each residual block consists of two convolutional layers with 45 convolution filters of size 3×3 (3×3 conv, 45), followed by a rectified linear unit (ReLU) activation function and a batch normalization (BN) layer.…”

Section: B. KWS System Architectures (unclassified)
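
The residual block described in the citation statement above can be illustrated with a small dependency-free sketch. It is not the cited authors' implementation. It assumes that each of the two 3×3 convolutions is followed by ReLU and then batch normalization, that the shortcut is an identity connection added to the block output, that batch normalization has no learned scale or shift and sees a single sample, and that the weights are random placeholders. All function names are hypothetical.

```python
import math
import random


def conv3x3(x, w):
    """Zero-padded ('same') 3x3 convolution.

    x: input as [C_in][H][W]; w: filters as [C_out][C_in][3][3].
    """
    height, width = len(x[0]), len(x[0][0])
    out = []
    for filt in w:  # one output plane per filter
        plane = [[0.0] * width for _ in range(height)]
        for c, kernel in enumerate(filt):
            for i in range(height):
                for j in range(width):
                    for di in range(3):
                        for dj in range(3):
                            ii, jj = i + di - 1, j + dj - 1
                            if 0 <= ii < height and 0 <= jj < width:
                                plane[i][j] += kernel[di][dj] * x[c][ii][jj]
        out.append(plane)
    return out


def relu(x):
    """Element-wise rectified linear unit."""
    return [[[max(0.0, v) for v in row] for row in plane] for plane in x]


def batch_norm(x, eps=1e-5):
    """Normalise each channel to zero mean and unit variance.

    Assumption: single sample, no learned scale/shift parameters.
    """
    out = []
    for plane in x:
        vals = [v for row in plane for v in row]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        scale = 1.0 / math.sqrt(var + eps)
        out.append([[(v - mean) * scale for v in row] for row in plane])
    return out


def residual_block(x, w1, w2):
    """Two conv -> ReLU -> BN stages plus an identity shortcut from input to output."""
    y = batch_norm(relu(conv3x3(x, w1)))
    y = batch_norm(relu(conv3x3(y, w2)))
    return [[[a + b for a, b in zip(row_x, row_y)]
             for row_x, row_y in zip(plane_x, plane_y)]
            for plane_x, plane_y in zip(x, y)]


def random_weights(c_out, c_in, rng, scale=0.1):
    """Placeholder filters; a trained network would learn these."""
    return [[[[rng.gauss(0.0, scale) for _ in range(3)] for _ in range(3)]
             for _ in range(c_in)] for _ in range(c_out)]


FILTERS = 45  # "3x3 conv, 45" in the quoted description
rng = random.Random(0)
x = [[[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(4)]
     for _ in range(FILTERS)]
y = residual_block(x,
                   random_weights(FILTERS, FILTERS, rng),
                   random_weights(FILTERS, FILTERS, rng))
print(len(y), len(y[0]), len(y[0][0]))  # channels, height and width are preserved
```

Because the shortcut is an element-wise addition, the block input and output must have the same shape, which is why the sketch uses 45 input channels as well as 45 filters.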

“…Current automatic speech recognition (ASR) systems have shown satisfactory performance in acoustic scenarios with controlled noise levels; however, in environments with a low signal-to-noise ratio (SNR), the operation of these systems is severely impaired [6]. In this context, although noise robustness remains a critical problem in real-world applications, most state-of-the-art KWS research has not (effectively) taken the effects of noise into account [7], [8].…”

Section: Introduction (unclassified)
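
The signal-to-noise ratio mentioned in the statement above, and the noise augmentation named in the title of the indexed paper, can be made concrete with a generic sketch. This page does not describe the procedure used in the paper, so the code below is only an assumption about how such augmentation is commonly done: the noise is scaled so that the ratio of average speech power to average noise power, expressed in decibels as 10·log10(P_speech / P_noise), equals a chosen target, and is then added to the clean signal. The function names and the synthetic signals are hypothetical.

```python
import math
import random


def power(signal):
    """Average power of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def measure_snr_db(speech, noise):
    """SNR in decibels: 10 * log10(P_speech / P_noise)."""
    return 10.0 * math.log10(power(speech) / power(noise))


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` to reach the target SNR relative to `speech`, then add it.

    Returns the noisy signal and the scaled noise that was added.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech, p_noise = power(speech), power(noise)
    if p_speech == 0.0 or p_noise == 0.0:
        raise ValueError("speech and noise must have non-zero power")
    # Solve P_speech / (gain^2 * P_noise) = 10^(snr_db / 10) for gain.
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    scaled = [gain * n for n in noise]
    return [s + n for s, n in zip(speech, scaled)], scaled


rng = random.Random(0)
# Stand-ins for a 1 s utterance and a noise recording at 16 kHz (assumed values).
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
noisy, scaled_noise = mix_at_snr(speech, noise, 5.0)
print(measure_snr_db(speech, scaled_noise))  # should be very close to the 5 dB target
```

Applying the same clean recording at several target SNR values is one simple way to expose a classifier to both mild and severe noise during training.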


Estratégias de Combinação de Espectrogramas de Magnitude e de Fase Aplicadas em Sistemas Robustos de Detecção de Palavras-Chave

Silva, Seara (2021). Anais Do XXXIX Simpósio Brasileiro De Telecomunicações E Processamento De Sinais


Abstract: The demand for keyword spotting (KWS) systems has been growing considerably across a wide range of real-world applications. However, the performance of these systems is strongly degraded under operating conditions with a low signal-to-noise ratio (SNR). Aiming to obtain noise-robust KWS systems, this research work investigates the feature extraction process in these systems. In particular, the present work proposes the use of feature combination strategies that consider the magnitude and phase spectrograms of speech signals. KWS systems whose feature extraction combines magnitude and phase are thus contrasted with those that use only magnitude spectrograms. Numerical simulation results are presented and evaluated in terms of keyword recognition accuracy, confirming the effectiveness of the strategies used in this work. Keywords: classifier ensemble, keyword spotting, phase spectrograms, feature extraction.


“…The End-to-End noisy speech recognition using Fourier and Hilbert spectrum features [4] has been improved the noise robustness by adding components to the recognition system. The incorporating noise robustness in speech command recognition by noise augmentation of training data is presented [5]. This work thoroughly analyses the latest trends in speech recognition and evaluates the speech command dataset on different machine learning-based and deep learning-based techniques.…”

Section: Introduction (mentioning)

confidence: 99%

Enhanced Feature Extraction Based on Absolute Sort Delta Mean Algorithm and MFCC for Noise Robustness Speech Recognition

Nosan, Sitjongsataporn (2021). IJIES


In this paper, a proposed absolute sort delta mean (ASDM) method obtaining the speech feature extraction for noise robustness is developed from mel-frequency cepstral coefficients (MFCC) named ASDM-MFCC, in order to increase robustness against the different types of environmental noises. This method is used to suppress the noise effects by finding a rearranging average of power spectrum magnitude combined with triangular bandpass filtering. Firstly, the spectral power magnitudes are sorted in each frequency band of the speech signal. Secondly, the absolute-delta values are arranged and then a mean value is determined in the last step. The purpose of proposed ASDM-MFCC algorithm is to require the noise robustness of the feature vector extracted from the speech signal with the characteristic coefficients. The NOIZEUS noisy speech corpus dataset is used to evaluate the performance of proposed ASDM-MFCC algorithm by Euclidean distance method with the low computation complexity. Experimental results show that the proposed method can provide significantly the improvement in terms of accuracy at low signal to noise ratio (SNR). In the case of car and station at SNR=5dB, the proposed approach can outperform in comparison with the conventional MFCC and gammatone frequency cepstral coefficient (GFCC) by 80% and 76.67%, respectively. Obviously, some experimental results of the proposed ASDM-MFCC algorithm are more robust than the traditional one.


“…The filter or kernel of the neural network is pruned using the filter clustering method proposed in this study that improves the processing speed while maintaining the abnormality detection performance. In addition, we used a convolutional neural network (CNN) based deep learning structure in this study because it guarantees an effective abnormality detection performance even in various noisy environments [23][24][25]. The remainder of this paper is organized as follows.…”

mentioning

confidence: 99%

Field-Applicable Pig Anomaly Detection System Using Vocalization for Embedded Board Implementations

et al. 2020


Failure to quickly and accurately detect abnormal situations, such as the occurrence of infectious diseases, in pig farms can cause significant damage to the pig farms and the pig farming industry of the country. In this study, we propose an economical and lightweight sound-based pig anomaly detection system that can be applicable even in small-scale farms. The system consists of a pipeline structure, starting from sound acquisition to abnormal situation detection, and can be installed and operated in an actual pig farm. It has the following structure that makes it executable on the embedded board TX-2: (1) A module that collects sound signals; (2) A noise-robust preprocessing module that detects sound regions from signals and converts them into spectrograms; and (3) A pig anomaly detection module based on MnasNet, a lightweight deep learning method, to which the 8-bit filter clustering method proposed in this study is applied, reducing its size by 76.3% while maintaining its identification performance. The proposed system recorded an F1-score of 0.947 as a stable pig’s abnormality identification performance, even in various noisy pigpen environments, and the system’s execution time allowed it to perform in real time.
