ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9415045

Robust Voice Activity Detection Using a Masked Auditory Encoder Based Convolutional Neural Network

Abstract: Voice activity detection (VAD) based on deep learning has achieved remarkable success. However, when the traditional features (e.g., raw waveforms and MFCCs) are directly fed to the deep neural network model, the performance decreases because of noise interference. Here, we propose a robust VAD approach using a masked auditory encoder based convolutional neural network (M-AECNN). First, we analyze the effectiveness of using auditory features as a deep learning encoder. These features can roughly simulate the tra…

Cited by 3 publications (2 citation statements)
References 20 publications
“…CNN has been applied to research in various fields with different case studies of voice detections. This research includes voice detection to classify voices [22], [23], detect depressive or emotional voices [24], [25], detect music [26], realize voice-based security systems [27], [28], detect language phonemes [29], [30], detect disease by sounds [31]-[34], and identify animal voices [35], [36]. There are many other research case studies that can be found and used as ongoing future research.…”
Section: A Transfer Learning Model (mentioning)
confidence: 99%
“…Early VAD research works were conducted based on statistical models [1,2]. Recently, deep learning-based VAD models leveraging various neural network models have been actively explored, including convolutional neural networks (CNN) [3][4][5][6][7], recurrent neural networks (RNN), and long short-term memory (LSTM) networks [8][9][10][11]. In particular, Jia et al and Kopuklu et al [4,5] proposed compact VAD models suitable for use in a limited hardware resource environment.…”
Section: Introduction (mentioning)
confidence: 99%