Interspeech 2020
DOI: 10.21437/interspeech.2020-0997

Dual Attention in Time and Frequency Domain for Voice Activity Detection

Abstract: Voice activity detection (VAD) is a challenging task in low signal-to-noise ratio (SNR) environments, especially under nonstationary noise. To deal with this issue, we propose a novel attention module that can be integrated into Long Short-Term Memory (LSTM) networks. The proposed attention module refines the hidden states of each LSTM layer so that the model can adaptively focus on both the time and frequency domains. Experiments are conducted under various noisy conditions of Aurora 4. Our proposed method obtains the 95.58 % …
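The abstract gives no equations, so the following is only a minimal sketch of the general idea of "dual" attention: one set of sigmoid weights over time frames and one over the feature axis, both applied multiplicatively to a layer's hidden states. Treating the hidden-unit axis as the "frequency" axis, pooling by averaging, and the scalar parameters `w_t`, `b_t`, `w_f`, `b_f` are all assumptions made for illustration; `dual_attention` is a hypothetical name and this is not the paper's actual module.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = time steps, columns = hidden units (assumed "frequency" axis)


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dual_attention(h: Matrix, w_t: float = 1.0, b_t: float = 0.0,
                   w_f: float = 1.0, b_f: float = 0.0) -> Matrix:
    """Hypothetical time/frequency gating of one LSTM layer's hidden states.

    h is a T x F matrix; w_t, b_t, w_f, b_f are illustrative scalars
    standing in for learned weights (an assumption, not the paper's design).
    """
    T, F = len(h), len(h[0])
    # Frequency attention: one weight per column, from its mean over time.
    freq_w = [sigmoid(w_f * sum(h[t][f] for t in range(T)) / T + b_f)
              for f in range(F)]
    # Time attention: one weight per frame, from its mean over columns.
    time_w = [sigmoid(w_t * sum(row) / F + b_t) for row in h]
    # Refine every element by both weights; each weight lies in (0, 1).
    return [[h[t][f] * time_w[t] * freq_w[f] for f in range(F)]
            for t in range(T)]


out = dual_attention([[0.0, 1.0], [2.0, -1.0]])
```

Because the two weight vectors combine as an outer product, an element passes through largely unchanged only when both its frame and its hidden unit receive a high weight; everything else is attenuated.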

Cited by 7 publications (4 citation statements)
References 27 publications (33 reference statements)
“…A voice has frequency and amplitude. A frequency is the number of occurrence of repeating waveform per unit of time [16]. An amplitude is the maximum distance or displacement from the center of vibration when repeating vibrations occur.…”
Section: Trends Of Voice Frequency Analysis Technology (mentioning)
confidence: 99%
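The quoted definitions (frequency as repetitions of a waveform per unit of time, amplitude as the maximum displacement from the centre of vibration) can be made concrete with a small sketch. Counting upward zero crossings is one simple frequency estimator that assumes a clean, roughly periodic signal; taking the signal mean as the centre of vibration is also an assumption, and `frequency_and_amplitude` is a hypothetical helper, not taken from the cited work.

```python
import math
from typing import List, Tuple


def frequency_and_amplitude(x: List[float], sample_rate: float) -> Tuple[float, float]:
    """Rough frequency (Hz) and amplitude of a roughly periodic signal."""
    center = sum(x) / len(x)  # assumed centre of vibration: the signal mean
    d = [v - center for v in x]
    # Each upward zero crossing marks the start of one repetition of the waveform.
    ups = sum(1 for a, b in zip(d, d[1:]) if a < 0 <= b)
    duration = len(x) / sample_rate  # seconds covered by the samples
    freq = ups / duration  # repetitions per unit of time
    amp = max(abs(v) for v in d)  # maximum displacement from the centre
    return freq, amp


# One second of a 440 Hz tone with amplitude 0.5, sampled at 8 kHz.
sr = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
freq, amp = frequency_and_amplitude(tone, sr)
```

The estimate lands close to 440 Hz and 0.5; a crossing that falls exactly on the first sample has no predecessor and can be missed, so the count may come out one short.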
“…Although E2E ASR focusing on feature extraction in the frequency direction has been proposed, there are few examples of research on ASR models that apply an attention mechanism in the frequency direction. However, simultaneous temporal and frequency-directional attention mechanisms have been proposed in voice activity detection (VAD) [10] and speech enhancement processing [11].…”
Section: Introduction (mentioning)
confidence: 99%
“…Voice activity detection (VAD) is a technique to classify an acoustic segment into speech or non-speech, which is an important frontend step in a wide range of tasks such as speaker verification [1,2], emotion estimation [3], and automatic speech recognition [4]. Although many strategies have been proposed for VAD such as time-domain-energy-based methods and likelihood-ratio-based methods [5][6][7][8], fully neural network based methods have shown promising performance even under low signal-to-noise ratio (SNR) environments [9][10][11][12][13][14][15][16].…”
Section: Introduction (mentioning)
confidence: 99%
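The time-domain-energy-based methods mentioned above can be sketched as frame-level thresholding of short-time energy. The frame length of 160 samples and hop of 80 (20 ms and 10 ms at an assumed 8 kHz sample rate) and the -30 dB threshold are arbitrary assumptions, and `energy_vad` is a hypothetical name; this is a baseline illustration, not the attention-based method of the paper.

```python
import math
from typing import List


def energy_vad(x: List[float], frame_len: int = 160, hop: int = 80,
               threshold_db: float = -30.0) -> List[bool]:
    """Label each frame as speech (True) or non-speech (False) by its energy."""
    labels = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        power = sum(v * v for v in frame) / frame_len  # mean power of the frame
        power_db = 10.0 * math.log10(power + 1e-12)  # small floor avoids log10(0)
        labels.append(power_db > threshold_db)
    return labels


# 100 ms of silence followed by 100 ms of a 200 Hz tone (stand-in for speech), at 8 kHz.
sr = 8000
signal = [0.0] * 800 + [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]
labels = energy_vad(signal)
```

With a fixed threshold like this, any noise whose frame energy exceeds -30 dB is labelled as speech, which illustrates why such detectors struggle at low SNR and why the quoted passage points to neural approaches.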