2019
DOI: 10.3390/s19143056

Dual Microphone Voice Activity Detection Based on Reliable Spatial Cues

Abstract: Two main spatial cues that can be exploited for dual microphone voice activity detection (VAD) are the interchannel time difference (ITD) and the interchannel level difference (ILD). While both ITD and ILD provide information on the location of audio sources, they may be impaired in different manners by background noises and reverberation and therefore can have complementary information. Conventional approaches utilize the statistics from all frequencies with fixed weight, although the information from some ti…
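The two cues named in the abstract can be illustrated with a minimal sketch. This is not the paper's method: it only shows one common way to compute a per-frame, per-frequency ILD (in dB) and an interchannel phase difference (IPD) from two microphone signals. The function names `stft` and `spatial_cues`, the frame length, the hop size, and the Hann window are all assumptions made for illustration. For a single dominant source, an ITD estimate is commonly obtained from the IPD as ITD ≈ IPD / (2πf) at frequencies low enough that the phase does not wrap.

```python
import numpy as np

def stft(x, frame_len=512, hop=256):
    """Hann-windowed short-time Fourier transform (hypothetical helper).

    Returns a complex array of shape (n_frames, frame_len // 2 + 1).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def spatial_cues(x1, x2, eps=1e-12):
    """Per time-frequency bin ILD (dB) and IPD (radians) for two channels.

    Illustrative only: the sign convention (channel 1 relative to
    channel 2) and the eps regularizer are assumptions.
    """
    X1, X2 = stft(x1), stft(x2)
    # Level difference: ratio of the channel powers, in decibels.
    ild_db = 10.0 * np.log10((np.abs(X1) ** 2 + eps) / (np.abs(X2) ** 2 + eps))
    # Phase difference: angle of the cross-spectrum X1 * conj(X2).
    ipd = np.angle(X1 * np.conj(X2))
    return ild_db, ipd

# Usage: the second channel is an attenuated copy of the first, so the
# ILD is positive in every bin and the IPD is zero.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(4096)
ild_db, ipd = spatial_cues(x1, 0.5 * x1)
```

A VAD built on such cues would then need some rule for deciding which time-frequency bins are reliable; that weighting is the subject of the paper and is not attempted here.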

citations
Cited by 6 publications
(2 citation statements)
references
References 30 publications
“…A major flaw in the language-based interface is that it is very sensitive to ambient noise, making it difficult to differentiate and classify signal to noise. To address this, a microphone voice activity detection (VAD) scheme [55] that enhances performance in a variety of noise environments in consideration of the sparsity of the speech signal in the time-frequency domain is proposed. And the language-based interface with the robot’s control is developed [56] and it reduced the ambient noise by 30%, the resulting inaccuracy has been improved.…”
Section: Biosignal-based Speech Recognition (mentioning)
confidence: 99%
“…Spatial cues between multi-channel signals such as inter-channel time difference (or inter-channel phase difference) and inter-channel level difference can indicate the location of the speech source. These spatial characteristics have been shown to be particularly beneficial when combined with spectral characteristics over the frequency domain in several fields, such as source separation, speech enhancement, and voice activity detection [16, 17, 18, 19, 20, 21, 22, 23]. Unfortunately, these spatial features are typically extracted in the frequency domain using STFT, making it difficult to integrate perfectly using the time domain method.…”
Section: Proposed Multi-channel Cross-tower With Attention Mechani (mentioning)
confidence: 99%