2016
DOI: 10.1007/978-3-319-45510-5_40

Voice Activity Detector (VAD) Based on Long-Term Mel Frequency Band Features

Cited by 9 publications (5 citation statements)
References 6 publications
“…As the target of the VAD system is to remove as much information-shallow data from the audio data as possible, we compare several approaches here: at a first level, we try to filter for all vocalisations with general VAD systems, one specifically trained on our data set, the other one being an implementation of the WebRTC VAD system 3 (Google, 2021), commonly used as a comparison for other VAD systems, e.g., (Salishev et al, 2016;Nahar and Kai, 2020). The aggressiveness score of the WebRTC VAD is set equal to one.…”
Section: Voice Activity Detection (mentioning)
confidence: 99%
“…A VAD is commonly applied for speech and speaker recognition tasks, as well as for telephony, while VAD is more recently referenced in speaker diarization research [65]. Typically, VAD involves using statistical models and short-term energy-based features [66]. Some works in the reviewed literature briefly describe utilization of their own VAD involving on a threshold-based technique dependent on the dataset used [13,27,31,48].…”
Section: Voice Activity Detection (mentioning)
confidence: 99%
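
The statement above mentions threshold-based VADs built on short-term energy features. As a rough illustration only (not the method of the cited paper or of any citing work), a minimal sketch might look like the following. The function names, the 20 ms frame length, and the threshold value are all assumptions chosen for the example; in practice the threshold depends on the dataset, as the statement notes.

```python
import math


def frame_energies(samples, frame_len):
    """Return the mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def energy_vad(samples, frame_len, threshold):
    """Flag a frame as speech-like when its short-term energy exceeds threshold."""
    return [e > threshold for e in frame_energies(samples, frame_len)]


# Synthetic input (assumed for illustration): two near-silent frames
# followed by two frames of a 200 Hz tone, sampled at 8 kHz.
sample_rate = 8000
frame_len = 160  # 20 ms at 8 kHz
silence = [0.001] * (2 * frame_len)
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate)
        for n in range(2 * frame_len)]

# The threshold of 0.01 is hypothetical and would need tuning per dataset.
decisions = energy_vad(silence + tone, frame_len, threshold=0.01)
```

A fixed threshold like this is simple but fragile under changing noise levels, which is one reason such detectors tend to be tuned per dataset.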
“…The aggressiveness of the module can be set from a scale of 0-3, with 0 being the least aggressive at filtering out non-speech frames. The WebRTC VAD has been referenced in the work by Stoter et al for speaker counting [10] and in literature comparing VAD techniques [66]. This VAD module was experimented with as a Python library although the results were not substantial as the VAD was unable to detect the presence of non-speech from the data despite being regarded as a state-of-the-art module.…”
Section: Voice Activity Detection (mentioning)
confidence: 99%
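
To make the aggressiveness setting described above concrete, here is a hedged sketch of how the WebRTC VAD is commonly driven from Python. It assumes the third-party `webrtcvad` package (the py-webrtcvad wrapper), which may not be the library the citing authors used. The helper names `split_frames` and `speech_flags` are hypothetical. The wrapper expects 16-bit mono PCM in frames of 10, 20, or 30 ms.

```python
def split_frames(pcm, sample_rate, frame_ms=30):
    """Split 16-bit mono PCM bytes into whole frames of frame_ms milliseconds."""
    bytes_per_frame = sample_rate * frame_ms // 1000 * 2  # 2 bytes per sample
    last_start = len(pcm) - bytes_per_frame
    return [pcm[i:i + bytes_per_frame]
            for i in range(0, last_start + 1, bytes_per_frame)]


def speech_flags(pcm, sample_rate, aggressiveness=1):
    """Return one speech/non-speech decision per frame.

    Assumes the third-party `webrtcvad` package is installed.
    aggressiveness ranges from 0 (least aggressive at filtering
    out non-speech) to 3 (most aggressive).
    """
    import webrtcvad  # imported lazily so the framing helper works without it

    vad = webrtcvad.Vad(aggressiveness)
    return [vad.is_speech(frame, sample_rate)
            for frame in split_frames(pcm, sample_rate)]


# One second of digital silence at 16 kHz, used only to show the framing.
pcm = bytes(16000 * 2)
frames = split_frames(pcm, 16000)
```

Calling `speech_flags(pcm, 16000, aggressiveness=1)` would mirror the setting of one reported in the first citation statement.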
“…We used WebRTC [22] to perform A-VAD. WebRTC employs multiple-frequency (subband) features combined with a pre-trained GMM classifier.…”
Section: B Audio Voice Activity Detection (mentioning)
confidence: 99%