Interspeech 2019
DOI: 10.21437/interspeech.2019-1928
Multi-Level Adaptive Speech Activity Detector for Speech in Naturalistic Environments

Abstract: Speech activity detection (SAD) is a component of many speech processing applications. Traditional SAD approaches use signal energy as evidence to identify speech regions, but such methods perform poorly in uncontrolled environments. In this work, we propose a novel SAD approach that makes a multi-level decision using signal knowledge in an adaptive manner. The multi-level evidence considered comprises the modulation spectrum and the smoothed Hilbert envelope of the linear prediction (LP) residual. Modulation spectrum …
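The two evidence streams named in the abstract can be illustrated with a minimal sketch. This assumes a standard autocorrelation-method LP inverse filter and a moving-average smoother on the Hilbert envelope; the helper names and parameters (`order=10`, `win=101`) are chosen for illustration and are not the paper's implementation.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import hilbert, lfilter

def lp_residual(x, order=10):
    """LP residual via the autocorrelation method: inverse-filter x with A(z)."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]  # lags 0..order
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])   # predictor coefficients a_1..a_p
    inv = np.concatenate(([1.0], -a))             # A(z) = 1 - sum_k a_k z^-k
    return lfilter(inv, [1.0], x)

def smoothed_hilbert_envelope(x, win=101):
    """Magnitude of the analytic signal, smoothed with a moving average."""
    env = np.abs(hilbert(x))
    kernel = np.ones(win) / win
    return np.convolve(env, kernel, mode="same")

# Toy usage on a noisy sinusoid standing in for a speech frame
rng = np.random.default_rng(0)
sig = np.sin(2 * np.pi * 0.01 * np.arange(4000)) + 0.05 * rng.standard_normal(4000)
res = lp_residual(sig)
env = smoothed_hilbert_envelope(res)
```

In speech, the LP residual emphasizes the excitation source, so peaks in its smoothed Hilbert envelope tend to align with voiced regions; a SAD system can threshold this evidence per frame.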

Cited by 14 publications (8 citation statements)
References 34 publications
“…Clearly, lower DCF indicates better classification performance. Two recent studies found DCF values of 7.4% (Sharma, Das, & Li, 2019) and 11.7% (Hansen, Joglekar, Shekhar, Kothapally, Yu, Kaushik, & Sangwan, 2019) (using a = 0.25 and b = 0.75) in S/VAD for recordings from the Apollo 11 space travel mission (Kaushik et al., 2018). Another study (Dubey, Sangwan, & Hansen, 2018) evaluated algorithms using a corpus of noisy recordings from degraded military communication channels and reported DCF values (also with a = 0.25, b = 0.75) ranging from 4.3% to 8.9% (mean = 6.1%) across five novel algorithms (averaging across degraded channel conditions), where this constituted comparable performance in relation to baseline algorithms (e.g., Sholokhov, Sahidullah, & Kinnunen, 2018).…”
Section: Discussion (citation type: mentioning)
confidence: 96%
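The DCF quoted above is a weighted sum of miss and false-alarm rates. The frame-level helper below is an illustrative sketch, assuming the a/b values in the excerpt weight false alarms (0.25) and misses (0.75), as in common SAD evaluations; the function and variable names are made up for the example.

```python
import numpy as np

def detection_cost(ref, hyp, w_fa=0.25, w_miss=0.75):
    """Frame-level detection cost: DCF = w_fa * P_FA + w_miss * P_miss.

    The weight assignment (0.25 on false alarms, 0.75 on misses) is an
    assumption matching the a/b values quoted in the citation statement.
    """
    ref = np.asarray(ref, dtype=bool)   # ground-truth speech frames
    hyp = np.asarray(hyp, dtype=bool)   # detector output
    p_miss = float(np.mean(~hyp[ref])) if ref.any() else 0.0
    p_fa = float(np.mean(hyp[~ref])) if (~ref).any() else 0.0
    return w_fa * p_fa + w_miss * p_miss

# Toy example: 5 speech frames with 1 miss, 5 non-speech frames with 1 false alarm
ref = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
hyp = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
cost = detection_cost(ref, hyp)  # 0.75 * 0.2 + 0.25 * 0.2 = 0.2
```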
“…While this was similar to the 116 system submissions received for the FS-1 challenge, participation for both tracks of the SD and ASR tasks was noticeably higher. The systems developed for FS-2 also exhibited vast improvements in performance compared to the best systems developed for the FS-1 challenge [2,11,12,13,15], as seen in Table-5. We observed relative improvements of 67%, 57%, and 62% for the SAD, Speaker Diarization from scratch, and Speech Recognition from audio streams tasks, respectively.…”
[Figure 4 caption from the citing paper: rVad-SincNet based SID baseline system [34,35]]
Section: Discussion (citation type: mentioning)
confidence: 99%
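The relative improvements quoted in the excerpt are the usual error-reduction ratio between two systems. A one-line sketch, using hypothetical DCF values (not the actual challenge results):

```python
def relative_improvement(baseline_err, new_err):
    """Relative error reduction between two systems, e.g. two DCF values."""
    return (baseline_err - new_err) / baseline_err

# Hypothetical example: a DCF dropping from 3.0% to 0.99%
# corresponds to a 67% relative improvement.
improvement = relative_improvement(3.0, 0.99)
```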
“…This began with the Inaugural FEARLESS STEPS Challenge: Massive Naturalistic Audio (FS-1). The first edition of this challenge encouraged the development of core unsupervised/semi-supervised speech and language systems for single-channel data with low resource availability, serving as the First Step towards extracting high-level information from such massive unlabeled corpora [11,12,13,14,15]. As a natural progression following the successful inaugural FS-1 challenge, the FEARLESS STEPS Challenge Phase-2 (FS-2) focuses on the development of single-channel supervised learning strategies.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Sound Localization and Classification (SLC) refers to estimating the spatial location of a sound source and identifying the type of a sound event through a unified framework. An SLC method enables autonomous robots to determine sound location and detect sound events for navigation and interaction with their surroundings [1,2]. Thus, SLC is useful in smart-city and smart-home applications to automatically identify social or human activities, and to assist the hearing impaired in visualizing and recognizing sounds [3,4,5,6].…”
Section: Introduction (citation type: mentioning)
confidence: 99%