2020
DOI: 10.1186/s13636-020-00184-2

DOANet: a deep dilated convolutional neural network approach for search and rescue with drone-embedded sound source localization

Abstract: Drone-embedded sound source localization (SSL) has interesting application perspective in challenging search and rescue scenarios due to bad lighting conditions or occlusions. However, the problem gets complicated by severe drone ego-noise that may result in negative signal-to-noise ratios in the recorded microphone signals. In this paper, we present our work on drone-embedded SSL using recordings from an 8-channel cube-shaped microphone array embedded in an unmanned aerial vehicle (UAV). We use angular spectr…

Cited by 7 publications (4 citation statements). References: 23 publications.
“…The input STFT is calculated using a window length of 2048 and a half overlap. The 10 dilated layers aggregate information across the frequency dimension: they have kernel size (3, 1) and dilation factor $2^{d-1}$, where $d \in [1, 10]$ denotes the depth of the layer. The 3 non-dilated layers aggregate information across both time and frequency dimensions, with a kernel size (3, 3).…”
Section: A Smolnet (mentioning)
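As a rough check on the architecture quoted above, the sketch below computes the STFT hop and bin count and the receptive field of the layer stack along frequency and time. It assumes stride 1 and no pooling, that kernel sizes are written as (frequency, time), and that the dilation applies only along the frequency axis; none of these details are stated in the excerpt, and the helper names `stft_params` and `receptive_field` are hypothetical.

```python
# Hypothetical sketch: receptive field of the quoted layer stack.
# Assumptions (not stated in the excerpt): stride 1, no pooling,
# kernels written as (frequency, time), dilation only along frequency.

def stft_params(window: int = 2048) -> tuple[int, int]:
    """Return (hop length, one-sided frequency bins) for a half-overlap STFT."""
    hop = window // 2          # half overlap -> hop of window / 2 samples
    n_bins = window // 2 + 1   # one-sided spectrum of a real-valued signal
    return hop, n_bins


def receptive_field(layers):
    """layers: list of ((k_freq, k_time), (d_freq, d_time)).

    Each stride-1 convolution widens the receptive field by (k - 1) * d
    along each axis.
    """
    rf_f, rf_t = 1, 1
    for (k_f, k_t), (d_f, d_t) in layers:
        rf_f += (k_f - 1) * d_f
        rf_t += (k_t - 1) * d_t
    return rf_f, rf_t


# 10 dilated layers: kernel (3, 1), frequency dilation 2**(d - 1), d = 1..10
dilated = [((3, 1), (2 ** (d - 1), 1)) for d in range(1, 11)]
# 3 non-dilated layers: kernel (3, 3), no dilation
plain = [((3, 3), (1, 1))] * 3

hop, n_bins = stft_params()
rf_dilated = receptive_field(dilated)
rf_total = receptive_field(dilated + plain)

print(hop, n_bins)   # 1024 1025
print(rf_dilated)    # (2047, 1)
print(rf_total)      # (2053, 7)
```

Under these assumptions the ten dilated layers alone span $1 + 2\,(2^{10} - 1) = 2047$ frequency bins, more than the 1025 bins of a 2048-point one-sided STFT, so each output unit can in principle see the whole spectrum while the three (3, 3) layers add only 7 frames of temporal context.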
“…(22) The bases now hold $N/4 + 1$ elements (instead of $N/2 + 1$) and all the elements of $p(k)$ are purely real or imaginary numbers. Computing (22) involves $K(N/2 + 2)$ real multiplications and $KN/2$ real additions, for a total of $K(N + 2)$ flops. Computing the vectors $x_{\text{add}}(t)$ and $x_{\text{sub}}(t)$ also adds $N$ flops, which leads to a total of $K(N + 2) + N$ flops.…”
Section: Generalized Cross-correlation (mentioning)
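The operation count in the passage above can be checked with a few lines of arithmetic. In this hypothetical helper `flops_eq22`, `N` and `K` are taken directly from the quote (the excerpt does not define them), and `N` is assumed to be a multiple of 4 so that $N/4 + 1$ is an integer. The example values `N = 1024` and `K = 360` are arbitrary.

```python
# Hypothetical helper reproducing the flop count quoted for Eq. (22).
# N and K are as in the passage; their exact meaning is not given in the excerpt.

def flops_eq22(N: int, K: int) -> dict[str, int]:
    if N % 4:
        raise ValueError("N must be a multiple of 4 so that N/4 + 1 is an integer")
    basis_len = N // 4 + 1          # elements per basis (instead of N/2 + 1)
    mults = K * (N // 2 + 2)        # real multiplications
    adds = K * N // 2               # real additions
    eq22 = mults + adds             # equals K * (N + 2)
    total = eq22 + N                # plus N flops for x_add(t) and x_sub(t)
    return {"basis_len": basis_len, "mults": mults, "adds": adds,
            "eq22": eq22, "total": total}


print(flops_eq22(1024, 360))
# {'basis_len': 257, 'mults': 185040, 'adds': 184320, 'eq22': 369360, 'total': 370384}
```

The multiplication and addition counts sum to $K N/2 + 2K + K N/2 = K(N + 2)$, matching the total stated in the passage.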
“…DoA can also solve the permutation ambiguity in speech separation tasks [12] with multiple microphones, in deep clustering for instance [13], [14], [15] or time-frequency masking [16], [17], [18]. SSL can also serve numerous applications in robotics [19], ranging from acoustic synchronous localization and mapping (SLAM) [20], rescue missions [21], [22], drones tracking [23], [24], [25] and assisting humans with hearing impairments [26]. Some frameworks have been proposed over the years to perform online SSL on robots [27], [28].…”
Section: Introduction (mentioning)
“…DNN sound enhancement is typically a pre-processing step for traditional source localization algorithms [36]. While a DNN can also be trained to predict the location of the sound source directly from the multi-channel microphone signal, the performance typically drops significantly in low-SNR scenarios [37].…”
Section: Introduction (mentioning)