2021
DOI: 10.1016/j.neunet.2020.10.003

DMMAN: A two-stage audio–visual fusion framework for sound separation and event localization

Cited by 14 publications (3 citation statements)
References 26 publications
“…Such an addition could introduce new multi-modal possibilities for improvements in detection, localisation and classification. This is similar to the DMMAN network described by Hu et al. [66], which would not only improve the performance of ORCA-SPY, but would also help with target differentiation for context-dependent analysis with towed and stationary observation. ORCA-SPY generalizes in a way that allows researchers to simulate and verify various array geometries and setups under assumed realistic real-world noise conditions, which is important not just in the field, but also in preparation for any fieldwork studies.…”
Section: Discussion
confidence: 70%
“…Eq. (5) shows that the number of units in the convolution layer is defined as half the size of the full connection for each layer. Through several levels of the cascade architecture, the fusion feature a_t^f finally passes through a convolution layer as the output layer to calculate the predicted density map D_pred.…”
Section: Multi-modal Fusion Module
confidence: 99%
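The cascade described in this statement can be sketched roughly as follows. This is a minimal, hypothetical PyTorch rendition assuming 2-D feature maps and a halving of the channel count at each cascade level; the class name CascadeFusionHead, the layer sizes, and the variable names are illustrative assumptions, not the citing paper's actual implementation.

# Hypothetical sketch of the cascade fusion head quoted above: each level halves
# the number of units (channels), and a final convolution acts as the output
# layer that maps the fused audio-visual feature a_t^f to the predicted density
# map D_pred. Sizes and names are assumptions for illustration only.
import torch
import torch.nn as nn

class CascadeFusionHead(nn.Module):
    def __init__(self, in_channels: int = 256, levels: int = 3):
        super().__init__()
        layers = []
        ch = in_channels
        for _ in range(levels):
            # Each cascade level halves the number of channels.
            layers += [nn.Conv2d(ch, ch // 2, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
            ch //= 2
        self.cascade = nn.Sequential(*layers)
        # Output convolution producing the single-channel density map D_pred.
        self.out_conv = nn.Conv2d(ch, 1, kernel_size=1)

    def forward(self, fused_feature: torch.Tensor) -> torch.Tensor:
        # fused_feature: the fusion feature a_t^f, shape (B, C, H, W).
        x = self.cascade(fused_feature)
        return self.out_conv(x)  # D_pred, shape (B, 1, H, W)

# Example usage with a dummy fused feature map.
if __name__ == "__main__":
    a_f_t = torch.randn(2, 256, 32, 32)
    d_pred = CascadeFusionHead()(a_f_t)
    print(d_pred.shape)  # torch.Size([2, 1, 32, 32])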
“…Crowd counting is a computer-vision task used in various fields such as intelligent transportation [1], industrial manufacturing [2] and security systems [3]. Unlike other computer-vision tasks such as image classification [4] and scene understanding [5], crowd-counting models built on convolutional neural networks (CNNs) must recognize arbitrarily sized people in varied situations, including scenes with extreme conditions such as high-level noise, low-level illumination and high-level occlusion. Consequently, the performance of a vision-driven model can easily break down, and such a model may not be well suited to the crowd-counting problem under extreme conditions.…”
Section: Introduction
confidence: 99%