2022
DOI: 10.1609/aaai.v36i10.21394

Hybrid Neural Networks for On-Device Directional Hearing

Abstract: On-device directional hearing requires audio source separation from a given direction while achieving stringent human-imperceptible latency requirements. While neural nets can achieve significantly better performance than traditional beamformers, all existing models fall short of supporting low-latency causal inference on computationally-constrained wearables. We present DeepBeam, a hybrid model that combines traditional beamformers with a custom lightweight neural net. The former reduces the computational bur…

Cited by 6 publications (10 citation statements)
References 30 publications
“…To ensure that the audio played through the headset is synced with the user's visual senses, we need this end-to-end latency to be less than 20-50 ms [24,59,67]. To achieve this, we need to reduce the buffer duration, the look-ahead duration and the processing time.…”
Section: System Requirements (mentioning)
confidence: 99%
“…3) Real-time operation requires processing each acoustic block within the duration of the block itself. This means that it should take less than 10 ms to process a 10 ms buffer [67]. This can be challenging since neural networks are not known for their lightweight computation.…”
Section: System Requirements (mentioning)
confidence: 99%
“…The algorithm must be robust to microphone position errors and work across different array shapes and sizes even in reverberant real-world environments. While prior work in deep learning proposed speech separation networks [18][19][20][21], they did not achieve 2D localization. Recent work also explored distributed microphone arrays [1].…”
Section: Speech Separation and 2D Localization (mentioning)
confidence: 99%
“…More recent works tackle the problem of real-time directional hearing using eye trackers and wearable headsets. For example, [49] uses a hybrid network that combines signal processing with neural networks, but shows that their technique performs poorly in binaural scenarios (i.e., two microphones) and requires four or more microphones. In contrast, we focus on the problem of speech enhancement and create the first real-time end-to-end hardware-software neural-network based system using wireless synchronized earbuds.…”
Section: Related Work (mentioning)
confidence: 99%
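
The two System Requirements statements quoted above give numeric constraints: an end-to-end latency under 20-50 ms, and each 10 ms buffer processed in less than 10 ms. The arithmetic can be sketched as follows. This is a minimal illustration, not code from any of the cited papers. It assumes that end-to-end latency is simply the sum of the buffer duration, look-ahead duration, and processing time named in the quote. The function names and the example timings are hypothetical.

```python
def end_to_end_latency_ms(buffer_ms, lookahead_ms, processing_ms):
    # Assumption: the three components named in the quote add up linearly.
    return buffer_ms + lookahead_ms + processing_ms


def is_real_time(buffer_ms, processing_ms):
    # Each block must be processed within the duration of the block itself.
    return processing_ms < buffer_ms


def real_time_factor(buffer_ms, processing_ms):
    # Values below 1.0 mean processing keeps up with the incoming audio.
    return processing_ms / buffer_ms


# Hypothetical timings chosen only for illustration.
BUFFER_MS = 10.0
LOOKAHEAD_MS = 5.0
PROCESSING_MS = 8.0

latency = end_to_end_latency_ms(BUFFER_MS, LOOKAHEAD_MS, PROCESSING_MS)
keeps_up = is_real_time(BUFFER_MS, PROCESSING_MS)
rtf = real_time_factor(BUFFER_MS, PROCESSING_MS)
within_loose_budget = latency <= 50.0   # upper end of the quoted 20-50 ms range
within_strict_budget = latency <= 20.0  # lower end of the quoted range
```

With these illustrative numbers the block is processed in time (8 ms for a 10 ms buffer), and the total of 23 ms fits the 50 ms bound but not the 20 ms bound, which is why the quoted statement calls for shrinking all three components.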