This paper proposes a biologically inspired, technically implemented sound localization system that robustly estimates the position of a sound source in the frontal azimuthal half-plane. For localization, binaural cues are extracted from cochleagrams, which are generated by a cochlear model and serve as the system's input. The basic idea of the model is to measure interaural time differences and interaural level differences separately for a number of frequencies and to process these measurements as a whole. This yields two-dimensional frequency-versus-time-delay representations of the binaural cues, so-called activity maps. A probabilistic evaluation is presented that estimates the position of a sound source over time based on these activity maps. Learned reference maps for different azimuthal positions are integrated into the computation to obtain time-dependent discrete conditional probabilities. At every timestep these probabilities are combined over frequencies and binaural cues to estimate the sound source position. In addition, they are propagated over time to improve the position estimate. The resulting system is able to localize audible signals, such as human speech, even in reverberant environments.
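The per-timestep combination and temporal propagation described above can be sketched as a simple recursive Bayesian filter. This is an illustrative reading of the abstract, not the paper's actual algorithm: the function names, array shapes, and the toy transition model are all assumptions.

```python
import numpy as np

def update_position_estimate(prior, likelihoods, transition):
    """Hypothetical sketch of one timestep of the probabilistic evaluation.

    prior:       (P,)      probability over P azimuth positions
    likelihoods: (C, F, P) per-cue (e.g. ITD/ILD), per-frequency conditional
                           probabilities derived from activity maps vs.
                           learned reference maps
    transition:  (P, P)    position transition model for temporal propagation
    """
    # Combine over cues (axis 0) and frequencies (axis 1), assuming
    # conditional independence given the source position:
    combined = np.prod(likelihoods, axis=(0, 1))
    # Propagate the previous estimate forward in time, then fuse:
    predicted = transition @ prior
    posterior = combined * predicted
    return posterior / posterior.sum()

# Toy example: 5 azimuth positions, 2 cues, 3 frequency bands,
# with all cue/frequency likelihoods peaked at position 3.
prior = np.full(5, 0.2)
likelihoods = np.full((2, 3, 5), 0.1)
likelihoods[:, :, 3] = 0.6
transition = np.full((5, 5), 0.05) + 0.75 * np.eye(5)

posterior = prior
for _ in range(4):  # four timesteps
    posterior = update_position_estimate(posterior, likelihoods, transition)
print(int(posterior.argmax()))  # → 3 (the consistently supported position)
```

The posterior concentrates on position 3 after a few timesteps, illustrating how combining cues across frequencies and propagating over time stabilizes the estimate against noisy individual measurements.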
Performing inference of Convolutional Neural Networks (CNNs) on Internet of Things (IoT) edge devices ensures privacy of the input data and can reduce run time compared to a cloud solution. As most edge devices are memory- and compute-constrained, they cannot store and execute complex CNNs on their own. Partitioning and distributing layer information across multiple edge devices, thereby reducing the amount of computation and data on each device, presents a solution to this problem. In this article, we propose DeeperThings, an approach that supports a full distribution of CNN inference tasks by partitioning fully-connected as well as both feature- and weight-intensive convolutional layers. Additionally, we jointly optimize memory, computation and communication demands. This is achieved by combining feature and weight partitioning with a communication-aware layer fusion method, enabling holistic optimization across layers. For a given number of edge devices, the schemes are applied jointly using Integer Linear Programming (ILP) formulations to minimize the data exchanged between devices, to optimize run times and to find the entire model's minimal memory footprint. Experimental results from a real-world hardware setup running four different CNN models confirm that the scheme evenly balances the memory footprint between devices. For six devices on 100 Mbit/s connections, the integration of layer fusion additionally reduces communication demands by up to 28.8%, which speeds up the inference task by up to 1.52x compared to layer partitioning without fusing.
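The core optimization idea, choosing per layer between feature and weight partitioning so that communication is minimized under a per-device memory budget, can be illustrated with a small brute-force search. This is a simplified stand-in for the paper's ILP formulations: the cost model, variable names, and the example numbers are all assumptions, and layer fusion is not modeled here.

```python
from itertools import product

def per_device_memory(scheme, layer, n_dev):
    feat, weights = layer  # bytes of output feature map / of weights
    if scheme == "feature":
        # Feature partitioning: weights replicated, feature map split.
        return weights + feat // n_dev
    # Weight partitioning: weights split, full feature map on each device.
    return weights // n_dev + feat

def comm_cost(scheme, layer, n_dev):
    feat, _ = layer
    if scheme == "feature":
        # Each device exchanges its output slice with the others.
        return feat * (n_dev - 1) // n_dev
    # Weight partitioning: partial outputs must be reduced across devices.
    return feat * (n_dev - 1)

def best_partitioning(layers, n_dev, mem_budget):
    """Exhaustively pick a scheme per layer (an ILP would solve this
    at scale) minimizing total communication under the memory budget."""
    best = None
    for schemes in product(["feature", "weight"], repeat=len(layers)):
        if any(per_device_memory(s, l, n_dev) > mem_budget
               for s, l in zip(schemes, layers)):
            continue  # violates the per-device memory constraint
        total = sum(comm_cost(s, l, n_dev) for s, l in zip(schemes, layers))
        if best is None or total < best[0]:
            best = (total, schemes)
    return best

# Made-up layers: one feature-intensive, one weight-intensive.
layers = [(1000, 200), (200, 1200)]  # (feature bytes, weight bytes)
total, schemes = best_partitioning(layers, n_dev=4, mem_budget=600)
print(schemes, total)  # → ('feature', 'weight') 1350
```

In this toy instance the memory budget forces the feature-intensive layer onto feature partitioning and the weight-intensive layer onto weight partitioning, mirroring the abstract's point that both scheme types are needed for a full distribution.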
Memory optimization for deep neural network (DNN) inference gains high relevance with the emergence of TinyML, which refers to the deployment of DNN inference tasks on tiny, low-power microcontrollers. Applications such as audio keyword detection or radar-based gesture recognition are heavily constrained by the limited memory on such tiny devices, because DNN inference requires large intermediate run-time buffers to store activations and other intermediate data, which leads to high memory usage. In this paper, we propose a new Fused Depthwise Tiling (FDT) method for the memory optimization of DNNs which, compared to existing tiling methods, reduces memory usage without inducing any run time overhead. FDT applies to a larger variety of network layers than existing tiling methods, which focus on convolutions. It improves TinyML memory optimization significantly by reducing the memory usage of models where this was previously not possible, and by providing alternative design points for models that show a high run time overhead with existing methods. To identify the best tiling configuration, we propose an end-to-end flow with a new path discovery method that applies FDT and existing tiling methods in a fully automated way, including the scheduling of operations and the planning of buffer layouts in memory. Out of seven evaluated models, FDT achieved significant memory reductions of 76.2% and 18.1% for two models to which existing tiling methods could not be applied. Two other models showed a significant run time overhead with existing methods; for these, FDT provided alternative design points with no overhead but smaller memory savings.
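The benefit of fused tiling for peak buffer memory can be sketched with a back-of-the-envelope calculation. This is a generic illustration of why tiling shrinks intermediate buffers, not FDT itself: the layer model, tile count, and buffer sizes below are invented, and halo overlap between tiles (a key concern that depthwise tiling addresses) is deliberately ignored.

```python
def peak_memory_unfused(buffer_sizes):
    # Executing layer by layer: the input and output buffers of some
    # layer must be live at the same time, so the peak is the largest
    # adjacent pair of activation buffers.
    return max(a + b for a, b in zip(buffer_sizes, buffer_sizes[1:]))

def peak_memory_fused(buffer_sizes, n_tiles):
    # Fusing a block of layers and streaming tiles through it: the full
    # first input and final output are kept, but each intermediate
    # buffer shrinks to roughly one tile (halo overlap ignored here).
    first, *middle, last = buffer_sizes
    return first + last + sum(m // n_tiles for m in middle)

# Made-up activation buffer sizes (bytes) along one path of a model:
buffers = [64_000, 256_000, 256_000, 32_000]
print(peak_memory_unfused(buffers))           # → 512000
print(peak_memory_fused(buffers, n_tiles=8))  # → 160000
```

The dominant middle buffers shrink by the tile count, which is the effect tiling methods exploit; the engineering difficulty, and FDT's contribution, lies in achieving such reductions on more layer types and without recomputation overhead at tile boundaries.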