Wase: Learning When to Attend for Speaker Extraction in Cocktail Party Environments

Hao, Yunzhe; Xu, Jiaming; Zhang, Peng; Xu, Bo

doi:10.1109/icassp39728.2021.9413411

Cited by 9 publications

(1 citation statement)

References 29 publications

(26 reference statements)

“…For the cocktail party effect, many effective end-to-end neural network models have been proposed (Ephrat et al, 2018; Chao et al, 2019; Hao et al, 2021; Wang et al, 2021). However, the analysis of why these networks work is very difficult since the functional structures in these black-box models are very dense without clear function diversity.…”

Section: Related Work (mentioning)

confidence: 99%

Explaining cocktail party effect and McGurk effect with a spiking neural network improved by Motif-topology

et al. 2023

Self Cite

Network architectures and learning principles have been critical in developing complex cognitive capabilities in artificial neural networks (ANNs). Spiking neural networks (SNNs) are a subset of ANNs that incorporate additional biological features such as dynamic spiking neurons, biologically specified architectures, and efficient and useful paradigms. Here we focus on network architectures in SNNs, in particular the meta operator called the 3-node network motif, which is borrowed from biological networks. We propose a Motif-topology improved SNN (M-SNN) and verify that it is efficient in explaining key cognitive phenomena such as the cocktail party effect (a typical noise-robust speech-recognition task) and the McGurk effect (a typical multi-sensory integration task). For M-SNN, the Motif topology is obtained by integrating spatial and temporal motifs. These motifs are first generated by pre-training on spatial (e.g., MNIST) and temporal (e.g., TIDigits) datasets, respectively, and then applied to the two cognitive-effect tasks introduced above. The experimental results show lower computational cost, higher accuracy, and a better explanation of some key phenomena of these two effects, such as new concept generation and robustness to background noise. This mesoscale network-motif topology leaves much room for future development.
