Abstract: In speaker extraction, additional information about the target speaker, including voiceprint, lip movement, facial expression, and spatial information, has been found to help track and extract that speaker. However, the sound onset cue, which has long been emphasized in auditory scene analysis and psychology, has received little attention. Inspired by this, we explicitly modeled the onset cue and verified its effectiveness in the speaker extraction task. We further extended to the …
“…For the cocktail party effect, many effective end-to-end neural network models have been proposed (Ephrat et al., 2018; Chao et al., 2019; Hao et al., 2021; Wang et al., 2021). However, the analysis of why these networks work is very difficult since the functional structures in these black-box models are very dense without clear function diversity.…”
Network architectures and learning principles have been critical in developing complex cognitive capabilities in artificial neural networks (ANNs). Spiking neural networks (SNNs) are a subset of ANNs that incorporate additional biological features such as dynamic spiking neurons, biologically specified architectures, and efficient and useful paradigms. Here we focus on network architectures in SNNs, in particular the meta operator called 3-node network motifs, which is borrowed from biological networks. We propose a Motif-topology improved SNN (M-SNN), which we further verify to be efficient in explaining key cognitive phenomena such as the cocktail party effect (a typical noise-robust speech-recognition task) and the McGurk effect (a typical multi-sensory integration task). For M-SNN, the Motif topology is obtained by integrating spatial and temporal motifs. These motifs are first generated by pre-training on spatial (e.g., MNIST) and temporal (e.g., TIDigits) datasets, respectively, and then applied to the two cognitive effect tasks introduced above. The experimental results show lower computational cost, higher accuracy, and a better explanation of some key phenomena of these two effects, such as new concept generation and robustness to background noise. This mesoscale network motif topology leaves much room for future work.
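For readers unfamiliar with 3-node network motifs, the following is a minimal sketch of how the connected 3-node patterns in a directed connectivity matrix can be counted. It is a generic motif census, not the M-SNN procedure, which the abstract does not specify. The function names (triad_code, canonical_triad, motif_census) and the 0/1 adjacency-matrix input are assumptions made for illustration.

```python
from itertools import combinations, permutations


def triad_code(adj, nodes):
    """Encode the directed edges among three ordered nodes as a 6-bit integer."""
    code = 0
    bit = 0
    for i in range(3):
        for j in range(3):
            if i != j:
                if adj[nodes[i]][nodes[j]]:
                    code |= 1 << bit
                bit += 1
    return code


def canonical_triad(adj, triple):
    """Smallest code over all 6 node orderings, so isomorphic triads share a label."""
    return min(triad_code(adj, p) for p in permutations(triple))


def is_weakly_connected(adj, triple):
    """Three nodes are weakly connected when at least two of the three pairs are linked."""
    a, b, c = triple
    linked = lambda u, v: bool(adj[u][v] or adj[v][u])
    return linked(a, b) + linked(a, c) + linked(b, c) >= 2


def motif_census(adj):
    """Count each connected 3-node pattern in a directed graph.

    adj is a square 0/1 matrix (list of lists) where adj[u][v] == 1 means an
    edge u -> v. Returns a dict mapping canonical pattern label -> count.
    """
    counts = {}
    for triple in combinations(range(len(adj)), 3):
        if is_weakly_connected(adj, triple):
            label = canonical_triad(adj, triple)
            counts[label] = counts.get(label, 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical 4-neuron connectivity: a feed-forward loop 0->1->2, 0->2,
    # plus a feedback edge 2->3->0.
    adj = [
        [0, 1, 1, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1],
        [1, 0, 0, 0],
    ]
    for label, count in sorted(motif_census(adj).items()):
        print(f"pattern {label:06b}: {count}")
```

Applied to every possible 3-node directed graph, this labeling yields the 13 connected pattern classes usually referred to as 3-node motifs. Comparing the counts against those of randomized graphs, which is what distinguishes a motif from a mere subgraph, is left out of this sketch.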