“…As the input spectrogram is strided temporally by α, the intermediate feature maps have a lower temporal resolution. Moreover, the Slow stream has temporal convolutions only in res4_1. Architecture details for Fig.…”
Section: Network Architecture
“…There is strong evidence in neuroscience for the existence of two streams in the human auditory system: the ventral stream for identifying sound-emitting objects and the dorsal stream for locating these objects. Studies [3,4] suggest the ventral stream accordingly exhibits high spectral resolution for object identification, while the dorsal stream has a high temporal resolution and operates at a higher sampling rate.…”
We propose a two-stream convolutional network for audio recognition that operates on time-frequency spectrogram inputs. Following similar success in visual recognition, we learn Slow-Fast auditory streams with separable convolutions and multi-level lateral connections. The Slow pathway has high channel capacity, while the Fast pathway operates at a fine-grained temporal resolution. We showcase the importance of our two-stream proposal on two diverse datasets, VGG-Sound and EPIC-KITCHENS-100, and achieve state-of-the-art results on both.
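The excerpt above describes the key input-level difference between the two streams: the Slow stream sees a spectrogram strided temporally by a factor α, while the Fast stream keeps the full temporal resolution. A minimal sketch of that input preparation, in plain Python (the function name and the default α = 4 are illustrative assumptions, not values taken from the paper):

```python
def make_two_stream_inputs(spectrogram, alpha=4):
    """Split one spectrogram into Slow/Fast stream inputs.

    spectrogram: a list of time frames, each frame a list of frequency bins.
    The Fast stream receives every frame (fine-grained temporal resolution);
    the Slow stream subsamples time by a factor of alpha, so its intermediate
    feature maps have a lower temporal resolution, as the excerpt notes.
    """
    fast = spectrogram          # full temporal resolution
    slow = spectrogram[::alpha]  # temporally strided by alpha
    return slow, fast

# Toy example: 8 time frames x 3 frequency bins.
spec = [[t, t, t] for t in range(8)]
slow, fast = make_two_stream_inputs(spec, alpha=4)
# fast keeps all 8 frames; slow keeps frames 0 and 4.
```

In the full architecture, the channel capacity of the Slow stream compensates for this temporal subsampling, and lateral connections fuse the two streams at multiple levels.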
“…Such a model, based on an integro-differential equation, has been successfully applied to describe the evolution of neural activations. In particular, it has made it possible to theoretically predict complex perceptual phenomena in V1, such as the emergence of hallucinatory patterns [12,17], and has been used in various computational models of the auditory cortex [26,37,46]. Recently, these equations have been coupled with the neurogeometric model of V1 to great benefit.…”
Section: A Rotated Sound Image Corresponds To a Completely Different
The reconstruction mechanisms employed by the human auditory system during sound reconstruction are still a matter of debate. The purpose of this study is to propose a mathematical model of sound reconstruction based on the functional architecture of the auditory cortex (A1). The model is inspired by the geometrical modelling of vision, which has undergone great development in the last ten years. There are, however, fundamental dissimilarities, due to the different role played by time and the different group of symmetries. The algorithm transforms the degraded sound into an ‘image’ in the time–frequency domain via a short-time Fourier transform. This image is then lifted to the Heisenberg group and reconstructed via a Wilson–Cowan integro-differential equation. Preliminary numerical experiments are provided, showing the good reconstruction properties of the algorithm on synthetic sounds concentrated around two frequencies.
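The core of the reconstruction step is evolving neural activations under a Wilson–Cowan-type equation of the form da/dt = −a + β·σ(w·a + h), driven by the degraded time–frequency image h. A heavily simplified, pure-Python sketch of one such evolution (a scalar weight w stands in for the integral interaction kernel, and tanh stands in for the sigmoid; both are illustrative assumptions, and the actual model operates on the image lifted to the Heisenberg group):

```python
import math

def wilson_cowan_step(a, h, w, dt=0.1, beta=1.0):
    """One explicit Euler step of  da/dt = -a + beta * sigma(w * a + h).

    a: current activation (list of floats),
    h: external input, e.g. one slice of the degraded time-frequency image,
    w: scalar interaction weight standing in for the integral kernel.
    """
    return [ai + dt * (-ai + beta * math.tanh(w * ai + hi))
            for ai, hi in zip(a, h)]

# Evolve from zero activation toward a steady state shaped by the input.
h = [0.0, 0.5, 1.0, 0.5, 0.0]   # toy degraded time-frequency slice
a = [0.0] * len(h)
for _ in range(200):
    a = wilson_cowan_step(a, h, w=0.5)
# The activation settles near a fixed point a* = tanh(0.5*a* + h),
# so strongly driven frequency bins end up with high activation.
```

The full model replaces the scalar coupling with an integral over the lifted time–frequency domain, which is what allows the dynamics to fill in degraded regions from their neighbours.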
“…The caudal or dorsal stream originates in areas located caudally to the primary core and projects via the parietal cortex to dorsal frontal regions (Scott et al., 2017). These two processing streams show distinct neuronal properties (Jasmin et al., 2019; Zulfiqar et al., 2020). Compared to primary and surrounding auditory areas, neurons in the rostral field exhibit longer response latencies and narrower frequency tuning (Recanzone et al., 2000; Tian et al., 2001; Bendor and Wang, 2008; Camalier et al., 2012).…”
Section: Information Processing Pathways
“…Zulfiqar, I., Moerel, M., and Formisano, E. (2020). Spectro-temporal Processing in a Two-Stream Computational Model of Auditory Cortex.…”