2019
DOI: 10.1523/jneurosci.2914-18.2019

Cascaded Tuning to Amplitude Modulation for Natural Sound Recognition

Abstract: The auditory system converts the physical properties of a sound waveform to neural activities and processes them for recognition. During the process, the tuning to amplitude modulation (AM) is successively transformed by a cascade of brain regions. To test the functional significance of the AM tuning, we conducted single-unit recording in a deep neural network (DNN) trained for natural sound recognition. We calculated the AM representation in the DNN and quantitatively compared it with those reported in previo…
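The core procedure the abstract describes, probing individual units of a sound-recognition DNN for their amplitude-modulation tuning, can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the paper's exact protocol: the stimulus design (sinusoidally amplitude-modulated noise swept across modulation rates), the sampling rate, and the `toy_unit` stand-in for a real recorded DNN unit are all hypothetical.

```python
# Minimal sketch of "single-unit recording" of AM tuning: present sinusoidally
# amplitude-modulated noise at several modulation rates and record one unit's
# response, yielding a modulation transfer function (MTF).
import numpy as np

FS = 16_000                          # sampling rate in Hz (assumed)
DUR = 1.0                            # stimulus duration in seconds (assumed)
MOD_FREQS = 2.0 ** np.arange(1, 9)   # modulation rates: 2, 4, ..., 256 Hz

def am_noise(fm, fs=FS, dur=DUR, depth=1.0, seed=0):
    """Sinusoidally amplitude-modulated white noise at modulation rate fm (Hz)."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * dur)) / fs
    carrier = rng.standard_normal(t.size)
    envelope = 1.0 + depth * np.sin(2 * np.pi * fm * t)
    stim = envelope * carrier
    return stim / np.max(np.abs(stim))   # normalize peak level

def unit_mtf(unit_response, mod_freqs=MOD_FREQS):
    """Modulation transfer function: one unit's response at each modulation rate."""
    return np.array([unit_response(am_noise(fm)) for fm in mod_freqs])

def toy_unit(stim, fs=FS, best_fm=16.0):
    """Hypothetical stand-in for a DNN unit that happens to prefer ~16 Hz envelopes."""
    env = np.abs(stim)                               # crude envelope extraction
    t = np.arange(env.size) / fs
    probe = np.sin(2 * np.pi * best_fm * t)
    return float(np.abs(np.dot(env - env.mean(), probe)) / env.size)

mtf = unit_mtf(toy_unit)
for fm, r in zip(MOD_FREQS, mtf):
    print(f"{fm:6.0f} Hz  ->  {r:.4f}")
```

In an actual analysis, `toy_unit` would be replaced by the activation of a chosen unit in the trained network, and the resulting MTFs could then be compared across layers and against published neurophysiological tuning curves.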

Cited by 28 publications (42 citation statements). References 75 publications.
“…Analogously, layer-wise correspondence has been found between CNNs trained for audio classification and the human auditory cortex 25 or the monkey peripheral auditory network 26. Although all these studies are positive in the generality of explanatory capabilities of goal-optimized neural networks, the same story might not go all the way through.…”
Section: Discussion (mentioning)
confidence: 74%
“…As mentioned in Introduction, more recent studies have argued that CNNs trained for image classification have layers similar to higher 2-4,7, intermediate [3][4][5], or lower 6 areas in the monkey or human visual ventral stream. Analogously, layer-wise correspondence has been found between CNNs trained for audio classification and the human auditory cortex 25 or the monkey peripheral auditory network 26. Although all these studies are positive in the generality of explanatory capabilities of goal-optimized neural networks, the same story might not go all the way through.…”
Section: View-identity Tuning (mentioning)
confidence: 71%
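The "layer-wise correspondence" this excerpt refers to is usually assessed by comparing each DNN layer's stimulus representation against recorded neural responses. Below is a minimal representational-similarity sketch; the `layer_activations` and `neural_responses` inputs are assumed (stimuli x features matrices), and the RSA-based comparison is one common choice, not necessarily the procedure used in the cited studies, which often rely on regression-based encoding models.

```python
# Minimal layer-wise correspondence sketch: correlate each layer's
# representational dissimilarity matrix (RDM) with a neural RDM.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(responses):
    """Condensed RDM: 1 - Pearson correlation between stimulus response patterns."""
    return pdist(responses, metric="correlation")

def layerwise_similarity(layer_activations, neural_responses):
    """Spearman correlation between each layer's RDM and the neural RDM."""
    neural_rdm = rdm(neural_responses)
    return {name: spearmanr(rdm(acts), neural_rdm).correlation
            for name, acts in layer_activations.items()}

# Toy usage with random data: 40 stimuli, four layers, 100 recorded units.
rng = np.random.default_rng(0)
layers = {f"layer{i}": rng.standard_normal((40, 64)) for i in range(1, 5)}
brain = rng.standard_normal((40, 100))
print(layerwise_similarity(layers, brain))
```

The layer whose RDM correlates best with the neural RDM is then read as the network stage most similar to that brain region, which is the kind of result the quoted passage summarizes.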
“…A very productive line of research put the emphasis on the temporal aspects of the speech structure and explored speech perception in terms of temporal-modulation processing (e.g., Houtgast and Steeneken, 1973; Plomp, 1983; Rosen, 1992; Drullman, 1995; Shannon et al., 1995; Zeng et al., 2005; Moore, 2008; Shamma and Lorenzi, 2013). Altogether, these studies demonstrated that (i) speech sounds convey salient modulations in amplitude (AM) and frequency (FM) resulting from the dynamic modulation of the vocal-tract geometric characteristics and vocal-fold vibrations (e.g., Varnet et al., 2017); (ii) the human auditory system is exquisitely sensitive to these modulation cues and certainly optimized to detect and discriminate modulation cues at the output of perceptual filters selectively tuned in the AM domain (Rodriguez et al., 2010; Koumura et al., 2019) and, in the case of slow FM carried by low-frequency sounds, due to temporal coding mechanisms using neural phase-locking to the temporal fine structure of narrowband signals at the output of cochlear filters (Paraouty et al., 2018); and (iii) the ability to identify speech in a variety of listening conditions is constrained by the ability to perceive accurately these relatively slow AM and FM components (e.g., Fu, 2002; Johannesen et al., 2016; Parthasarathy et al., 2020).…”
Section: Introduction (mentioning)
confidence: 88%
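For concreteness, the two modulation cues discussed in the excerpt above, amplitude modulation (AM) and frequency modulation (FM) of a carrier, can be synthesized as in the short sketch below. The carrier frequency, modulation rate, depth, and frequency excursion are illustrative assumptions, not values taken from the cited studies.

```python
# Minimal sketch of AM and FM cues: a tone carrier with a sinusoidal amplitude
# envelope (AM) and a tone carrier with a sinusoidally varying instantaneous
# frequency (FM). All parameter values are illustrative.
import numpy as np

FS = 16_000                           # sampling rate in Hz (assumed)
T = np.arange(int(FS * 0.5)) / FS     # 0.5 s time axis
FC = 1000.0                           # carrier frequency in Hz (assumed)

def am_tone(fm=4.0, depth=0.5, fc=FC, t=T):
    """Carrier whose amplitude envelope varies sinusoidally at rate fm (Hz)."""
    return (1.0 + depth * np.sin(2 * np.pi * fm * t)) * np.sin(2 * np.pi * fc * t)

def fm_tone(fm=4.0, excursion=100.0, fc=FC, t=T, fs=FS):
    """Carrier whose instantaneous frequency varies sinusoidally around fc (Hz)."""
    inst_freq = fc + excursion * np.sin(2 * np.pi * fm * t)
    phase = 2 * np.pi * np.cumsum(inst_freq) / fs
    return np.sin(phase)

am, fmod = am_tone(), fm_tone()
print(am.shape, fmod.shape)
```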