Depth in convolutional neural networks solves scene segmentation

Seijdel, Noor; Tsakmakidis, Nikos; Haan, E.H.F. de; Bohté, Sander M.; Scholte, H. Steven

doi:10.1371/journal.pcbi.1008022

Cited by 22 publications

(13 citation statements)

References 52 publications

(75 reference statements)

“…Emergence of object category representations can be delayed, for example when objects are occluded or are hard to categorize [44][45][46]. This suggests that object category representations might emerge with a delay also when objects appear on cluttered backgrounds, for example because additional grouping and segmentation operations are necessary that depend on recurrence and hence require additional time [47][48][49].…”

Section: Object Category Representations In Time (mentioning)

confidence: 99%

The spatiotemporal neural dynamics of object location representations in the human brain

et al. 2022

To interact with objects in complex environments, we must know what they are and where they are in spite of challenging viewing conditions. Here, we investigated where, how and when representations of object location and category emerge in the human brain when objects appear on cluttered natural scene images using a combination of functional magnetic resonance imaging, electroencephalography and computational models. We found location representations to emerge along the ventral visual stream towards lateral occipital complex, mirrored by gradual emergence in deep neural networks. Time-resolved analysis suggested that computing object location representations involves recurrent processing in high-level visual cortex. Object category representations also emerged gradually along the ventral visual stream, with evidence for recurrent computations. These results resolve the spatiotemporal dynamics of the ventral visual stream that give rise to representations of where and what objects are present in a scene under challenging viewing conditions.

“…Recently, a multitude of studies have reconciled these seemingly inconsistent findings by indicating that recurrent processes might be employed adaptively, depending on the visual input: while feed-forward activity might suffice for simple scenes with isolated objects, more complex scenes or more challenging conditions (e.g. objects that are occluded or degraded), may need additional visual operations ('routines') requiring recurrent computations (Groen et al, 2018; Tang et al, 2018; Kar et al, 2019; Rajaei et al, 2019; Seijdel et al, 2020). For objects in isolation, or very simple scenes, rapid recognition may thus rely on a coarse and unsegmented feed-forward representation (Crouzet and Serre, 2011), while for more cluttered images recognition may require explicit encoding of spatial relationships between parts.…”

Section: Introduction (mentioning)

confidence: 99%

On the Necessity of Recurrent Processing during Object Recognition: It Depends on the Need for Scene Segmentation

et al. 2021

Self Cite

While feed-forward activity may suffice for recognizing objects in isolation, additional visual operations that aid object recognition might be needed for real-world scenes. One such additional operation is figure-ground segmentation: extracting the relevant features and locations of the target object while ignoring irrelevant features. In this study of 60 participants, we show objects on backgrounds of increasing complexity to investigate whether recurrent computations are increasingly important for segmenting objects from more complex backgrounds. Three lines of evidence show that recurrent processing is critical for recognition of objects embedded in complex scenes. First, behavioral results indicated a greater reduction in performance after masking objects presented on more complex backgrounds, with the degree of impairment increasing with increasing background complexity. Second, electroencephalography (EEG) measurements showed clear differences in the evoked response potentials (ERPs) between conditions around 200 ms, a time point beyond feedforward activity, and object decoding based on the EEG signal indicated later decoding onsets for objects embedded in more complex backgrounds. Third, Deep Convolutional Neural Network performance confirmed this interpretation: feed-forward and less deep networks showed a higher degree of impairment in recognition for objects in complex backgrounds compared to recurrent and deeper networks. Together, these results support the notion that recurrent computations drive figure-ground segmentation of objects in complex scenes.

“…Besides, in the gender group, the males’ formats are mainly located on the low-frequency area, and subsequently, the texture on the males’ spectrogram repeats more irregularly compared to females’. Since the classic convolutional kernel utilized in these products is less effective in generalizing such irregular patterns due to shape mismatch [52], the neural network-based feature extraction is restricted to further unearth the voice identity on these three models [53], [54]. Moreover, similar situations can be observed in the rest research voice biometric models.…”

Section: Results Analysis and Discussion (mentioning)

confidence: 99%

Exploring racial and gender disparities in voice biometrics

Chen, Setlur, et al. 2022, Sci Rep

Systemic inequity in biometrics systems based on racial and gender disparities has received a lot of attention recently. These disparities have been explored in existing biometrics systems such as facial biometrics (identifying individuals based on facial attributes). However, such ethical issues remain largely unexplored in voice biometric systems that are very popular and extensively used globally. Using a corpus of non-speech voice records featuring a diverse group of 300 speakers by race (75 each from White, Black, Asian, and Latinx subgroups) and gender (150 each from female and male subgroups), we explore and reveal that racial subgroup has a similar voice characteristic and gender subgroup has a significant different voice characteristic. Moreover, non-negligible racial and gender disparities exist in speaker identification accuracy by analyzing the performance of one commercial product and five research products. The average accuracy for Latinxs can be 12% lower than Whites (p < 0.05, 95% CI 1.58%, 14.15%) and can be significantly higher for female speakers than males (3.67% higher, p < 0.05, 95% CI 1.23%, 11.57%). We further discover that racial disparities primarily result from the neural network-based feature extraction within the voice biometric product and gender disparities primarily due to both voice inherent characteristic difference and neural network-based feature extraction. Finally, we point out strategies (e.g., feature extraction optimization) to incorporate fairness and inclusive consideration in biometrics technology.
