2020
DOI: 10.31234/osf.io/fy8zx
Preprint

SAYCam: A large, longitudinal audiovisual dataset recorded from the infant’s perspective

Abstract: We introduce a new resource: the SAYCam corpus. Infants aged 6-32 months wore a head-mounted camera for approximately 2 hours per week, over the course of approximately two and a half years. The result is a large, naturalistic, longitudinal dataset of infant- and child-perspective videos. Transcription efforts are underway, with over 200,000 words of naturalistic dialogue already transcribed. Similarly, the dataset is searchable using a number of criteria (e.g., age of participant, location, setting, objects p…
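The abstract notes that the corpus can be searched by criteria such as participant age, location, and setting. As a rough illustration only, here is a minimal Python sketch of that kind of metadata filtering; the file name and column names are hypothetical assumptions, not the corpus's actual schema or API.

```python
# Hypothetical sketch: filtering a SAYCam-style metadata table by the criteria
# mentioned in the abstract (participant age, location, setting). The file
# "saycam_metadata.csv" and all column names are assumptions for illustration.
import pandas as pd

metadata = pd.read_csv("saycam_metadata.csv")

# Select recordings from 12-18-month-olds made at home in the kitchen.
subset = metadata[
    metadata["age_months"].between(12, 18)
    & (metadata["location"] == "home")
    & (metadata["setting"] == "kitchen")
]
print(subset[["video_id", "age_months", "duration_min"]].head())
```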

Cited by 29 publications (54 citation statements)
References 0 publications
“…Images depicting people, specifically the categories “man,” “woman,” and “child,” were not sampled according to census distributions (age, ethnicity, gender, etc.). Moreover, ecoset image and category distributions do not reflect the naturalistic, egocentric visual input typically encountered in the everyday life of infants and adults (46, 47).…”
Section: Methods
confidence: 92%
“…Moreover, ImageNet consists of statistically independent static frames, while infants receive a continuous stream of temporally correlated inputs (58). A better proxy of the real infant datastream is represented by the recently released SAYCam (59, 60) dataset, which contains head-mounted video camera data from three children (about 2 h/wk spanning ages 6 to 32 mo) (Fig. 3B).…”
Section: Deep Contrastive Learning On First-person Video Data From Children
confidence: 99%
“…These pathways were optimized to aggregate the resulting embeddings and their close neighbors (light brown points) and to separate the resulting embeddings and their farther neighbors (dark brown points). (B) Examples from the SAYCam dataset (59), which was collected by head-mounted cameras on infants for 2 h each week between ages 6 and 36 mo. (C) Neural predictivity for models trained on SAYCam and ImageNet.…”
Section: Deep Contrastive Learning On First-person Video Data From Children
confidence: 99%
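The citing work quoted above trains contrastive models on temporally correlated first-person video, pulling embeddings of nearby frames together and pushing other clips apart. The following is a minimal sketch of that general idea, assuming an InfoNCE-style loss and a ResNet-18 encoder; it is not the cited papers' exact training pipeline, and the hyperparameters are illustrative.

```python
# Minimal sketch (assumed setup, not the cited method): contrastive learning on
# pairs of temporally adjacent video frames. Matching frame pairs act as
# positives; all other pairs in the batch act as negatives.
import torch
import torch.nn.functional as F
import torchvision

encoder = torchvision.models.resnet18(weights=None)
encoder.fc = torch.nn.Linear(encoder.fc.in_features, 128)  # small projection head

def info_nce(anchor, positive, temperature=0.1):
    """anchor/positive: (N, D) embeddings of two temporally adjacent frames."""
    anchor = F.normalize(anchor, dim=1)
    positive = F.normalize(positive, dim=1)
    logits = anchor @ positive.t() / temperature   # (N, N) similarity matrix
    labels = torch.arange(anchor.size(0))          # matching pairs lie on the diagonal
    return F.cross_entropy(logits, labels)

# Random tensors standing in for SAYCam frame pairs (N, 3, 224, 224).
frames_t = torch.randn(8, 3, 224, 224)
frames_t1 = torch.randn(8, 3, 224, 224)
loss = info_nce(encoder(frames_t), encoder(frames_t1))
loss.backward()
```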
“…As a baseline, we test the embeddings created by randomly initialized models, examining whether or not the inductive biases conveyed by the architecture are sufficient to embed objects in the same relation more similarly. We then compare these results to models trained on the following datasets: SAYCam: this dataset offers longitudinal headcam video from a small number of babies (Sullivan et al., 2020). We use models trained on a single child's footage (child S), approximately two hours per week while the child was between 6-30 months old, a total of 221 hours.…”
Section: Colors
confidence: 99%
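The passage above compares embeddings from randomly initialized networks against trained ones. Below is a hedged sketch of such a comparison, assuming ResNet-50 encoders and cosine similarity over penultimate-layer features; the ImageNet weights here merely stand in for models trained on SAYCam footage, which are not bundled with torchvision.

```python
# Assumed comparison sketch: cosine similarity between two images' embeddings
# under a randomly initialized versus a pretrained encoder.
import torch
import torch.nn.functional as F
import torchvision

def embed(model, images):
    model.fc = torch.nn.Identity()   # use penultimate features as the embedding
    model.eval()
    with torch.no_grad():
        return model(images)

images = torch.randn(2, 3, 224, 224)   # stand-ins for two scene images

random_model = torchvision.models.resnet50(weights=None)
pretrained_model = torchvision.models.resnet50(weights="IMAGENET1K_V2")

for name, model in [("random", random_model), ("pretrained", pretrained_model)]:
    z = F.normalize(embed(model, images), dim=1)
    print(name, "cosine similarity:", float(z[0] @ z[1]))
```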
“…Quinn (2003) reviews two findings in infant relation categorization: categorizing one object as above/below another precedes categorizing an object as between other objects, and categorizing relations over specific objects predates abstract relations over varying objects. We model these phenomena with deep neural networks, including contemporary architectures specialized for relational learning and vision models pretrained on baby headcam footage (Sullivan et al., 2020). Across two computational experiments, we can account for most of the developmental findings, suggesting these neural network models are useful for studying the computational mechanisms of infant categorization.…”
confidence: 99%