Humans learn object categories without millions of labels, but to date the models with the highest correspondence to primate visual systems are all category-supervised. This paper introduces a new self-supervised learning framework, instance-prototype contrastive learning (IPCL), and compares the internal representations learned by this model and other instance-level contrastive learning systems to the structure of human brain responses. We present the first evidence to date that self-supervised systems can show more brain-like representation than category-supervised models. Further, we find that recent substantial gains in top-1 accuracy from instance-wise contrastive learning models do not result in more brain-like representation; instead, we find that the architecture and normalization scheme are critical. Finally, this dataset reveals substantial representational structure in intermediate and late stages of the human visual system that is not accounted for by any model, whether self-supervised or category-supervised. Considering both neuroscience and machine vision perspectives, these results provide promise for instance-level representation as a key objective of visual system encoding, and highlight the room to grow towards more robust, efficient, human-like object representation.

Preprint. Under review.