“…The work on predictive models of neural responses to visual inputs has a long history that includes simple linear-nonlinear (LN) models (Heeger, 1992a,b; Jones & Palmer, 1987), energy models (Adelson & Bergen, 1985), more general subunit/LN-LN models (Rust et al, 2005; Schwartz et al, 2006; Touryan et al, 2005; Vintch et al, 2015), and multi-layer neural network models (Lau et al, 2002; Lehky et al, 1992; Prenger et al, 2004; Zipser & Andersen, 1988). The deep learning revolution set new standards in prediction performance by leveraging task-optimized deep convolutional neural networks (CNNs) (Cadena et al, 2019; Cadieu et al, 2014; Yamins et al, 2014) and CNN-based architectures incorporating a shared encoding learned end-to-end for thousands of neurons (Antolík et al, 2016; Bashiri et al, 2021; Batty et al, 2016; Burg et al, 2021; Cadena et al, 2019; Cowley & Pillow, 2020; Ecker et al, 2018; Franke et al, 2021; Kindel et al, 2017; Klindt et al, 2017; Lurz et al, 2020; McIntosh et al, 2016; Sinz et al, 2018; Walker et al, 2019; Zhang et al, 2018).…”