“…In a complex visual scene, it seems impossible to perceive everything precisely at a glance because of the limited capacity of our visual system (Cohen et al, 2016; Luck & Vogel, 1997; McClelland & Bayne, 2016). However, we are capable of rapidly accessing the summary statistics, such as the mean and the variance, of the stimulus set in the scene along a variety of dimensions including not only the low-level and middle-level (e.g., orientation, hue, size; see Chetverikov et al, 2017; Dakin & Watt, 1997; Dan, 2001; Hansmann-Roth et al, 2019, 2021; Joo et al, 2009) but also the high-level visual features (e.g., emotional expression, facial identity; see Haberman et al, 2009; Haberman & Whitney, 2007, 2011; Han et al, 2021; Peng et al, 2020). This ability to extract the summary statistics from groups of similar objects is referred to as ensemble perception or summary representation (Alvarez, 2011; Hubert-Wallander & Boynton, 2015; Whitney & Yamanashi Leib, 2018).…”