“…The features were extracted from the stimuli on a frame-by-frame basis (for more details on computational feature extraction, see Brattico et al., 2011; Eerola et al., 2011; Alluri et al., 2012). The values were converted to z-scores and grouped into six sets, following the classification implemented in Eerola et al. (2011), in order to minimize Type I errors resulting from multiple comparisons. The six sets comprised dynamic features (root mean square [RMS] energy and low energy), rhythm features (tempo and pulse clarity), timbre features (zero cross, centroid, brightness, spread, skewness, kurtosis, flatness, entropy, roughness, irregularity, and spectral flux), pitch features (chroma peak), tonality features (key clarity, mode, HCDF, and spectral entropy), and articulation features (attack time and attack slope).…”
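The passage above describes the general pipeline: frame-wise acoustic feature extraction followed by z-score normalization across frames. As a rough illustration only, the sketch below computes three frame-wise descriptors (RMS energy, zero-crossing rate, and spectral centroid) with NumPy and standardizes them. The frame length, hop size, and implementations are my own assumptions for demonstration; the cited studies used their own toolchain (e.g., the MATLAB MIRtoolbox), and the full feature set listed above is not reproduced here.

```python
import numpy as np

def frame_signal(x, frame_len=1024, hop=512):
    # Slice the signal into overlapping frames (assumed parameters,
    # not those of the cited studies).
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def extract_features(x, sr=22050, frame_len=1024, hop=512):
    # Returns an (n_frames, 3) matrix: RMS energy, zero-crossing
    # rate, and spectral centroid per frame.
    frames = frame_signal(x, frame_len, hop)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Zero-crossing rate: fraction of adjacent samples whose sign changes.
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    # Spectral centroid: magnitude-weighted mean frequency of each frame.
    mag = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    centroid = (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + 1e-12)
    return np.stack([rms, zcr, centroid], axis=1)

def zscore(feats):
    # Standardize each feature column to zero mean and unit variance.
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-12)
```

In a typical use, each column of the z-scored matrix can then be aggregated or correlated with behavioral or neural time series, which is why per-feature standardization (rather than global scaling) is applied.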