“…It is difficult to make principled comparisons of test–retest reliability between studies, because the standard metrics (Pearson correlation, intraclass correlation coefficient) do not generalize between participant populations and because test–retest reliability can be affected by practice effects, which can differ between tests (e.g., Bird, Papadopoulou, Ricciardelli, Rossor, & Cipolotti, 2003). Nonetheless, it is worth noting the mistuning test–retest reliabilities previously reported for the PROMS (r = .68; Law & Zentner, 2012), PROMS-Short (r = .47; Zentner & Strauss, 2017), and Mini-PROMS (r = .63; Zentner & Strauss, 2017). We are not aware of test–retest studies for the scale test of the PSYCHOACOUSTICS toolbox (Soranzo & Grassi, 2014), but the test–retest reliability for the pitch discrimination test of this toolbox is high (r = .87; Smith, Bartholomew, Burnham, Tillmann, & Cirulli, 2017).…”
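
The point that a test–retest correlation does not generalize between participant populations can be illustrated with a small simulation. The sketch below assumes a classical-test-theory model (observed score = stable true ability + independent measurement noise). That model, the function names, and all parameter values are illustrative assumptions and are not taken from the cited studies. Under this assumption the expected test–retest correlation is true-score variance divided by total variance, so the same test with the same measurement noise yields a lower r in a more homogeneous sample.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate_test_retest(true_sd, error_sd, n=5000, seed=0):
    """Simulate test and retest scores for n participants (hypothetical model).

    Each participant has a stable true ability drawn with spread `true_sd`;
    each session adds independent noise with spread `error_sd`.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    true_scores = [rng.gauss(0.0, true_sd) for _ in range(n)]
    test = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    retest = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return pearson_r(test, retest)


# Same measurement noise, different spread of ability in the sample.
r_diverse = simulate_test_retest(true_sd=2.0, error_sd=1.0)
r_homogeneous = simulate_test_retest(true_sd=0.5, error_sd=1.0)

print(f"diverse sample:     r = {r_diverse:.2f}")
print(f"homogeneous sample: r = {r_homogeneous:.2f}")
```

With these assumed values the model predicts correlations of roughly 4 / (4 + 1) = .80 for the diverse sample and .25 / (.25 + 1) = .20 for the homogeneous one, even though the instrument's measurement error is identical in both cases. The sketch does not model practice effects, which the passage names as a second, separate obstacle to comparing studies.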