“…In this section, we train and test four neural network models on the same three data sets as before. These models were proposed in speech technology research, in particular for low-resource settings where transcribed data may not be available, and have shown high performance in word and phone discrimination tasks (Kamper, 2019; Kamper et al., 2015; Matusevych, Kamper, Schatz, Feldman, & Goldwater, 2021; Renshaw, Kamper, Jansen, & Goldwater, 2015). Figure 2 schematically shows how the models differ in architecture and input data.…”