“…Meanwhile, some other studies examined the speaker-discriminatory power of suprasegmental features, e.g., long-term F0 distribution (Kinoshita et al., 2009), lexical tones (Rose & Wang, 2016), speech tempo (Lennon et al., 2019) and voice quality. Apart from testing different linguistic-phonetic features, many other studies have investigated the effect of non-linguistic factors on LR-based FVC systems, e.g., sample size (Hughes, 2017; Ishihara & Kinoshita, 2008), statistical models (Kinoshita & Wagner, 2014; Morrison, 2011a), calibration methods (Morrison & Poh, 2018), sampling variability (Ali et al., 2015), channel mismatch, and reference population mismatch (Watt et al., 2020). Ultimately, previous studies used speech data with known ground truth to investigate two major questions: whether the system does what it is designed to do (validity) and whether the system would yield the same result if the analysis were repeated (reliability).…”