Sampling variability in forensic likelihood-ratio computation: A simulation study

Ali, Tauseef; Spreeuwers, Luuk; Veldhuis, Raymond; Meuwly, Didier

doi:10.1016/j.scijus.2015.05.003

Cited by 14 publications

(7 citation statements)

References 37 publications


“…Therefore, the results should in some ways be treated as the 'base case scenario' for variability in system performance as a function of sampling, and that even wider variability in system performance would be expected where poor quality recordings are used and independent samples are drawn from a much larger database. Meanwhile, the 100 times replication, similar to previous studies (Ali et al, 2015; Morrison & Poh, 2018), is an arbitrary choice in the current thesis. The major limitation in using EER for LR-based FVC system evaluation is that EER only treats LLRs categorically and it does not take the magnitude of evidence into consideration, i.e., what is considered an "error" is not being judged on a threshold of LLR = 0; therefore, a system that consistently yields high contrary-to-fact LLRs could have the same system validity to the one that produces low contrary-to-fact LLRs.…”

Section: Discussion (mentioning)

confidence: 99%

“…Under real case scenarios, 30 to 40 training and reference speakers are likely to be sampled from a relevant population (e.g., 35 speakers in Rose, 2013b), and the size of the relevant population itself is, in most cases, considerably larger than the number of training and reference speakers sampled. Although it is possible to sample more speakers from the relevant population, empirical studies (e.g., Ali et al, 2015; Morrison & Poh, 2018) and the current Chapter have shown that the effect of sampling variability on both overall performance and individual behaviour is inevitable. It is then a practical consideration for casework, i.e., would we obtain the same results for this particular pair of speakers if the experiment is replicated?…”

Section: Three-feature Systems (mentioning)

confidence: 91%

“…Meanwhile, some other studies examined the speaker-discriminatory power using suprasegmental features, e.g., long-term F0 distribution (Kinoshita et al, 2009), lexical tones (Rose & Wang, 2016), speech tempo (Lennon et al, 2019) and voice quality. Apart from testing different linguistic-phonetic features, many other studies have investigated the effect of non-linguistic factors on LR-based FVC systems, e.g., sample size (Hughes, 2017; Ishihara & Kinoshita, 2008), statistical models (Kinoshita & Wagner, 2014; Morrison, 2011a), calibration methods (Morrison & Poh, 2018), sampling variability (Ali et al, 2015), channel mismatch, reference population mismatch (Watt et al, 2020). Ultimately, previous studies used speech data where the ground truth is known to investigate two major questions, i.e., whether the system does what it is designed to do (validity) and whether the system would yield the same result if the analysis were repeated (reliability).…”

Section: The LR Approach in FVC (mentioning)

confidence: 99%


The effect of sampling variability on overall performance and individual speakers’ behaviour in likelihood ratio-based forensic voice comparison

Wang

2022

IJSLL


In recent years, there has been increasing awareness and acceptance among forensic speech scientists of using Bayesian reasoning and the likelihood ratio (LR) framework for forensic voice comparison (FVC) and for expressing expert conclusions. Numerous studies have explored overall performance using numerical LRs. However, although the data used for validation are a sample from an unknown distribution, little attention has been paid to the effect of sampling variability or to individuals' behaviour. This thesis investigates these issues using linguistic-phonetic variables. First, it investigates how different configurations of training, test and reference speakers affect overall performance. The results show that variability in overall performance is mostly caused by varying the test speakers, while less variability is caused by sampling variability in the reference and training speakers. Second, this thesis explores the effect of sampling variability on overall performance and individuals' behaviour in relation to the use of linguistic-phonetic features. Results show that sampling variability affects overall performance to different extents using different features, while combining more features does not always improve overall performance. Sampling variability has limited effects on individuals in same-speaker comparisons, and most speakers are less affected by sampling variability in different-speaker comparisons when four or more features are used. Third, this thesis explores the effect of sampling variability on overall performance in relation to score distributions. Results reveal that system validity and reliability are more affected by different-speaker score skewness, and less affected by same-speaker score skewness. Using different calibration methods reduces the effect of sampling variability to different extents.
The results in this thesis have implications for both FVC using numerical LRs and FVC in general, as experts need to make pragmatic decisions whether or not numerical LRs are used, and every decision made has implications for the final evaluation results. Further, the results on score skewness and different calibration methods could contribute to improving FVC performance using automatic systems.


“…logistic regression [10], pool adjacent violators [11], Bayesian model [12], scoring method [13]-[15]) have been developed and the performance has been compared. For example, [16] explored the effectiveness of three calibration methods (i.e. kernel density estimation, logistic regression, pool adjacent violators) in dealing with sampling variability with three sizes of the training scores.…”

Section: Calibration Methods (mentioning)

confidence: 99%

System Performance as a Function of Calibration Methods, Sample Size and Sampling Variability in Likelihood Ratio-Based Forensic Voice Comparison

Wang, Hughes

2021

Interspeech 2021


In data-driven forensic voice comparison, sample size is an issue which can have substantial effects on system output. Numerous calibration methods have been developed and some have been proposed as solutions to sample size issues. In this paper, we test four calibration methods (i.e. logistic regression, regularised logistic regression, Bayesian model, ELUB) under different conditions of sampling variability and sample size. Training and test scores were simulated from skewed distributions derived from real experiments, increasing sample sizes from 20 to 100 speakers for both the training and test sets. For each sample size, the experiments were replicated 100 times to test the susceptibility of different calibration methods to sampling variability. The Cllr mean and range across replications were used for evaluation. The Bayesian model and regularised logistic regression produced the most stable Cllr values when the sample size was small (i.e. 20 speakers), although mean Cllr was consistently lowest using logistic regression. The ELUB calibration method is generally the least preferred, as it is the most sensitive to sample size and sampling variability (mean = 0.66, range = 0.21-0.59).
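The replication design described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the gamma-based score distributions, the step of 20 between sample sizes, the use of n scores per class (rather than scores derived from n speakers), and the helper names `cllr`, `fit_logreg`, `draw_scores` and `run` are all hypothetical. Only plain (unregularised) logistic-regression calibration is shown.

```python
import numpy as np

def cllr(llr_ss, llr_ds):
    """Log-likelihood-ratio cost for natural-log LRs."""
    # Same-speaker penalty: log2(1 + 1/LR); different-speaker: log2(1 + LR).
    pen_ss = np.logaddexp(0.0, -llr_ss) / np.log(2)
    pen_ds = np.logaddexp(0.0, llr_ds) / np.log(2)
    return 0.5 * (pen_ss.mean() + pen_ds.mean())

def fit_logreg(scores, labels, iters=50):
    """Fit log-odds = a*score + b by Newton-Raphson on the logistic loss."""
    X = np.column_stack([scores, np.ones_like(scores)])
    w = np.zeros(2)
    for _ in range(iters):
        p = 0.5 * (1.0 + np.tanh(0.5 * (X @ w)))  # numerically stable sigmoid
        grad = X.T @ (p - labels)
        # Tiny ridge term only keeps the linear solve well conditioned.
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + 1e-6 * np.eye(2)
        w -= np.linalg.solve(hess, grad)
    return w

rng = np.random.default_rng(0)

def draw_scores(n):
    """Hypothetical left-skewed score distributions (assumption, not from the paper)."""
    ss = 2.0 - rng.gamma(2.0, 1.0, n)   # same-speaker scores
    ds = 1.0 - rng.gamma(2.0, 1.5, n)   # different-speaker scores
    return ss, ds

def run(n, reps=100):
    """Replicate train/test sampling and return (mean Cllr, Cllr range)."""
    vals = []
    for _ in range(reps):
        tr_ss, tr_ds = draw_scores(n)
        te_ss, te_ds = draw_scores(n)
        # Equal class sizes, so the fitted log-odds can be read as a log LR.
        a, b = fit_logreg(np.concatenate([tr_ss, tr_ds]),
                          np.concatenate([np.ones(n), np.zeros(n)]))
        vals.append(cllr(a * te_ss + b, a * te_ds + b))
    vals = np.array(vals)
    return vals.mean(), vals.max() - vals.min()

results = {n: run(n) for n in (20, 40, 60, 80, 100)}
for n, (mean_c, range_c) in results.items():
    print(f"n={n:3d}  mean Cllr={mean_c:.3f}  range={range_c:.3f}")
```

Comparing the range column across sample sizes gives a rough picture of how sensitive a calibration method is to sampling variability; the other three methods named in the abstract would need their own fitting routines.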


“…The most common approach of forensic science to this problem is to estimate a likelihood ratio (LR), i.e., the ratio of the joint probability of occurrence of the two traces under the hypothesis that they arose from the same source and under the hypothesis that they arose from different sources. A convenient solution is to replace the joint probability of the traces by the probability of a distance between the two traces quantifying their dissimilarity (6, 9–14). If, as is most often the case, the distance is scalar, there is an important loss of information.…”

Section: Introduction (mentioning)

confidence: 99%

Evaluation of distance‐based approaches for forensic comparison: Application to hand odor evidence

Rivals, Sautier, Cognon, et al.

2021

Journal of Forensic Sciences


The issue of distinguishing between the same-source and different-source hypotheses based on various types of traces is a generic problem in forensic science. This problem is often tackled with Bayesian approaches, which are able to provide a likelihood ratio that quantifies the relative strengths of evidence supporting each of the two competing hypotheses. Here, we focus on distance-based approaches, whose robustness and specifically whose capacity to deal with high-dimensional evidence are very different, and need to be evaluated and optimized. A unified framework for direct methods based on estimating the likelihoods of the distance between traces under each of the two competing hypotheses, and indirect methods using logistic regression to discriminate between same-source and different-source distance distributions, is presented. Whilst direct methods are more flexible, indirect methods are more robust and quite natural in machine learning. Moreover, indirect methods also enable the use of a vectorial distance, thus preventing the severe information loss suffered by scalar distance approaches. Direct and indirect methods are compared in terms of sensitivity, specificity, and robustness, with and without dimensionality reduction, with and without feature selection, on the example of hand odor profiles, a novel and challenging type of evidence in the field of forensics. Empirical evaluations on a large panel of 534 subjects and their 1690 odor traces show the significant superiority of the indirect methods, especially without dimensionality reduction, be it with or without feature selection.
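The distinction between direct and indirect distance-based methods drawn in this abstract can be written out as follows. The notation is an assumption introduced here for illustration and is not taken from the paper: d is the distance between two traces, H_ss and H_ds are the same-source and different-source hypotheses, f denotes an estimated density, and π_ss, π_ds are the class proportions in the data used to train the logistic regression.

```latex
% Direct method: estimate the density of the distance under each hypothesis
\mathrm{LR}_{\text{direct}}(d) = \frac{f(d \mid H_{ss})}{f(d \mid H_{ds})}

% Indirect method: logistic regression estimates the posterior P(H_{ss} \mid d);
% dividing the posterior odds by the training prior odds recovers an LR
\mathrm{LR}_{\text{indirect}}(d) =
  \frac{P(H_{ss} \mid d)}{1 - P(H_{ss} \mid d)}
  \cdot \frac{\pi_{ds}}{\pi_{ss}}
```

In the indirect form, d may be a vector of distances rather than a scalar, which is the property the abstract credits with avoiding the information loss of scalar-distance approaches.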

