This black-box study assessed the performance of forensic firearms examiners in the United States. It involved three different types of firearms and 173 volunteers who performed a total of 8640 comparisons of both bullets and cartridge cases. The overall false-positive error rate was estimated as 0.656% and 0.933% for bullets and cartridge cases, respectively, while the rate of false negatives was estimated as 2.87% and 1.87% for bullets and cartridge cases, respectively. The majority of errors were made by a limited number of examiners. Because chi-square tests of independence strongly suggest that error probabilities differ among examiners, the reported rates are maximum-likelihood estimates based on the beta-binomial probability model and do not depend on an assumption of equal examiner-specific error rates. Corresponding 95% confidence intervals are (0.305%, 1.42%) and (0.548%, 1.57%) for false positives for bullets and cartridge cases, respectively, and (1.89%, 4.26%) and (1.16%, 2.99%) for false negatives for bullets and cartridge cases, respectively. The results of this study are consistent with prior studies, despite its comprehensive design and challenging specimens.
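To illustrate how a beta-binomial model can yield an overall error rate without assuming that every examiner shares the same error probability, the following is a minimal sketch, not the study's actual procedure. The per-examiner counts are invented for illustration, the function names are hypothetical, and a coarse grid search stands in for whatever optimizer the authors used. The model is parameterized by the mean error rate `mu = alpha / (alpha + beta)` and a concentration `s = alpha + beta`.

```python
from math import lgamma


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_choose(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def beta_binomial_loglik(alpha, beta, data):
    """Log-likelihood of (errors, comparisons) pairs, one pair per examiner."""
    return sum(
        log_choose(n, k) + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)
        for k, n in data
    )


def fit_beta_binomial(data, mean_grid, size_grid):
    """Coarse grid-search MLE over mean mu and concentration s (assumed approach)."""
    best = None
    for mu in mean_grid:
        for s in size_grid:
            ll = beta_binomial_loglik(mu * s, (1 - mu) * s, data)
            if best is None or ll > best[0]:
                best = (ll, mu, s)
    return best


# Hypothetical data, NOT from the study: ten examiners, 30 comparisons each,
# with most errors concentrated in a single examiner.
data = [(0, 30)] * 8 + [(1, 30), (5, 30)]

mean_grid = [i / 1000 for i in range(1, 200)]      # candidate mean error rates
size_grid = [2 ** (j / 2) for j in range(-6, 21)]  # candidate concentrations

ll, mu_hat, s_hat = fit_beta_binomial(data, mean_grid, size_grid)
pooled = sum(k for k, _ in data) / sum(n for _, n in data)

print(f"beta-binomial mean error rate: {mu_hat:.3f} (concentration {s_hat:.2f})")
print(f"pooled error rate assuming identical examiners: {pooled:.3f}")
```

A small fitted concentration indicates that error probabilities vary widely across examiners, which is the situation in which a pooled binomial estimate and its confidence interval can mislead.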
In a comprehensive study to assess various aspects of the performance of qualified forensic firearms examiners, volunteer examiners compared both bullets and cartridge cases fired from three different types of firearms. They rendered opinions on each comparison according to the Association of Firearm & Tool Mark Examiners (AFTE) Range of Conclusions, as Identification, Inconclusive (A, B, or C), Elimination, or Unsuitable. In this part of the study, comparison sets used previously to characterize the overall accuracy of examiners were blindly resubmitted to examiners to assess the repeatability (105 examiners; 5700 comparisons of bullets and cartridge cases) and reproducibility (191 examiners of bullets, 193 of cartridge cases; 5790 comparisons) of firearms examinations. Data gathered using the prevailing AFTE Range were also recategorized into two hypothetical scoring systems. Consistently positive differences between observed agreement and expected agreement indicate that the repeatability and reproducibility of examiners exceed chance agreement. When averaged over bullets and cartridge cases, the repeatability of comparison decisions (involving all five levels of the AFTE Range) was 78.3% for known matches and 64.5% for known nonmatches. Similarly averaged reproducibility was 67.3% for known matches and 36.5% for known nonmatches. For both repeatability and reproducibility, many of the observed disagreements were between a definitive and an inconclusive category. Examiner decisions are reliable and trustworthy in the sense that identifications are unlikely when examiners are comparing non-matching items, and eliminations are unlikely when they are comparing matching items.
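The comparison of observed and expected agreement can be sketched as follows. This is an illustration under stated assumptions rather than the study's own computation: the table of counts is invented, it collapses the AFTE Range to three categories for brevity, and expected agreement is taken to be the usual chance-agreement term built from the marginal proportions of the two rounds. The function name is hypothetical.

```python
def observed_and_expected_agreement(table):
    """Return (observed, expected) agreement for a square table of counts.

    table[i][j] counts comparisons rated category i in the first round
    and category j in the second round.
    """
    size = len(table)
    total = sum(sum(row) for row in table)

    # Observed agreement: share of comparisons given the same decision twice.
    observed = sum(table[i][i] for i in range(size)) / total

    # Expected agreement: chance of matching decisions if the two rounds
    # were independent, computed from the marginal proportions.
    row_marginals = [sum(row) / total for row in table]
    col_marginals = [sum(table[i][j] for i in range(size)) / total for j in range(size)]
    expected = sum(r * c for r, c in zip(row_marginals, col_marginals))

    return observed, expected


# Hypothetical counts, NOT from the study.
# Rows and columns: Identification, Inconclusive, Elimination.
table = [
    [40, 8, 2],
    [6, 20, 4],
    [1, 5, 14],
]

observed, expected = observed_and_expected_agreement(table)
print(f"observed agreement: {observed:.3f}")
print(f"expected agreement: {expected:.3f}")
print(f"difference: {observed - expected:.3f}")
```

A positive difference means the two rounds agree more often than independent decisions with the same marginal frequencies would, which is the sense in which agreement exceeds chance.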