Calibration model selection is required for all quantitative methods in toxicology and, more broadly, in bioanalysis. This typically involves selecting the equation order (quadratic or linear) and the weighting factor that correctly model the data. Selecting the wrong calibration model degrades quality control (QC) accuracy, with errors of up to 154%. Unfortunately, simple tools to perform this selection and tests to validate the resulting model are lacking. We present a stepwise, analyst-independent scheme for selection and validation of calibration models. The success rate of this scheme is on average 40% higher than that of a traditional "fit and check the QC accuracy" method of selecting the calibration model. Moreover, the process was completely automated through a script (available in Supplemental Data 3) running in RStudio (free, open-source software). The need for weighting was assessed through an F-test using the variances of the replicate measurements at the upper limit of quantification and the lower limit of quantification. When weighting was required, the choice between 1/x and 1/x² was made by calculating which option generated the smallest spread of weighted normalized variances. Finally, model order was selected through a partial F-test. The chosen calibration model was validated through Cramer-von Mises or Kolmogorov-Smirnov normality testing of the standardized residuals. Performance of the different tests was assessed using 50 simulated data sets per possible calibration model (e.g., linear-no weight, quadratic-no weight, linear-1/x, etc.). This first of two papers describes the tests, steps and outcomes of the developed procedure using real LC-MS-MS results for the quantification of cocaine and naltrexone.
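The weighting decision described above can be illustrated with a minimal Python sketch. It is not the authors' RStudio script: the function names, the replicate values and the 5% significance level are all assumptions made for illustration. It computes the ratio of the ULOQ replicate variance to the LLOQ replicate variance and compares it with a tabulated one-sided F critical value supplied by the caller.

```python
from statistics import variance


def variance_ratio_f(lloq_reps, uloq_reps):
    """One-sided F statistic: ULOQ replicate variance over LLOQ replicate variance."""
    return variance(uloq_reps) / variance(lloq_reps)


def needs_weighting(lloq_reps, uloq_reps, f_critical):
    """True when the ULOQ variance is significantly larger than the LLOQ variance."""
    return variance_ratio_f(lloq_reps, uloq_reps) > f_critical


# Hypothetical instrument responses, five replicates per level (not real data).
lloq = [0.010, 0.011, 0.009, 0.010, 0.012]
uloq = [10.1, 9.8, 10.4, 9.9, 10.3]

# Tabulated one-sided critical value F(0.05; 4, 4).
# The 5% level is an assumption; the abstract does not state the level used.
F_CRIT = 6.39

print("F statistic:", variance_ratio_f(lloq, uloq))
print("Weighting needed:", needs_weighting(lloq, uloq, F_CRIT))
```

Passing the critical value in explicitly keeps the sketch free of third-party dependencies; with a statistics library available, it could instead be derived from the chosen significance level and the degrees of freedom (number of replicates minus one at each level).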
In the first part of this paper (I-Description and application), an automated, stepwise and analyst-independent process for the selection and validation of calibration models was put forward and applied to two model analytes. This second part presents the mathematical reasoning and experimental work underlying the selection of the different components of this procedure. Different replicate analysis designs (intra/inter-day and intra/inter-extraction) were tested and their impact on test results was evaluated. For most methods, the use of intra-day/intra-extraction measurement replicates is recommended because of their lower variability. This process should be repeated three times during validation in order to assess the stability of the underlying model over time. Strategies for identifying heteroscedasticity and their potential weaknesses were examined, and a one-sided F-test using the lower limit of quantification and upper limit of quantification replicates was chosen. Three different options for model selection were examined and tested: ANOVA lack-of-fit (LOF), partial F-test and significance of the second-order term. Examination of the mathematical assumptions of each test and of LC-MS-MS experimental results led to the selection of the partial F-test as the most suitable. The advantages and drawbacks of ANOVA-LOF, examination of the standardized residuals graph and residuals normality testing (Kolmogorov-Smirnov or Cramer-von Mises) for validation of the calibration model were examined, with the last option proving the best in light of its robustness and accuracy. Choosing the correct calibration model improves QC accuracy, and simulations have shown that this automated scheme performs much better than a more traditional method of fitting increasingly complex models until QC accuracies fall within the acceptance threshold.
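The partial F-test chosen for model-order selection can be sketched as follows. This is an illustrative Python example under stated assumptions, not the authors' implementation: it uses unweighted least squares for simplicity (the published procedure also handles 1/x and 1/x² weighting), and the helper names and data are hypothetical. The statistic compares the drop in residual sum of squares gained by adding the second-order term with the residual variance of the quadratic fit.

```python
def polyfit_sse(x, y, degree):
    """Unweighted least-squares polynomial fit; returns the residual sum of squares."""
    m = degree + 1
    # Normal equations A c = b, with A[j][k] = sum(x^(j+k)) and b[j] = sum(y * x^j).
    A = [[sum(xi ** (j + k) for xi in x) for k in range(m)] for j in range(m)]
    b = [sum(yi * xi ** j for xi, yi in zip(x, y)) for j in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for k in range(col, m):
                A[r][k] -= factor * A[col][k]
            b[r] -= factor * b[col]
    # Back substitution; coef[p] multiplies x^p.
    coef = [0.0] * m
    for r in reversed(range(m)):
        tail = sum(A[r][k] * coef[k] for k in range(r + 1, m))
        coef[r] = (b[r] - tail) / A[r][r]
    fitted = [sum(c * xi ** p for p, c in enumerate(coef)) for xi in x]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def partial_f_statistic(x, y):
    """F statistic for adding the quadratic term, with (1, n - 3) degrees of freedom."""
    sse_linear = polyfit_sse(x, y, 1)
    sse_quadratic = polyfit_sse(x, y, 2)
    return (sse_linear - sse_quadratic) / (sse_quadratic / (len(x) - 3))


# Hypothetical calibrator concentrations and responses (not real data).
conc = [1, 2, 5, 10, 20, 50, 100]
noise = [0.3, -0.2, 0.1, -0.4, 0.5, -0.3, 0.2]
resp_linear = [2 * c + e for c, e in zip(conc, noise)]
resp_curved = [2 * c - 0.005 * c ** 2 + e for c, e in zip(conc, noise)]

print("Linear data, F =", partial_f_statistic(conc, resp_linear))
print("Curved data, F =", partial_f_statistic(conc, resp_curved))
```

The quadratic model would be retained only when the statistic exceeds the tabulated F critical value for 1 and n - 3 degrees of freedom at the chosen significance level.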
Calculating the confidence interval is a common procedure in data analysis and is readily obtained from normally distributed populations with the familiar [Formula: see text] formula. However, when working with non-normally distributed data, determining the confidence interval is less straightforward. For this type of data, references in the literature are fewer and much less accessible. We describe, in simple language, the percentile and the bias-corrected and accelerated variations of the bootstrap method for calculating confidence intervals. This method can be applied to a wide variety of parameters (mean, median, slope of a calibration curve, etc.) and is appropriate for normal and non-normal data sets. As a worked example, the confidence interval around the median concentration of cocaine in femoral blood is calculated using bootstrap techniques. The median of the non-toxic concentrations was 46.7 ng/mL with a 95% confidence interval of 23.9-85.8 ng/mL in the non-normally distributed set of 45 postmortem cases. Using this method should yield more statistically sound and accurate confidence intervals for non-normally distributed populations, such as reference values of therapeutic and toxic drug concentrations, as well as in situations where concentration values are truncated near the limit of quantification or cutoff of a method.
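The percentile variant of the bootstrap can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name, the number of resamples, the seed and the concentration values are hypothetical (the values are invented, not the 45 postmortem cases), and the bias-corrected and accelerated variant is not shown.

```python
import random
from statistics import median


def percentile_bootstrap_ci(data, stat=median, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(data)
    # Resample with replacement, recompute the statistic, and sort the estimates.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    # Cut off alpha/2 of the estimates in each tail.
    tail = int(n_boot * alpha / 2)
    return estimates[tail], estimates[n_boot - tail - 1]


# Invented, right-skewed concentrations in ng/mL (illustration only).
concentrations = [3.1, 5.4, 8.0, 12.5, 18.2, 24.0, 30.0, 41.0,
                  52.3, 60.1, 88.0, 120.0, 210.0, 340.0, 800.0]

low, high = percentile_bootstrap_ci(concentrations)
print("Sample median:", median(concentrations))
print("95% percentile bootstrap CI:", (low, high))
```

Because `stat` is a parameter, the same function can be reused for other statistics such as the mean by passing a different callable.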