Abstract: As a subject's true disease status is seldom known with certainty, it is necessary to compare the performance of new diagnostic tests with those of a currently accepted but imperfect 'gold standard'. Errors made by the gold standard mean that the sensitivity and specificity calculated for the new test are biased, and do not correctly estimate the new method's sensitivity and specificity. The traditional approach to this problem was 'discrepant resolution', in which the subjects for whom the two methods disagre…
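The bias described in the abstract can be illustrated numerically. The following is a minimal sketch, not the cited authors' method: it assumes the new test and the reference err independently given the true disease status, and the function name `apparent_accuracy` and all input values are hypothetical.

```python
def apparent_accuracy(prevalence, se_new, sp_new, se_ref, sp_ref):
    """Sensitivity and specificity of a new test as measured against an
    imperfect reference standard.

    Assumption: the two tests err independently given true disease status.
    """
    p, q = prevalence, 1.0 - prevalence
    # Probability that both tests are positive / both negative,
    # summed over diseased and non-diseased subjects.
    both_pos = p * se_new * se_ref + q * (1 - sp_new) * (1 - sp_ref)
    both_neg = p * (1 - se_new) * (1 - se_ref) + q * sp_new * sp_ref
    # Probability that the reference alone is positive / negative.
    ref_pos = p * se_ref + q * (1 - sp_ref)
    ref_neg = p * (1 - se_ref) + q * sp_ref
    return both_pos / ref_pos, both_neg / ref_neg


# Hypothetical inputs: 10% prevalence, a new test that is 95% sensitive
# and specific, and a reference that is 90% sensitive and specific.
app_se, app_sp = apparent_accuracy(0.10, 0.95, 0.95, 0.90, 0.90)
print(f"apparent sensitivity = {app_se:.3f}, apparent specificity = {app_sp:.3f}")
```

With these illustrative inputs the measured sensitivity falls well below the true 0.95, because at low prevalence many reference positives are themselves false positives. Setting `se_ref` and `sp_ref` to 1.0 recovers the true values.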
“…Because the PPP of the CSaR was 0.55, we had some support for moving half of the Dlv+/CS+ to Cell a: true positives. We grant, as argued by critics of discrepant resolution (summarized in Hawkins et al, 2001), that we were ignoring the cases of CSaR/DELV-NR disagreement in Cells a and d, but our knowledge of the CSaR's diagnostic accuracy did not give us confidence to have it override two opposite diagnoses.…”
Section: Hypothesis 2: Using CSaR PPP as a Resolver
mentioning
confidence: 85%
“…Thus, a refinement of the CSaR's diagnostic accuracy could give an indication of how much more confidence in DELV-NR scores is warranted when they agreed with CSaR in general, including the discrepant cases in which the DELV-NR disagreed with the LS, a procedure called discrepant resolution (Hawkins et al, 2001). Therefore, as shown in Table 5, we compared the DELV-NR outcomes to those of CSaR.…”
Section: Hypothesis 2: Using CSaR PPP as a Resolver
mentioning
confidence: 97%
“…First, specificity and NPP were adequate, so one can have a fair amount of confidence in its diagnoses of typical development and will also expect between one quarter and two-thirds of the CSaR diagnoses of LI to be correct (95% CI around PPP [0.27, 0.68]). Given the large number of diagnoses expected to be incorrect, we would not have enough confidence in a CSaR diagnosis to override a DELV-NR diagnosis when there is no indication that the DELV-NR is wrong (as many critics of discrepancy resolution require; Hawkins et al, 2001). However, when the diagnosis of the DELV-NR has been put in doubt by an opposite diagnosis from the LS, we suggest that one is justified in accepting a CSaR diagnosis as a corroboration of the DELV-NR in proportion to the values of its predictive power calculated relative to LSs, as is illustrated in Table 5.…”
Section: Rationale for Using CSaR PPP and NPP to Reevaluate Accuracy
mentioning
confidence: 98%
“…For medical decision making, where the disease status of individuals is seldom certain and available reference standards are often imperfect, several strategies have been proposed to combine data from two reference standards, using a second imperfect test as a "resolver test" in ways that attempt to compensate for potential biases in each of them (Green, Black, & Johnson, 1998; Hawkins, Garrett, & Stephenson, 2001).…”
Section: Using LSs to Evaluate Diagnostic Accuracy
mentioning
PURPOSE: In this study, the authors explored alternative gold standards to validate an innovative, dialect-neutral language assessment.
METHOD: Participants were 78 African American children, ages 5;0 (years;months) to 6;11. Twenty participants had previously been identified as having language impairment. The Diagnostic Evaluation of Language Variation-Norm Referenced (DELV-NR; Seymour, Roeper, & J. de Villiers, 2005) was administered, and concurrent language samples (LSs) were collected. Using LS profiles as the gold standard, sensitivity, specificity, and other measures of diagnostic accuracy were compared for diagnoses made from the DELV-NR and participants' clinical status prior to recruitment. In a second analysis, the authors used results from the first analysis to make evidence-based adjustments in the estimates of DELV-NR diagnostic accuracy.
RESULTS: Accuracy of the DELV-NR relative to LS profiles was greater than that of prior diagnoses, indicating that the DELV-NR was an improvement over preexisting diagnoses for this group. Specificity met conventional standards, but sensitivity was somewhat low. Reanalysis using the positive and negative predictive power of the preexisting diagnosis in a discrepant-resolution procedure revealed that estimates for sensitivity and specificity for the DELV-NR were .85 and .93, respectively.
CONCLUSION: The authors found that, even after making allowances for the imperfection of available gold standards, clinical decisions made with the DELV-NR achieved high values on conventional measures of diagnostic accuracy.
“…An imperfect gold standard results in inaccurate estimates of sensitivity and specificity. Determining the magnitude of inaccuracy in the gold standard measure is not an easy task [3]. When no gold standard is really available, model-based approaches can be used like latent class analysis (LCA), but this requires tedious statistical analysis [4].…”
While the basic concepts associated with screening are simple, studying the value of new tests requires a very strict methodology. This paper summarizes lessons learned regarding appropriate methodologies to assess the value of new screening approaches using visual inspection with acetic acid (VIA), a screening test for cervical pre-cancerous lesions, as an example. In addition to being convenient to, safe for and acceptable by target community members, a screening test should be reliable and have good test characteristics (i.e. be able to discriminate well between early disease and non-disease). Test reliability assesses the degree to which repeated measurements of the test yield the same result. To ensure reproducibility of study findings, test reliability should be assessed before any evaluation of test accuracy. The accuracy of a test (specificity and sensitivity) is measured using cross-sectional studies with adequate sample size. Several basic features are necessary to ensure internal validity for such studies: (a) final disease status data should be obtained for all subjects, (b) all test results must be determined independently of previous results, (c) the reference standard used to determine the disease status should be accurate, (d) the full "spectrum" of the disease should be included in the study. The study results should also have external validity to be applicable to other populations to which the test will be applied. All these considerations are exemplified by 17 very heterogeneous studies published to date assessing VIA test accuracy. The assessment of a new screening test is the first step in researching a new cancer prevention strategy. For this reason, this step should be carefully addressed through rigorous studies.
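The accuracy measures named in this abstract follow directly from a 2x2 cross-tabulation of test results against the reference standard. The sketch below is illustrative only: the function name `diagnostic_accuracy` and the counts are hypothetical and are not taken from any of the studies listed here. Positive and negative predictive value (`ppv`, `npv`) correspond to the quantities called PPP and NPP in the citation statements above.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Standard accuracy measures from a 2x2 table.

    tp/fn: reference-positive subjects the test calls positive/negative.
    fp/tn: reference-negative subjects the test calls positive/negative.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of diseased subjects detected
        "specificity": tn / (tn + fp),  # share of non-diseased subjects cleared
        "ppv": tp / (tp + fp),          # share of positive results that are correct
        "npv": tn / (tn + fn),          # share of negative results that are correct
    }


# Hypothetical counts for 200 subjects, 50 of them reference-positive.
measures = diagnostic_accuracy(tp=40, fp=10, fn=10, tn=140)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

These values are only as trustworthy as the reference standard behind the table, which is why condition (c) in the abstract matters.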
A medical device is any item that treats or diagnoses a health condition but whose action is primarily not chemical or biological. The main focus concerns the evaluation of clinical studies to establish the safety and effectiveness of different kinds of medical devices. The design and analysis of medical device studies pose unique statistical challenges. Both nondiagnostic and diagnostic devices are considered. Nondiagnostic devices include therapeutic ones and implants, among others. Statistical issues for these kinds of devices include the placebo effect, sham controls, inability to perform blinded studies, noninferiority, survival analysis, repeated measures, and historical controls. Diagnostic devices pose a very diverse set of challenges, with markedly different design and analysis considerations. Special attention is devoted here to microarrays in particular. Bayesian approaches to medical device studies are discussed.