Some issues in resolution of diagnostic tests using an imperfect gold standard

Hawkins, Douglas M.; Garrett, James; Stephenson, Betty

doi:10.1002/sim.819

Cited by 65 publications

(36 citation statements)

References 19 publications

“…Because the PPP of the CSaR was 0.55, we had some support for moving half of the Dlv+/CS+ to Cell a: true positives. We grant, as argued by critics of discrepant resolution (summarized in Hawkins et al, 2001), that we were ignoring the cases of CSaR/DELV-NR disagreement in Cells a and d, but our knowledge of the CSaR's diagnostic accuracy did not give us confidence to have it override two opposite diagnoses.…”

Section: Hypothesis 2: Using CSaR PPP as a Resolver mentioning

confidence: 85%

“…Thus, a refinement of the CSaR's diagnostic accuracy could give an indication of how much more confidence in DELV-NR scores is warranted when they agreed with CSaR in general, including the discrepant cases in which the DELV-NR disagreed with the LS, a procedure called discrepant resolution (Hawkins et al, 2001). Therefore, as shown in Table 5, we compared the DELV-NR outcomes to those of CSaR.…”

Section: Hypothesis 2: Using CSaR PPP as a Resolver mentioning

confidence: 97%

“…First, specificity and NPP were adequate, so one can have a fair amount of confidence in its diagnoses of typical development and will also expect between one quarter and two-thirds of the CSaR diagnoses of LI to be correct (95% CI around PPP [0.27, 0.68]). Given the large number of diagnoses expected to be incorrect, we would not have enough confidence in a CSaR diagnosis to override a DELV-NR diagnosis when there is no indication that the DELV-NR is wrong (as many critics of discrepancy resolution require; Hawkins et al, 2001). However, when the diagnosis of the DELV-NR has been put in doubt by an opposite diagnosis from the LS, we suggest that one is justified in accepting a CSaR diagnosis as a corroboration of the DELV-NR in proportion to the values of its predictive power calculated relative to LSs, as is illustrated in Table 5.…”

Section: Rationale for Using CSaR PPP and NPP to Reevaluate Accuracy mentioning

confidence: 98%

“…For medical decision making, where the disease status of individuals is seldom certain and available reference standards are often imperfect, several strategies have been proposed to combine data from two reference standards, using a second imperfect test as a "resolver test" in ways that attempt to compensate for potential biases in each of them (Green, Black, & Johnson, 1998; Hawkins, Garrett, & Stephenson, 2001).…”

Section: Using LSs to Evaluate Diagnostic Accuracy mentioning

confidence: 99%

Seeking a Valid Gold Standard for an Innovative, Dialect-Neutral Language Test

Pearson, Jackson (2014). J Speech Lang Hear Res

PURPOSE In this study, the authors explored alternative gold standards to validate an innovative, dialect-neutral language assessment. METHOD Participants were 78 African American children, ages 5;0 (years;months) to 6;11. Twenty participants had previously been identified as having language impairment. The Diagnostic Evaluation of Language Variation-Norm Referenced (DELV-NR; Seymour, Roeper, & J. de Villiers, 2005) was administered, and concurrent language samples (LSs) were collected. Using LS profiles as the gold standard, sensitivity, specificity, and other measures of diagnostic accuracy were compared for diagnoses made from the DELV-NR and participants' clinical status prior to recruitment. In a second analysis, the authors used results from the first analysis to make evidence-based adjustments in the estimates of DELV-NR diagnostic accuracy. RESULTS Accuracy of the DELV-NR relative to LS profiles was greater than that of prior diagnoses, indicating that the DELV-NR was an improvement over preexisting diagnoses for this group. Specificity met conventional standards, but sensitivity was somewhat low. Reanalysis using the positive and negative predictive power of the preexisting diagnosis in a discrepant-resolution procedure revealed that estimates for sensitivity and specificity for the DELV-NR were .85 and .93, respectively. CONCLUSION The authors found that, even after making allowances for the imperfection of available gold standards, clinical decisions made with the DELV-NR achieved high values on conventional measures of diagnostic accuracy.

“…An imperfect gold standard results in inaccurate estimates of sensitivity and specificity. Determining the magnitude of inaccuracy in the gold standard measure is not an easy task [3]. When no gold standard is really available, model-based approaches can be used like latent class analysis (LCA), but this requires tedious statistical analysis [4].…”

Section: Reference Standard Definition mentioning

confidence: 99%

Screening test accuracy studies: how valid are our conclusions? Application to visual inspection methods for cervical screening

Mahé, Gaffikin (2005). Cancer Causes Control

While the basic concepts associated with screening are simple, studying the value of new tests requires a very strict methodology. This paper summarizes lessons learned regarding appropriate methodologies to assess the value of new screening approaches using visual inspection with acetic acid (VIA), a screening test for cervical pre-cancerous lesions, as an example. In addition to being convenient to, safe for and acceptable to target community members, a screening test should be reliable and have good test characteristics (i.e. be able to discriminate well between early disease and non-disease). Test reliability assesses the degree to which repeated measurements of the test yield the same result. To ensure reproducibility of study findings, test reliability should be assessed before any evaluation of test accuracy. The accuracy of a test (specificity and sensitivity) is measured using cross-sectional studies with adequate sample size. Several basic features are necessary to ensure internal validity for such studies: (a) final disease status data should be obtained for all subjects, (b) all test results must be determined independently of previous results, (c) the reference standard used to determine the disease status should be accurate, (d) the full "spectrum" of the disease should be included in the study. The study results should also have external validity to be applicable to other populations to which the test will be applied. All these considerations are exemplified by 17 very heterogeneous studies published to date assessing VIA test accuracy. The assessment of a new screening test is the first step in researching a new cancer prevention strategy. For this reason, this step should be carefully addressed through rigorous studies.
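
The point made in the citation statement above, that an imperfect gold standard yields inaccurate estimates of sensitivity and specificity, can be illustrated with a short calculation. The sketch below is not taken from any of the cited papers. It assumes that the new test and the reference standard err independently once true disease status is fixed (conditional independence), and the function name `apparent_accuracy` and all numeric inputs are hypothetical, chosen only for illustration.

```python
def apparent_accuracy(prevalence, test_sens, test_spec, ref_sens, ref_spec):
    """Sensitivity and specificity of a test as they would appear when
    scored against an imperfect reference standard.

    Assumption (not from the source): test and reference errors are
    independent conditional on true disease status.
    """
    p = prevalence          # P(diseased)
    q = 1.0 - prevalence    # P(not diseased)

    # Joint probability that test and reference are both positive.
    both_pos = p * test_sens * ref_sens + q * (1 - test_spec) * (1 - ref_spec)
    # Marginal probability that the reference is positive.
    ref_pos = p * ref_sens + q * (1 - ref_spec)

    # Joint probability that test and reference are both negative.
    both_neg = p * (1 - test_sens) * (1 - ref_sens) + q * test_spec * ref_spec
    # Marginal probability that the reference is negative.
    ref_neg = p * (1 - ref_sens) + q * ref_spec

    apparent_sens = both_pos / ref_pos
    apparent_spec = both_neg / ref_neg
    return apparent_sens, apparent_spec


# Hypothetical inputs: a test with true sensitivity 0.90 and specificity 0.95,
# scored first against a perfect reference, then against an imperfect one.
perfect = apparent_accuracy(0.20, 0.90, 0.95, ref_sens=1.0, ref_spec=1.0)
imperfect = apparent_accuracy(0.20, 0.90, 0.95, ref_sens=0.80, ref_spec=0.90)
print("perfect reference:  ", perfect)
print("imperfect reference:", imperfect)
```

With a perfect reference the function returns the test's true values. With the imperfect reference both apparent values fall below the true ones under these assumptions; if test and reference errors were correlated, the bias could run in either direction.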

Medical Devices

Campbell, Irony, Lao, et al. (2005). Encyclopedia of Biostatistics

A medical device is any item that treats or diagnoses a health condition but whose action is primarily not chemical or biological. The main focus concerns the evaluation of clinical studies to establish the safety and effectiveness of different kinds of medical devices. The design and analysis of medical device studies pose unique statistical challenges. Both nondiagnostic and diagnostic devices are considered. Nondiagnostic devices include therapeutic ones and implants, among others. Statistical issues for these kinds of devices include the placebo effect, sham controls, inability to perform blinded studies, noninferiority, survival analysis, repeated measures, and historical controls. Diagnostic devices pose a very diverse set of challenges, with markedly different design and analysis considerations. Special attention is devoted here to microarrays in particular. Bayesian approaches to medical device studies are discussed.
