Item response theory (IRT) has a number of potential advantages over classical test theory in assessing self-reported health outcomes. IRT models yield invariant item and latent trait estimates (within a linear transformation), standard errors conditional on trait level, and trait estimates anchored to item content. IRT also facilitates evaluation of differential item functioning, inclusion of items with different response formats in the same scale, and assessment of person fit, and it is ideally suited for implementing computer adaptive testing. Finally, IRT methods can be helpful in developing better health outcome measures and in assessing change over time. These issues are reviewed, along with some of the methodological and practical challenges in applying IRT methods.
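To make "standard errors conditional on trait level" concrete, here is a minimal sketch under a two-parameter logistic (2PL) model. The 2PL form, the function names, and the four item parameters are illustrative assumptions, not taken from the abstract above; the only standard result relied on is that the standard error of the trait estimate is approximately the inverse square root of the test information at that trait level.

```python
import math


def item_information_2pl(theta, a, b):
    """Fisher information of one 2PL item at trait level theta.

    a is the discrimination and b the difficulty (hypothetical values below).
    """
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def conditional_se(theta, items):
    """Approximate standard error of the trait estimate at theta.

    Test information is the sum of item informations; the standard
    error is its inverse square root.
    """
    info = sum(item_information_2pl(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(info)


# Hypothetical (a, b) pairs for a four-item scale.
items = [(1.2, -1.0), (1.0, 0.0), (1.5, 0.5), (0.8, 1.0)]

for theta in (-3.0, 0.0, 3.0):
    print(theta, round(conditional_se(theta, items), 3))
```

With these made-up items the standard error is smallest near the middle of the trait range, where the item difficulties are concentrated, and grows toward the extremes. Classical test theory, by contrast, reports a single standard error of measurement for all respondents.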
OBJECTIVES: The National Institutes of Health (NIH) Patient-Reported Outcomes Measurement Information System (PROMIS®) is a standardized set of patient-reported outcomes (PROs) that cover physical, mental, and social health. The aim of this study was to develop the NIH PROMIS gastrointestinal (GI) symptom measures.
METHODS: We first conducted a systematic literature review to develop a broad conceptual model of GI symptoms. We complemented the review with 12 focus groups including 102 GI patients. We developed PROMIS items based on the literature and input from the focus groups, followed by cognitive debriefing in 28 patients. We administered the items to diverse GI patients (irritable bowel syndrome (IBS), inflammatory bowel disease (IBD), systemic sclerosis (SSc), and other common GI disorders) and to a census-based US general population (GP) control sample. We created scales based on confirmatory factor analyses and item response theory modeling, and evaluated the scales for reliability and validity.
RESULTS: A total of 102 items were developed and administered to 865 patients with GI conditions and 1,177 GP participants. Factor analyses provided support for eight scales: gastroesophageal reflux (13 items), disrupted swallowing (7 items), diarrhea (5 items), bowel incontinence/soilage (4 items), nausea and vomiting (4 items), constipation (9 items), belly pain (6 items), and gas/bloat/flatulence (12 items). The scales correlated significantly with both generic and disease-targeted legacy instruments and demonstrated evidence of reliability.
CONCLUSIONS: Using the NIH PROMIS framework, we developed eight GI symptom scales that can now be used for clinical care and research across the full range of GI disorders.
The graded response model can be used to describe test-taking behavior when item responses are classified into ordered categories. In this study, parameter recovery in the graded response model was investigated using the MULTILOG computer program under default conditions. Based on items having five response categories, 36 simulated data sets were generated that varied on true θ distribution, true item discrimination distribution, and calibration sample size. The findings suggest, first, that the correlations between the true and estimated parameters were consistently greater than 0.85 with sample sizes of at least 500. Second, the root mean square error differences between true and estimated parameters were comparable with results from binary data parameter recovery studies. Of special note was the finding that the calibration sample size had little influence on the recovery of the true ability parameter but did influence item-parameter recovery. Therefore, it appeared that item-parameter estimation error, due to small calibration samples, did not result in poor person-parameter estimation. It was concluded that at least 500 examinees are needed to achieve an adequate calibration under the graded model.
Researchers in item response theory (IRT) have concentrated on the implementation of binary statistical models for achievement and ability measurement. There has been minimal interest in exploring the potential of polychotomous models. The graded model (Samejima, 1969), in contrast to the well-established one-, two-, and three-parameter (Lord, 1980) binary models, is appropriate when item responses can be ordered into more than two categories along an agree-disagree or high-low trait continuum.
Recently, a parameterization program named MULTILOG (Thissen, 1986) has been developed for implementing the logistic graded model. Researchers wishing to build IRT-based test banks consisting of graded items are concerned about how well item parameters can be recovered.
Hence, in this research, simulated data sets with known properties were generated, submitted to MULTILOG, and the observed item parameters compared to the generating parameters.
The Graded Response Model
Although Samejima (1969) developed the graded model to analyze cognitive processes, thus far it appears that the model is of primary interest in attitude (Koch, 1983) and personality measurement (Reise, 1989). Yet, the graded model
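As an illustration of the kind of data generation described above, the following is a minimal sketch of the logistic graded response model for a single five-category item. It is not the study's procedure: the function names, the discrimination of 1.5, the four thresholds, the standard normal trait distribution, and the omission of any scaling constant are all assumptions chosen for the example. Each category probability is the difference between adjacent cumulative ("at or above category k") logistic curves.

```python
import math
import random


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the logistic graded model.

    theta: latent trait level; a: item discrimination;
    thresholds: ordered category boundaries b_1 < ... < b_{m-1}.
    """
    # Cumulative probabilities of responding in category k or higher,
    # bracketed by 1 (lowest category) and 0 (beyond the highest).
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum.append(0.0)
    # Adjacent differences give the probability of each category.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def simulate_responses(thetas, a, thresholds, rng):
    """Draw one graded response (0 .. m-1) per simulated examinee."""
    responses = []
    for theta in thetas:
        probs = grm_category_probs(theta, a, thresholds)
        responses.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return responses


# Hypothetical generating values: 500 examinees, one five-category item.
rng = random.Random(0)
thetas = [rng.gauss(0.0, 1.0) for _ in range(500)]
responses = simulate_responses(
    thetas, a=1.5, thresholds=[-1.5, -0.5, 0.5, 1.5], rng=rng
)
```

A recovery study of the sort summarized above would repeat this for every item, fit the model to the simulated responses, and compare the estimates with the generating values, for example by correlation and root mean square error.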