[ research report ]

Despite the high cost associated with the fluctuating clinical course of low back pain (LBP), no treatment strategy, surgical or conservative, has been shown to be consistently effective in reducing the often persistent symptoms, functional limitations, and disability associated with this condition. The lack of beneficial effects of conservative treatments for LBP may be due to the lack of a pathoanatomy-specific diagnosis,27,30,31 as fewer than 20% of individuals with LBP can be given a specific, structurally based diagnosis.22

In the absence of a specific pathoanatomical diagnosis, and to better direct treatment, a number of research and clinical groups have suggested the need for a system that classifies individuals with LBP based on key clinical symptoms and multidimensional features of the LBP presentation.1-3,8,9,14,29,35 The basis for this suggestion is that people with LBP represent a heterogeneous group consisting of several smaller, more homogeneous subgroups. Logically, if patients were classified into subgroups based on criteria relevant to their specific symptoms, these more homogeneous subgroups would have a higher likelihood of responding to matched treatment approaches. Such a classification system could be useful in both prognosis and treatment, rendering the development and testing of classification systems for LBP a top priority.3,8,35

Numerous classification systems have been described for patients with LBP.34 Delitto and colleagues11 described a treatment-based classification (TBC) system that used information gathered from the patient history and physical examination to place a patient into 1 of 4 classification categories that directed patient treatment: manipulation, specific exercise, stabilization, and traction. The

STUDY DESIGN: Observational, cross-sectional reliability study.
OBJECTIVES: To examine the interrater reliability of novice raters in their use of the treatment-based classification (TBC) system for low back pain and to explore the patterns of disagreement in classification errors.
BACKGROUND: Although the interrater reliability of individual test items in the TBC system is moderate to good, some error persists in classification decision making. Understanding which classification errors are common could direct further refinement of the TBC system.
METHODS: Using previously recorded patient data (n = 24), 12 novice raters classified patients according to the TBC schema. These classification results were combined with those of 7 other raters, allowing examination of overall agreement using the kappa statistic, as well as agreement/disagreement among pairwise comparisons in classification assignments. A chi-square test examined differences in percent agreement between the novice and more experienced raters and differences in classification distributions between these 2 groups of raters.
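The kappa statistic used here corrects the raw percent agreement between two raters for the agreement expected by chance given each rater's marginal category frequencies. As a minimal sketch of that computation, the following Python function implements Cohen's kappa for one pair of raters; the patient data shown are invented for illustration and are not the study's data.

```python
from collections import Counter

def cohen_kappa(r1, r2):
    """Chance-corrected agreement (Cohen's kappa) between two raters.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is the agreement expected by chance from the marginals.
    """
    assert len(r1) == len(r2)
    n = len(r1)
    # Observed proportion of cases on which the raters agree.
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement from each rater's marginal category proportions.
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum((c1[k] / n) * (c2[k] / n) for k in set(c1) | set(c2))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical TBC classifications of 10 patients by two raters
# (illustrative only; the four categories follow the TBC system).
rater_a = ["manip", "stab", "exercise", "traction", "manip",
           "stab", "manip", "exercise", "stab", "traction"]
rater_b = ["manip", "stab", "exercise", "manip", "manip",
           "stab", "stab", "exercise", "stab", "traction"]

print(round(cohen_kappa(rater_a, rater_b), 3))  # 0.726
```

For this toy pair, observed agreement is 8/10 = 0.80 while chance agreement is 0.27, giving kappa of about 0.73; the study's pairwise comparisons apply the same idea across all rater pairs.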
RESULTS: Among the 12 novice raters, there was 80.9% agreement across pairs of classifications (κ = 0.62; 95% confidence interval: 0.59, 0.65) an...