Background
Joint inflammation is the common feature underlying juvenile idiopathic arthritis (JIA). Clinicians recognize patterns of joint involvement that are not currently part of the International League of Associations for Rheumatology (ILAR) classification. Using unsupervised machine learning, we sought to uncover data-driven joint patterns that predict clinical phenotype and disease trajectories.
Methods and findings
We analyzed prospectively collected clinical data, including joint involvement recorded on a standard 71-joint homunculus, for 640 discovery patients with newly diagnosed JIA enrolled in a Canada-wide study. Patients were treatment-naïve except for nonsteroidal anti-inflammatory drugs (NSAIDs), diagnosed within one year of symptom onset, and followed serially for five years. Twenty-one patients had systemic arthritis, 300 oligoarthritis, 125 rheumatoid factor (RF)-negative polyarthritis, 16 RF-positive polyarthritis, 37 psoriatic arthritis, 78 enthesitis-related arthritis (ERA), and 63 undifferentiated arthritis. At diagnosis, we observed global hierarchical groups of co-involved joints.
To characterize these patterns, we developed sparse multilayer non-negative matrix factorization (NMF). Model selection by internal bi-cross-validation identified seven joint patterns at presentation, to which all 640 discovery patients were assigned: pelvic girdle (57 patients), fingers (25), wrists (114), toes (48), ankles (106), knees (283), and indistinct (7). Patterns were distinct from clinical subtypes (P < 0.001 by χ² test) and reproducible through external data set validation on a 119-patient, prospectively collected independent validation cohort (reconstruction accuracy Q² = 0.55 for patterns; 0.35 for groups).
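As a rough illustration of the factorization idea only, the sketch below runs plain NMF with multiplicative updates on a toy patient-by-joint matrix and assigns each patient to the pattern with the largest weight. It is not the sparse multilayer NMF described above: the four joints, the six patients, the choice of two patterns (fixed by hand rather than by bi-cross-validation), and all function names are hypothetical and chosen purely for demonstration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def nmf(v, k, n_iter=500, seed=0, eps=1e-9):
    """Factor v (patients x joints) into non-negative w (patients x k) and h (k x joints)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # Update h: h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # Update w: w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h

def assign_patterns(w):
    """Assign each patient to the pattern (column of w) with the largest weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]

# Hypothetical toy data: rows are patients, columns are
# [left knee, right knee, left wrist, right wrist]; 1 = active joint.
toy_joints = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

w, h = nmf(toy_joints, k=2)
labels = assign_patterns(w)
print(labels)  # the first three patients share one label, the last three the other
```

Pattern indices are arbitrary, so only the grouping of patients is meaningful, not which pattern is numbered 0 or 1.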
Some patients matched multiple patterns. To determine whether their disease outcomes differed, we further subdivided the 640 discovery patients into three subgroups by degree of localization (the percentage of their active joints aligning with their assigned pattern): localized (≥90%; 359 patients), partially localized (60%–90%; 124), or extended (<60%; 157). Localized patients more often maintained their baseline patterns (P < 0.05 for five groups by permutation test) than nonlocalized patients (P < 0.05 for three groups by permutation test) over a five-year follow-up period.
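The subgroup thresholds above can be sketched in a few lines. This minimal example assumes that a joint "aligns" with a pattern simply when it belongs to a fixed set of joints for that pattern; the study's actual alignment rule may be weighted by the factorization, and the joint names and function names here are hypothetical.

```python
def degree_of_localization(active_joints, pattern_joints):
    """Percentage of a patient's active joints that fall inside the assigned pattern.

    Assumption: alignment is plain set membership, not a weighted score.
    """
    active = set(active_joints)
    if not active:
        raise ValueError("patient has no active joints")
    return 100.0 * len(active & set(pattern_joints)) / len(active)

def localization_subgroup(percent):
    """Map a localization percentage to the three subgroups (>=90, 60 to <90, <60)."""
    if percent >= 90:
        return "localized"
    if percent >= 60:
        return "partially localized"
    return "extended"

# Hypothetical patient assigned to a knee pattern with one extra active joint.
knee_pattern = {"left_knee", "right_knee"}
patient_active = {"left_knee", "right_knee", "left_ankle"}

percent = degree_of_localization(patient_active, knee_pattern)
subgroup = localization_subgroup(percent)
print(round(percent, 1), subgroup)
```

Here two of the three active joints lie in the pattern, so the patient falls in the partially localized band.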
We modelled time to zero joints in the discovery cohort using a multivariate Cox proportional hazards model considering joint pattern, degree of localization, and ILAR subtype. Despite receiving more intense treatment, 50% of nonlocalized patients had zero joints at one year compared to six months for localized patients. Overall, localized patients required less time to reach zero joints (partial: P = 0.0018 versus localized by log-rank test; extended: P ...