BACKGROUND: We recently developed a model of stratified exercise therapy consisting of (i) a stratification algorithm that allocates patients with knee osteoarthritis (OA) into one of three subgroups ('high muscle strength subgroup', representing a post-traumatic phenotype; 'low muscle strength subgroup', representing an age-induced phenotype; and 'obesity subgroup', representing a metabolic phenotype) and (ii) subgroup-specific exercise therapy. In the present study, we aimed to test the construct validity of this algorithm.

METHODS: Data from five studies (four exercise therapy trial cohorts and one cross-sectional cohort) were used to test the construct validity of our algorithm against 63 a priori formulated hypotheses addressing three research questions: (i) are the proportions of patients in each subgroup similar across cohorts? (15 hypotheses); (ii) are the characteristics of each subgroup in line with their proposed underlying phenotypes? (30 hypotheses); (iii) are the effects of usual exercise therapy in the three subgroups in line with the proposed effect sizes? (18 hypotheses).

RESULTS: Baseline data from a total of 1211 patients with knee OA were analyzed for the first and second research questions, and follow-up data from 584 patients who were part of an exercise therapy arm within a trial were analyzed for the third research question. In total, the majority (73%) of the hypotheses were confirmed. Regarding our first research question, we found similar proportions in each of the three subgroups across cohorts, especially for three of the cohorts. Regarding our second research question, subgroup characteristics were almost completely in line with the proposed underlying phenotypes. Regarding our third research question, usual exercise therapy resulted in similar, medium to large effect sizes for knee pain and physical function in all three subgroups.

CONCLUSION: We found mixed results regarding the construct validity of our stratification algorithm. On the one hand, it is a valid instrument for consistently allocating patients into subgroups that align with our hypotheses. On the other hand, contrary to our hypotheses, the subgroups did not differ substantially in the effects of usual exercise therapy. An ongoing trial will assess whether this algorithm, accompanied by subgroup-specific exercise therapy, improves clinical and economic outcomes.
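The abstract reports "medium to large effect sizes" per subgroup without naming the statistic. As a minimal sketch, assuming a within-group standardized mean change (mean improvement divided by the baseline SD) and Cohen's conventional thresholds, one subgroup's effect size could be computed as below. The function names `standardized_change` and `magnitude` and the example scores are hypothetical, not the authors' method or data.

```python
from statistics import mean, stdev


def standardized_change(baseline, follow_up, lower_is_better=True):
    """Within-group effect size: mean change divided by the SD of the
    baseline scores (an assumed definition; the abstract does not name one).

    With lower_is_better=True (e.g. a pain score), a drop in score counts
    as improvement and yields a positive effect size.
    """
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need paired baseline/follow-up scores for >= 2 patients")
    sign = 1 if lower_is_better else -1
    changes = [sign * (b - f) for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def magnitude(d):
    """Cohen's conventional thresholds: 0.2 small, 0.5 medium, 0.8 large."""
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"


if __name__ == "__main__":
    # Hypothetical 0-10 pain scores for one subgroup (not study data).
    baseline = [6, 7, 5, 8, 6]
    follow_up = [4, 5, 4, 5, 4]
    d = standardized_change(baseline, follow_up)
    print(f"effect size = {d:.2f} ({magnitude(d)})")
```

Running this separately for each of the three subgroups gives comparable values. Dividing by the SD of the change scores instead (a standardized response mean) is an equally common choice and can change the magnitude noticeably.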
STUDY DESIGN: Prospective cohort study.

OBJECTIVE: To analyze the responsiveness and minimal clinically important change (MCIC) of the US National Institutes of Health (NIH) minimal dataset for chronic low back pain (CLBP).

SUMMARY OF BACKGROUND DATA: The NIH minimal dataset is a 40-item questionnaire developed to increase the use of standardized definitions and measures for CLBP. The longitudinal validity of the total minimal dataset and of its subscale Impact Stratification is unknown.

METHODS: Total outcome scores on the NIH minimal dataset, Dutch Language Version, were calculated on a scale of 0-100 points, with higher scores representing worse functioning. Responsiveness and MCIC were determined with an anchor-based method, by calculating the area under the receiver operating characteristic (ROC) curve (AUC) and by determining the optimal cut-off point. The smallest detectable change (SDC) was calculated as a parameter of measurement error.

RESULTS: In total, 223 patients with CLBP were included. The mean total score on the NIH minimal dataset was 44 ± 14 points at baseline. The total outcome score was responsive to change, with an AUC of 0.84. The MCIC was 14 points, with a sensitivity of 72% and a specificity of 82%, and the SDC was 23 points. The mean total score on Impact Stratification (scale 8-50) was 34.4 ± 7.4 points at baseline, with an AUC of 0.91, an MCIC of 7.5 points (sensitivity 96%, specificity 78%), and an SDC of 14 points.

CONCLUSION: The longitudinal validity of the NIH minimal dataset is adequate. An improvement of 14 points in the total outcome score and of 7.5 points in Impact Stratification can be interpreted as clinically important in individual patients. However, the MCIC depends on baseline values and on the method chosen to determine the optimal cut-off point. Furthermore, measurement error is larger than the MCIC, which means that individual change scores should be interpreted with caution.

LEVEL OF EVIDENCE: 4

This is an open access article distributed under the terms of the Creative Commons Attribution-Non Commercial-No Derivatives License 4.0 (CC BY-NC-ND), where it is permissible to download and share the work provided it is properly cited. The work cannot be changed in any way or used commercially without permission from the journal.
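The anchor-based analysis in the NIH minimal dataset abstract (AUC, optimal cut-off as MCIC, SDC) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the optimal cut-off is taken as the point maximizing Youden's J (sensitivity + specificity - 1), which is only one of the methods the abstract alludes to; SDC is assumed to be 1.96 × √2 × SEM, computed from the change scores of patients the anchor classifies as stable; change scores are assumed to be baseline minus follow-up so that improvement is positive. The function names and all numbers are hypothetical.

```python
from fractions import Fraction
from statistics import stdev


def roc_auc(improved, not_improved):
    """AUC as the Mann-Whitney probability that an improved patient's change
    score exceeds a non-improved patient's (ties count as one half)."""
    if not improved or not not_improved:
        raise ValueError("both anchor groups must be non-empty")
    wins = 0.0
    for x in improved:
        for y in not_improved:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(improved) * len(not_improved))


def youden_cutoff(improved, not_improved):
    """Cut-off maximizing Youden's J = sensitivity + specificity - 1.

    A change score >= cut-off is classified as 'improved'. Exact fractions
    are used so that equal J values compare as equal; on a tie the lowest
    cut-off is kept. Returns (cutoff, sensitivity, specificity).
    """
    if not improved or not not_improved:
        raise ValueError("both anchor groups must be non-empty")
    best = None
    for c in sorted(set(improved) | set(not_improved)):
        sens = Fraction(sum(x >= c for x in improved), len(improved))
        spec = Fraction(sum(y < c for y in not_improved), len(not_improved))
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best[1], float(best[2]), float(best[3])


def smallest_detectable_change(stable_changes, z=1.96):
    """Individual-level SDC = z * sqrt(2) * SEM, which equals z times the SD
    of change scores in patients assumed stable (SEM = SD_change / sqrt(2))."""
    return z * stdev(stable_changes)


if __name__ == "__main__":
    # Hypothetical change scores (baseline minus follow-up, so positive =
    # improvement on a scale where higher scores mean worse functioning).
    improved = [20, 15, 18, 12, 25, 14]
    not_improved = [2, 5, 10, -3, 8, 16]
    stable = [-5, 5, -5, 5]
    cutoff, sens, spec = youden_cutoff(improved, not_improved)
    sdc = smallest_detectable_change(stable)
    print(f"AUC = {roc_auc(improved, not_improved):.2f}")
    print(f"MCIC = {cutoff} (sensitivity {sens:.0%}, specificity {spec:.0%})")
    print(f"SDC = {sdc:.1f}; MCIC below SDC: {cutoff < sdc}")
```

The last line checks the condition the abstract highlights: when the SDC exceeds the MCIC (23 versus 14 points for the total score), an individual change just reaching the MCIC cannot be distinguished from measurement error. Swapping Youden's J for another criterion, such as the point closest to the top-left corner of the ROC curve, can yield a different cut-off from the same data.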