Abstract
Objective. To perform a comparative effectiveness feasibility study in juvenile localized scleroderma (LS), using standardized treatment regimens (consensus treatment plans; CTP).
Methods. A prospective, multicenter 1-year pilot observational cohort study was performed by Childhood Arthritis and Rheumatology Research Alliance (CARRA) LS workgroup members. Patients with active, moderate to severe juvenile LS were treated with one of 3 CTP: methotrexate alone, or in combination with intravenous (30 mg/kg/dose for 3 mos) or oral corticosteroids (2 mg/kg/day tapered by 48 weeks).
Results. Fifty patients, with demographics typical for juvenile LS, were enrolled, and 44 (88%) completed the study. Most had extracutaneous involvement. Patients improved in all 3 CTP, with > 75% having a major or moderate level of improvement compared to baseline. Damage accrued in some patients. Major deviations from prescribed regimen resulted from medication intolerance (n = 6; 14%) or treatment failure (n = 11; 25%); failures occurred in all 3 CTP. Significant responses to treatment were demonstrated by LS skin scoring measures and overall physician assessments, with differences in response level identified in some patient subsets. Response differences were associated with baseline disease activity level, LS subtype, skin disease extent, and extracutaneous involvement.
Conclusion. This study demonstrates the feasibility of conducting juvenile LS comparative effectiveness studies. The CTP were found to be safe, effective, and tolerable. Our assessments performed well. Because damage is common and may progress despite effective control of activity, we recommend initial treatment efficacy be evaluated primarily by activity measures. Potential confounders for response were identified that warrant further study.
Localized scleroderma (LS), which includes circumscribed morphea and linear scleroderma, is an autoimmune disease characterized by inflammation and fibrosis1,2. It is the most common childhood form of scleroderma, and pediatric onset has worse morbidity than adult disease3,4. Morbidity includes uveitis, seizures, arthropathy, and growth disturbances such as limb and facial hemiatrophy. Functional impairment is found in 27–56% of juvenile LS patients5,6,7,8,9.
Treatment focuses on controlling inflammation, because effective treatment options for fibrosis are limited. Most North American and European pediatric rheumatologists, and many other physicians, agree on the use of methotrexate (MTX) to treat moderate to severe disease10,11,12,13,14. A single randomized, double-blind, placebo-controlled study evaluating the efficacy of MTX in 70 patients with juvenile LS who received an initial oral corticosteroid (CS) course demonstrated a higher response rate among patients in the MTX compared to the placebo arm (67.4% vs 29.2%, respectively)15. Many case reports also support MTX’s efficacy16–22. However, physicians differ regarding MTX dosing, route, and duration, the need for CS, and the CS regimen10,23. In a survey of North American pediatric rheumatologists asking their preference for treating juvenile LS with a standardized MTX-based regimen, 31% preferred MTX monotherapy, 36% MTX with intravenous pulse CS, and 23% MTX with oral CS23. Given the lack of data to support the superiority of 1 regimen over another, there is clearly a need for comparative effectiveness studies in juvenile LS.
To assess feasibility and methodology for comparative effectiveness studies in juvenile LS, we evaluated the safety and tolerability of 3 MTX-based regimens (consensus treatment plans; CTP)23 in a pilot 1-year open-label study of 50 patients initiating treatment for active juvenile LS. We identified the frequency of and reasons for deviation from the CTP, adverse events (AE) associated with therapies, and response to treatment by several clinical assessments.
MATERIALS AND METHODS
Study protocol
The LS workgroup of the Childhood Arthritis and Rheumatology Research Alliance (CARRA) conducted this study; details on the development of the study and CTP choice are described elsewhere24. Each participating site (n = 10) obtained institutional ethics approval for both the study itself and the informed consent form, which included our intent to publish the results of the study and measures to protect confidentiality. The written consent was signed by either the patient or by the patient’s parent/guardian with written or verbal assent of the patient, as appropriate for age. Deidentified data were analyzed at Hackensack University Medical Center, the coordinating center, under ethics approval number Pro00001481.
Inclusion and exclusion criteria for participation in the study are shown in Table 1 (references 23, 25, 26). At entry, patients started treatment with 1 of 3 MTX-based CTP (Figure 1A): MTX monotherapy (CTP A, 1 mg/kg/week, maximum 25 mg; same dose for all CTP), MTX with intravenous (IV) CS (CTP B; IV CS 30 mg/kg/dose, maximum 1000 mg, either 3 consecutive days/month × 3 months or once weekly × 12 weeks), or MTX with oral CS (CTP C; prednisone or prednisolone 2 mg/kg/day, maximum 60 mg, divided bid, tapered to 1 mg/kg/d by 8 weeks, 0.5 mg/kg/d by 16 weeks, 0.25 mg/kg/d by 24 weeks, and off by 48 weeks). Subcutaneous administration of MTX was recommended. The choice of CTP was made jointly by the treating physician and the patient’s family (discussed by Li, et al24). Patients were monitored at 6 study visits over 1 year: baseline, 2, 4, 6, 9, and 12 months, with recommended visit windows of ± 1 month. At the initial visit, demographic information was collected, including subtype as defined by Padua criteria24, medical history, treatment history for juvenile LS, and family history. At subsequent visits, medication history and AE were recorded. AE were graded according to the Common Terminology Criteria27. Laboratory studies were done at the discretion of the treating physicians. Most of the data were entered into a Web-based registry (i2b2 CARRA Legacy Registry); remaining data were recorded in a database at the principal investigator’s (PI) site. There was an optional biorepository substudy, banking blood samples for future studies.
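The weight-based starting doses and caps of the 3 CTP can be illustrated with a short Python sketch. It encodes only the numbers given in the protocol description above; the function and key names are hypothetical, and the sketch is illustrative arithmetic rather than the study's dosing tool. The oral CS taper is listed as mg/kg/day targets only, because the text does not say how the 60 mg cap applies during the taper.

```python
# Doses and caps as stated in the protocol text; all names are illustrative.
MTX_MG_PER_KG_WEEK = 1.0
MTX_MAX_MG = 25.0
IV_CS_MG_PER_KG_DOSE = 30.0
IV_CS_MAX_MG = 1000.0
ORAL_CS_MG_PER_KG_DAY = 2.0
ORAL_CS_MAX_MG = 60.0

# Oral CS taper targets for CTP C as (week, mg/kg/day); 0.0 means off.
ORAL_CS_TAPER_TARGETS = [(0, 2.0), (8, 1.0), (16, 0.5), (24, 0.25), (48, 0.0)]


def capped_dose(weight_kg, mg_per_kg, max_mg):
    """Return the weight-based dose in mg, limited to the stated maximum."""
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    return min(weight_kg * mg_per_kg, max_mg)


def starting_doses(ctp, weight_kg):
    """Return the starting doses for CTP 'A', 'B', or 'C'."""
    if ctp not in ("A", "B", "C"):
        raise ValueError("ctp must be 'A', 'B', or 'C'")
    # MTX dosing is the same in all 3 CTP.
    doses = {
        "mtx_mg_per_week": capped_dose(weight_kg, MTX_MG_PER_KG_WEEK, MTX_MAX_MG)
    }
    if ctp == "B":
        doses["iv_cs_mg_per_dose"] = capped_dose(
            weight_kg, IV_CS_MG_PER_KG_DOSE, IV_CS_MAX_MG
        )
    elif ctp == "C":
        doses["oral_cs_mg_per_day"] = capped_dose(
            weight_kg, ORAL_CS_MG_PER_KG_DAY, ORAL_CS_MAX_MG
        )
    return doses


if __name__ == "__main__":
    # Hypothetical 40-kg patient on CTP B: both doses reach their caps.
    print(starting_doses("B", 40.0))
```

For a hypothetical 20-kg patient on CTP C, the same function gives 20 mg/week of MTX and 40 mg/day of oral CS, neither of which reaches its cap.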
The same physician evaluated a given patient at all study visits to avoid interrater variability. Evaluations included modified localized skin severity index (mLoSSI), LS Damage index (LoSDI), and measures we recently developed23,28,29. The mLoSSI divides the body into 18 anatomic sites for scoring, and is calculated as the sum of disease extension (scored 3 if present), erythema (scored 0–3, severe), and skin thickening of the lesion edge (scored 0–3, severe) at all affected sites28,30. Building from the mLoSSI, the LS Cutaneous Activity Measure (LSCAM) also scores additional variables that were found associated with activity in another study31. LSCAM is calculated as the sum of disease extension, erythema, maximum lesion skin thickening, violaceous color, tactile warmth, and waxy white or yellow; erythema and skin thickening are scored 0 to 3 (severe), the other variables 0 or 1 if present23. The LoSDI and LS Cutaneous Damage Measure (LSDam) sum dermal atrophy, subcutaneous atrophy, and dyspigmentation across affected sites29. The LSDam also scores maximum lesional skin thickening. Variables in LoSDI and LSDam are scored 0 to 3 (severe), with scoring examples provided in LS Scoring Atlas23.
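The activity scores described above reduce to simple sums, sketched here in Python. The field names are hypothetical, and the sketch assumes that the LSCAM, like the mLoSSI, is summed across all affected sites; the text states this explicitly only for the mLoSSI. Scoring anchors from the LS Scoring Atlas are not reproduced.

```python
from dataclasses import dataclass


@dataclass
class SiteScore:
    """Activity findings at one affected anatomic site (names are illustrative)."""
    extension: bool = False      # disease extension present
    erythema: int = 0            # 0-3 (severe)
    edge_thickening: int = 0     # 0-3, skin thickening of lesion edge (mLoSSI)
    max_thickening: int = 0      # 0-3, maximum lesion skin thickening (LSCAM)
    violaceous: bool = False     # violaceous color present
    warmth: bool = False         # tactile warmth present
    waxy: bool = False           # waxy white or yellow present


def _check_grade(value, name):
    """Reject grades outside the 0-3 range used for graded variables."""
    if value not in (0, 1, 2, 3):
        raise ValueError(f"{name} must be an integer from 0 to 3")


def mlossi(sites):
    """Sum extension (3 if present), erythema, and edge thickening over sites."""
    total = 0
    for s in sites:
        _check_grade(s.erythema, "erythema")
        _check_grade(s.edge_thickening, "edge_thickening")
        total += (3 if s.extension else 0) + s.erythema + s.edge_thickening
    return total


def lscam(sites):
    """Sum the 6 LSCAM variables over sites; present/absent items score 1 or 0."""
    total = 0
    for s in sites:
        _check_grade(s.erythema, "erythema")
        _check_grade(s.max_thickening, "max_thickening")
        total += (
            int(s.extension) + s.erythema + s.max_thickening
            + int(s.violaceous) + int(s.warmth) + int(s.waxy)
        )
    return total


if __name__ == "__main__":
    # Two hypothetical affected sites.
    sites = [
        SiteScore(extension=True, erythema=2, edge_thickening=1,
                  max_thickening=2, violaceous=True),
        SiteScore(erythema=1, max_thickening=1, waxy=True),
    ]
    print(mlossi(sites), lscam(sites))  # prints: 7 9
```

Note that disease extension contributes 3 points per site to the mLoSSI but only 1 point to the LSCAM, so the 2 measures weight new or expanding lesions differently.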
Extracutaneous involvement (ECI) considered secondary to LS was scored as 0 or 1 if present; the list of scored items was generated based upon literature23,32,33. Joint involvement included arthritis (joint swelling) and contractures (limited range of joint motion without swelling). Growth difference was scored if the clinician considered it to be obvious and significant, and included limb girth and length differences, and facial and truncal hemiatrophy. Skin activity scoring was performed at all visits, while skin damage and ECI scoring were assessed at visits 0, 6, and 12 months. Efforts to standardize scoring are described elsewhere24.
Physician’s global assessment of disease activity (PGA-A) and physician’s global assessment of disease damage (PGA-D) were scored on 0–10 (high) Likert scales. PGA-A was scored at every visit; PGA-D at 0-, 6-, and 12-month visits. Physician assessment of overall disease status compared to baseline visit (Δ-disease status), and physician assessment of overall activity status compared to baseline visit (Δ-activity status) were scored on a 7-level Likert scale from major improvement (3) to major worsening (−3) at the last visit. These global assessments included consideration of extracutaneous involvement. No guidelines were provided on scoring these assessments; instead, each study investigator determined how to score them based upon their clinical evaluation and judgment. Patients and/or their parents were asked to complete health-related quality of life assessments at visits 0, 6, and 12 months34,35,36,37,38.
Protocol deviations
Changing the route of MTX administration, having a temporary reduction or lapse in taking MTX (< 2 weeks), and missing a prednisone taper target because of a delayed study visit were considered minor deviations. Actions considered major deviations were stopping treatment (by physician, patient, or family), prolonged change in specified dose or duration of CTP medication(s), and/or using an immunomodulator not specified by the initial CTP. Patients who withdrew or failed to follow up were scored as dropouts and not included in the response analysis. Treatment failure (TF) was defined as inadequate response to the initial regimen leading to treatment with additional CS and/or non-CTP–specified immunomodulator.
Data analysis
Data were summarized using frequencies (%) for categorical variables and median [interquartile range (IQR)] for continuous variables. Patient response was analyzed based upon intent to treat and censored for TF. Comparisons across groups, where appropriate, were performed using chi-square or Fisher’s exact tests for categorical data and Wilcoxon rank-sum or signed-rank tests for continuous, nonparametric data. Normality was assessed using the Shapiro-Wilk test (p < 0.001). Spearman correlation was performed to examine correlations between activity, damage, and improvement in scores across visits, as well as lesion and disease characteristics. We used the most recent non-missing data when comparing activity or damage scores to a previous visit. An alpha of 0.05 was used to assess significance; given the small sample, we also noted variables with a p value < 0.1. Data were analyzed using SAS version 9.4 (SAS Institute Inc.).
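As an illustration of the categorical comparisons, the following Python sketch computes a 2-sided Fisher's exact test for a 2 × 2 table using only the standard library. It is not the SAS code used in the study. The example counts (9 of 11 versus 12 of 33) are an assumption, back-calculated from the percentages of TF and non-TF patients with grade 2 AE reported in the Results, and the text does not state whether that comparison used the chi-square or Fisher's exact test, so the value need not match the published p value.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("cell counts must be non-negative")
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    if n == 0:
        raise ValueError("table must contain at least one observation")
    lowest = max(0, col1 - row2)   # smallest feasible top-left count
    highest = min(row1, col1)      # largest feasible top-left count
    # Unnormalized hypergeometric weights, kept as exact integers.
    weights = {
        k: comb(col1, k) * comb(n - col1, row1 - k)
        for k in range(lowest, highest + 1)
    }
    observed = weights[a]
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / comb(n, row1)


if __name__ == "__main__":
    # Assumed counts: grade 2 AE in TF (9 of 11) vs non-TF (12 of 33) patients.
    p = fisher_exact_two_sided(9, 2, 12, 21)
    print(f"two-sided p = {p:.4f}")
```

Keeping the weights as integers avoids floating-point ties when deciding which tables are at least as extreme as the observed one.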
RESULTS
Patient characteristics
Patients were enrolled into all 3 CTP, achieving the target enrollment of 50 patients (Figure 1). Most patients were white (46; 92%) and had linear scleroderma subtype (30; 60%). The median age of disease onset was 9.6 years (IQR 6.1–11.7), and disease duration 13 months (IQR 6–53.8). Forty-one patients (82%) were newly diagnosed and naive to systemic immunosuppressants; the other 9 patients had a disease relapse off prior systemic treatment. Thirty-seven patients (74%) had ECI, primarily growth difference (23; 46%) and/or joint involvement (20; 40%).
Six patients discontinued treatment before the last visit (dropouts, Figure 1A). The remaining 44 patients completed the study and were analyzed for treatment response and AE. CTP groups differed in the number of affected anatomic sites (p = 0.021), and subtype frequency, growth disturbance, and antinuclear antibody (ANA) positivity (p < 0.1; Table 2). No differences were found for age, disease duration, race, ethnicity, ECI, or family history of autoimmune disease (Table 2, and data not shown).
Safety and tolerability of CTP
Six patients (14% of patients completing last visit) had a major deviation from the CTP because of medication intolerance (Figure 1A), including 1 (2%) grade 3 AE, a hospitalization for gastroenteritis and dehydration, considered unrelated to the CTP (B). Twenty-one patients (48%) had a grade 2 AE, with no difference in frequency across groups. The most common grade 2 AE were gastrointestinal problems (n = 11, 25%), which were managed by ondansetron treatment (n = 5), changing route of administration (n = 2), reducing dose (n = 2), or discontinuing MTX (n = 3). Other grade 2 AE were mood problems (n = 5, 11%), infection (n = 3, 7%), laboratory abnormalities (n = 3, 7%), and 1 each of seizure recurrence, hair thinning, lip and nasal ulcer, and blurred vision. One patient discontinued IV CS because of access difficulties. Patients who experienced TF and received additional medication(s) had more grade 2 AE (82% vs 36% for non-TF, p = 0.009). Grade 1 AE were more frequent in the oral CS regimen (CTP C, p = 0.005), most commonly weight gain or Cushingoid features (n = 13, 29.5%), gastrointestinal problems (n = 10, 23%), and mood problems (n = 4, 9%).
Response to CTP
Nearly all patients improved compared to baseline by 2 PGA (Δ-disease status, Δ-activity status). The Δ-activity status rated 43 of 44 (98%) patients as improved, versus 40 (90%) by Δ-disease status (Figure 1B). Both assessments rated a patient who had disease extension and return of induration at 9 months as worsened. The Δ-disease status rated an additional 3 patients as worsened including 1 with seizure recurrence at 12 months without cutaneous activity signs (linear scleroderma of the head subtype), and 2 who developed more damage features (worsening facial or other growth difference) with no or minimal residual skin disease activity.
While this pilot study was not powered to assess efficacy, we analyzed the response of patients in each CTP to provide information on our assessments and identify potential confounders. PGA-A, mLoSSI, and LSCAM scores decreased from baseline to last visit in all CTP groups when analyzed based upon intent to treat and censored for TF (Table 3). PGA-D, LoSDI, LSDam, and several patient/parent health-related outcome scores did not differ from baseline to last visit (Table 3).
To track response across the 6 visits, we examined whether and when patients achieved a PGA-A score of 0. PGA-A = 0 occurred in 42% of CTP A, 44% of CTP B, and 67% of CTP C patients (Table 4A). When patients with TF were excluded, percentages increased to 50% (CTP A), 59% (CTP B), and 75% (CTP C; Table 4A). Patients in CTP C (MTX plus oral CS) appeared to reach 0 sooner, but these differences were not significant. Patients in CTP C had lower baseline LSCAM and LSDam scores than patients in CTP A and B (p < 0.05; Table 3A); mLoSSI or LoSDI scores were not different across groups (Table 3A). The lower cutaneous scores for patients in CTP C may reflect their having less extensive skin disease (Table 2).
Patients were stratified into 4 levels of baseline PGA-A scores (1–2, 3–4, 5–7, and ≥ 8) to determine whether the baseline PGA-A score affected the likelihood of reaching PGA-A = 0. Higher baseline PGA-A scores were associated with both a lower likelihood and a slower rate of achieving PGA-A = 0. Two-thirds of non-TF patients who had a baseline PGA-A score of ≤ 7 reached 0 within 12 months, compared to 13% of patients with a PGA-A of 8 or above (p = 0.045; Table 4B).
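The stratification can be sketched in Python as follows. The sketch assumes integer PGA-A scores on the 0–10 scale and treats the top band as 8 or above, the cutoff used when the results are reported; the function names and the example patients are hypothetical, not study data.

```python
from collections import defaultdict


def baseline_pga_a_band(score):
    """Map a baseline PGA-A score (0-10) to 1 of the 4 strata, or None for 0."""
    if not 0 <= score <= 10:
        raise ValueError("PGA-A must be between 0 and 10")
    if score >= 8:
        return "8-10"
    if score >= 5:
        return "5-7"
    if score >= 3:
        return "3-4"
    if score >= 1:
        return "1-2"
    return None  # a baseline score of 0 falls outside the 4 strata


def reached_zero_by_band(patients):
    """Count patients per band who reached PGA-A = 0 at any followup visit.

    patients: iterable of (baseline_score, followup_scores) pairs.
    Returns {band: (number reaching 0, number in band)}.
    """
    counts = defaultdict(lambda: [0, 0])
    for baseline, followups in patients:
        band = baseline_pga_a_band(baseline)
        if band is None:
            continue
        counts[band][1] += 1
        if any(score == 0 for score in followups):
            counts[band][0] += 1
    return {band: (hit, total) for band, (hit, total) in counts.items()}


if __name__ == "__main__":
    # Hypothetical patients: (baseline PGA-A, scores at later visits).
    cohort = [(2, [1, 0]), (9, [6, 4]), (8, [3, 0]), (6, [2, 1, 0])]
    print(reached_zero_by_band(cohort))
```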
Extent of TF
Eleven patients were considered to have experienced TF and received additional treatment: IV CS (n = 10) and/or mycophenolate mofetil (MMF; n = 8). A larger percentage of patients experienced TF in CTP A (n = 4, 33%) and CTP B (n = 6, 25%) than CTP C (n = 1, 12.5%), but these differences were not significant (Figure 1B). Four patients enrolled in CTP A (MTX monotherapy) received IV CS, and 6 patients in CTP B received longer IV CS courses; 1 also received oral CS. Eight patients (1 CTP A, 6 CTP B, 1 CTP C) received MMF, all concurrently with MTX. Median time to TF was at the 4-month visit (median 116 days); range 2- to 9-month visit (97–302 days).
At the last visit, most patients who experienced TF were rated as having a moderate level of improvement (Δ-activity, n = 6, 54.5%) versus the major level of improvement given to most non-TF patients (n = 18, 56.3%). PGA-A, mLoSSI, and LSCAM scores decreased from baseline to last visit in patients who experienced TF (Table 5), but these patients did not experience as much improvement as non-TF patients. At last visit, they had higher PGA-A, mLoSSI, and LSCAM scores than non-TF patients (p = 0.011, 0.003, < 0.001, respectively; Table 5). At the last visit, PGA-D scores were also higher in TF than non-TF patients (p = 0.003; Table 5). Patients who experienced TF were more likely to be non-white race and to have mixed morphea, generalized morphea, pansclerotic morphea, ECI, joint involvement, and truncal lesions (Table 5). There was a trend toward significance for limb involvement, more extensive skin disease, older age of disease onset, ANA positivity, and family history of a rheumatic disease (p < 0.1; Table 5).
DISCUSSION
To our knowledge, this is the first prospective comparative effectiveness study of 3 different MTX-based regimens for juvenile LS. The study aims were to assess the safety and feasibility of the 3 standardized regimens and evaluate our assessment tools. Our prior survey of North American pediatric rheumatologists demonstrated the need for comparative effectiveness studies, because responders were almost equally divided on their choice of these 3 MTX-based CTP23. Our investigators similarly showed distinct treatment preferences, with half the sites choosing a single CTP to treat all their study patients24. Because the physician and parent/patient jointly selected the treatment CTP, some of these choices reflect family preference with at least 3 families opting for MTX monotherapy over their physician’s recommendation for a CS-associated CTP.
All 3 CTP were found to be safe, generally well tolerated, and effective, with over half the patients having PGA-A = 0 by their last visit. However, only about half completed the CTP they initially started; dropouts and AE accounted for 52% of the deviations, and TF the remainder. The frequencies of AE for MTX and CS were similar across the CTP. Most AE were managed by supportive care or brief pauses in treatment. Six patients (14%) had their CTP regimen changed because of intolerance. This frequency is higher than was reported in the CS-based randomized controlled trial of MTX15, possibly reflecting the higher MTX dose (1 mg/kg/dose vs 15 mg/m², potentially double for a small child) or a different route of administration. Other explanations are differences in CS regimen, including the route and the longer duration and higher dose of oral prednisone, and/or patient characteristics.
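The comparison of a weight-based MTX dose (1 mg/kg) with a body surface area (BSA)-based dose (15 mg/m2) can be made concrete with a short Python sketch. It assumes the Mosteller formula for BSA, which the text does not specify, ignores any dose caps, and uses hypothetical measurements for a small child.

```python
from math import sqrt


def mosteller_bsa_m2(height_cm, weight_kg):
    """Body surface area in m^2 by the Mosteller formula (an assumed choice)."""
    if height_cm <= 0 or weight_kg <= 0:
        raise ValueError("height and weight must be positive")
    return sqrt(height_cm * weight_kg / 3600.0)


def compare_mtx_doses(height_cm, weight_kg):
    """Return (weight-based mg, BSA-based mg, ratio) with no dose caps applied."""
    per_kg_dose = 1.0 * weight_kg                                # 1 mg/kg
    per_m2_dose = 15.0 * mosteller_bsa_m2(height_cm, weight_kg)  # 15 mg/m^2
    return per_kg_dose, per_m2_dose, per_kg_dose / per_m2_dose


if __name__ == "__main__":
    # Hypothetical child: 110 cm, 20 kg.
    per_kg, per_m2, ratio = compare_mtx_doses(110.0, 20.0)
    print(f"{per_kg:.1f} mg vs {per_m2:.1f} mg (ratio {ratio:.2f})")
```

For these hypothetical measurements the weight-based dose is 20 mg and the BSA-based dose about 11.7 mg, a ratio of roughly 1.7.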
Most who experienced TF received mycophenolate mofetil (MMF), all in combination with MTX. Our group had previously developed an MMF-based CTP that allowed for use of MMF with or without MTX24, but it was not included in the current study. The high frequency of CTP failures suggests that additional standardized treatment regimens are likely needed for comparative effectiveness studies, including evaluating the efficacy of MMF separate from MTX, and examining biologics or other disease-modifying antirheumatic drugs.
Several physician scores and skin activity measures were used to assess treatment response. We found limitations in the physician assessment of overall disease status compared to baseline (Δ-disease status) because it encompasses damage as well as activity status. Three patients developed worsening of extracutaneous features by their last visit, and although all were rated as improved by Δ-activity status, all were scored as having a worsened disease status compared to baseline. These patients demonstrate that damage can progress despite effective control of disease activity. We therefore recommend that activity and damage be separately assessed for juvenile LS, with initial treatment efficacy evaluated primarily by activity measures.
We detected differences between subpopulations, suggesting that our assessments are sensitive enough to use in comparative effectiveness studies. Our LSCAM includes more variables than are found in the mLoSSI, the model for our measure. Both measures detected significant changes in skin scores between first and last visit; the LSCAM also detected differences between CTP and showed a wider range of scores in the TF patients than did the mLoSSI. Higher baseline PGA-A scores were associated with a slower and lower rate of response, especially if the baseline PGA-A score was 8 or higher. Our analysis of the TF patients identified additional variables associated with poorer response including truncal lesions, ECI, joint involvement, non-white race, and mixed morphea subtype. More extensive skin involvement, limb involvement, and ANA positivity approached significance. Further evaluation is needed to determine whether some of these variables are confounders that should be considered when designing treatment trials or evaluating response.
Because this was a pilot study, it was not powered for determining the relative efficacy of the CTP. There was a trend toward a faster and higher rate of response in one of the CTP groups, which may reflect differences in disease severity and pattern between groups. Patients in the CTP associated with poorer response had higher LSCAM scores at baseline, more extensive skin disease, and higher frequency of mixed morphea subtype and growth disturbances (p < 0.05), as well as a greater likelihood of joint involvement (p < 0.1). Some of these variables were also identified in patients who experienced TF, suggesting they may represent prognostic features that are confounding the relationship between treatment and response. In a larger sample, case control methods could be used to match subjects according to variables associated with disease severity/activity to more effectively evaluate the efficacy of the CTP without selection bias.
Study limitations include the large variation in time of the “12-month” study visit, with several patients seen outside of the recommended window of 12 ± 1 month. This is to be expected in prospective observational studies, where visits are completed according to requirements for care rather than the more rigorous schedule of standardized treatment trials. Another limitation is the absence of serological biomarkers to identify or quantify disease activity. We performed several clinical assessments in tandem to improve our accuracy of clinical scoring. However, physicians may be biased toward a given CTP efficacy, so use of assessors blinded to the chosen CTP may be advisable.
Our study findings add substantially to the understanding of treatment strategies for juvenile LS patients. All 3 CTP were found beneficial, yielding tangible and significant improvements in disease activity. Our cutaneous activity measure and physician assessments could identify both changes during the course of treatment and differences in response between some patient subsets. More studies are needed to assess the reliability, validity, and performance of our LSCAM measure compared to mLoSSI. Further study is needed to determine if these clinical outcome measures are sensitive enough to detect differences in relative efficacy between treatments. Studies are also needed to develop biomarkers to enhance monitoring of response.
The widespread occurrence of ECI in our cohort (> 70%) demonstrates that juvenile LS is a serious disease with a high potential for severe morbidity. All patients who experienced TF had ECI; at the end of the study they had higher LSCAM and physician global damage scores. In addition, despite nearly all patients having a marked reduction in disease activity level, damage scores did not improve, and in some patients worsened. Our findings suggest that patients with a larger disease burden (skin and other tissues) are at risk for more damage and require more treatment for disease control. It may be worthwhile considering treat-to-target strategies for juvenile LS, as has been proposed for rheumatoid arthritis39. Larger, more robust comparative effectiveness studies are needed to objectively identify optimal treatment strategies that minimize disease burden, medication intolerance, and damage progression. Findings from this study should help inform the development of such studies. We expect an iterative process that will improve our ability to optimize therapy and longterm outcome for these patients.
Acknowledgment
The authors thank Laura Schanberg for her key help and support of this project. We also thank the following CARRA Registry site principal investigators and research coordinators: E. Anderson, H. Benham, B. Feldman, K. Francis, I. Goh, J. Jaquith, K. Schollaert-Fitch, C. Smith, J. Weiss, J. Wooton, and all study staff who helped recruit patients for the CARRA Registry. We thank the patients and families who participated in our study. We gratefully acknowledge Marilynn Punaro and Brian Feldman for their sage advice and support of this project. We thank Xiaohu Li for initial help with data analysis; Maria Carputo and Danielle Wolfe for help with data entry; and Justine Griswold, Mary Ellen Riordan, Maria Carrollo, and Jeannette Haugh for help with study coordination, data management, and meeting planning. We thank Jane Winsor and Kelly Mieszkalski for their expertise and help with the CARRA Legacy Registry.
Footnotes
The Childhood Arthritis and Rheumatology Research Alliance (CARRA) Legacy Registry was supported by a grant from the US National Institute of Arthritis and Musculoskeletal and Skin Diseases of the National Institutes of Health (NIH) under award number RC2AR058934. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. CARRA, Friends of CARRA, the Arthritis Foundation, and the Duke Clinical Research Institute also supported the CARRA Legacy Registry. This study was funded primarily by an innovative research grant from the Arthritis Foundation (PI: SL). CARRA provided additional funding for data analysis (publication grant, PI: SL) and in-kind resources to support management and maintenance of the registry for the study, online meetings, and face-to-face meetings at annual meetings of CARRA and the American College of Rheumatology. Support for biobanking was provided by independent funding from The Nancy Taylor Foundation for Chronic Diseases Inc. (PI: KST, Children’s Hospital of Pittsburgh Pediatric Scleroderma Fund).
Accepted for publication September 30, 2019.