Systemic sclerosis (SSc) or scleroderma is a multisystem autoimmune disease in which autoantibody production, fibroproliferative vasculopathy, and fibroblast dysfunction culminate in skin thickening and end-organ complications.1 SSc is classified as either limited cutaneous scleroderma (lSSc) involving skin distal to the elbows and knees, and the face; or as diffuse cutaneous scleroderma (dSSc) involving the skin proximal to the elbows and knees. SSc is a relatively rare condition with an estimated prevalence of 150-300 cases per million (with Australia’s prevalence at the higher end of that range). However, it has the highest mortality of all the rheumatic diseases due to extra-cutaneous features including cardiopulmonary involvement, renal disease, vascular disease and gastrointestinal involvement.1
The current validated tool for assessing scleroderma skin involvement is the semi-quantitative Modified Rodnan Skin Score (mRSS), initially developed by Rodnan in 1979.2 The score originally involved 26 body sites, each rated from 0 (normal) to 4 (extreme thickness), but it has since been simplified and validated at 17 sites with thickness ratings of 0 to 3.3-5 The mRSS monitors skin disease progression and severity over time,6 and correlates with disease severity scores, physician global assessments and biopsy findings.7 Rapidly progressive skin disease, as measured by the mRSS, has been shown to be a poor prognostic marker due to its association with internal organ involvement.3,8,9 However, the mRSS was not designed to describe or discriminate between the oedematous, fibrotic or atrophic phases of skin change in SSc.10,11 There is also controversy about whether the mRSS measures skin tethering rather than true thickness.11
Ultrasound (US) offers a potential alternative means of measuring skin thickness and composition. Reports of US-detected skin thickening, even in macroscopically uninvolved skin, suggest that US may be a more sensitive means of assessing skin disease, with important prognostic implications.12,13 However, the literature is scant and heterogeneous in its methodology.11,12,14,15 Only Ch’ng et al systematically evaluated the 17 mRSS sites with US for intra- and inter-reader reliability.11 There has also been limited study of the relative reliability of individual sites, which may lead to a more practicable score than one based on 17 skin sites. Knowledge of the prognostic utility of an US-based score and its sensitivity to possible changes over time is also limited.
The aim of this study was four-fold: first, to provide data regarding US-measured skin thickness in people with SSc; second, to evaluate the correlation between US-measured skin thickness and the mRSS; third, to assess the intra- and inter-rater reliability of US-measured skin thickness and assess individual site reliability; finally, to evaluate whether US could be used to monitor for changes in skin thickness over time in dSSc patients.
This retrospective study recruited the Western Australian participants of the Australian Scleroderma Cohort Study (ASCS). Patients met the 2013 American College of Rheumatology classification criteria for scleroderma and were studied from results generated between 2011 and 2013. Data were collected on the following: extent of skin disease (limited or diffuse); previous and current immunosuppression; use of vasodilator drugs; and presence and treatment of pulmonary arterial hypertension (PAH).
The mRSS and US measurements were recorded at the 17 mRSS sites.10 The mRSS involves skin palpation at 17 sites (face, both fingers, both hands, both forearms, both proximal arms, chest, abdomen, both thighs, both distal legs and both feet) to estimate “skin thickness” and assign a score from 0 to 3 units. The individual site scores are summated to create an mRSS from 0 to 51 units. US measurements were made using an Esaote Mylab 70 XVG machine (Esaote, Genoa, Italy) with a multi-frequency (10-18 MHz) linear probe. However, 18 MHz was used preferentially as it is more commonly accessible in clinical practice and allows a degree of device “futureproofing”. A stand-off pad (20 mm thickness) was used. Skin thickness was defined as the mean value of 3 measurements at each site of the distance (cm) between the skin/stand-off pad interface and the hypodermis.
In each patient, the site skin thickness of the 17 sites scanned was summated to create a “cumulative US skin thickness score”. US measurements were taken twice on the same day at each visit by experienced clinician sonographers (SC at baseline) to determine intra-occasion, intra-rater reliability at baseline (visit 1) and two years (visit 2). At baseline, the patients were all rescanned by SC within a 2-week period (visit 1b) to determine inter-occasion intra-rater reliability. Patients were scanned at two years (visit 2) by a second sonographer (PC). Patients were rescanned by a third experienced sonographer (HK) at visit 1 and visit 2 to allow inter-rater reliability to be determined.
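The scoring arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration and not the study's analysis code: the function names and the example readings are hypothetical, and only the definitions (mean of three measurements per site, summation over 17 sites, mRSS site scores of 0 to 3) are taken from the text.

```python
from statistics import mean

N_SITES = 17  # number of mRSS sites, as stated in the text


def site_thickness(readings_cm):
    """Site skin thickness: mean of the three US measurements (cm) at one site."""
    if len(readings_cm) != 3:
        raise ValueError("expected exactly three measurements per site")
    return mean(readings_cm)


def cumulative_us_score(readings_by_site):
    """Cumulative US skin thickness score: sum of the 17 site thicknesses (cm)."""
    if len(readings_by_site) != N_SITES:
        raise ValueError("expected readings for all 17 sites")
    return sum(site_thickness(r) for r in readings_by_site)


def total_mrss(site_scores):
    """Total mRSS: sum of 17 palpation scores, each 0-3, giving a range of 0-51."""
    if len(site_scores) != N_SITES:
        raise ValueError("expected scores for all 17 sites")
    if any(s not in (0, 1, 2, 3) for s in site_scores):
        raise ValueError("each site score must be 0, 1, 2 or 3")
    return sum(site_scores)


# Hypothetical values for one patient (not study data): three readings per site.
example_readings = [[0.10, 0.12, 0.11]] * N_SITES
example_scores = [1] * N_SITES

print(round(cumulative_us_score(example_readings), 2))
print(total_mrss(example_scores))
```

Because the cumulative score is a plain sum, a missing site would shrink it, so the sketch rejects incomplete input instead of silently summing fewer than 17 sites.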
Progression, regression or stability of skin thickness was measured after 2 years in patients with dSSc only, as changes in skin thickness are not associated with long-term end-organ complications in lSSc patients.
Categorical data are described as frequency (n) and proportion (%), and compared with Fisher’s Exact Test. Continuous data are described as mean ± standard deviation or as a median and interquartile range (IQR), and compared by t-test or Mann-Whitney U-test. Repeated ultrasound measures were compared with Wilcoxon Signed-Rank or paired t-test. Spearman rank correlation coefficients (ρ) described the association between skin thickness on US and mRSS scores. Intraclass correlation coefficients (ICC) were used to describe the reliability of measures within and between rheumatologists for US at each site. Definitions of reliability were based on ICCs of <0.5 (poor), 0.5-0.75 (good), 0.75-0.9 (moderate) and >0.9 (excellent) as established by Koo and Li.16
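For readers unfamiliar with the ICC, the calculation can be sketched as follows. The text does not state which ICC form was used, so this example assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1). The function name and the example values are hypothetical and are not study data.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `ratings` is a list of rows, one per subject, each holding one value per
    rater (or per repeat scan). The choice of ICC(2,1) is an assumption.
    """
    n = len(ratings)     # number of subjects
    k = len(ratings[0])  # number of raters or repeat scans
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with at least 2 rows and 2 columns")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical thickness values (cm) at one site: five patients, two raters.
example = [
    [0.12, 0.13],
    [0.18, 0.17],
    [0.10, 0.12],
    [0.22, 0.21],
    [0.15, 0.16],
]
print(round(icc_2_1(example), 3))
```

Under this form, a systematic offset between raters lowers the ICC, which suits a setting where different sonographers are meant to produce interchangeable thickness values.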
The study was conducted in accordance with the Declaration of Helsinki and was approved by the Royal Perth Hospital Human Research Ethics Committee. Informed consent was obtained from all patients before study enrolment. The study was supported (for purchase of US equipment) by the RPH Medical Research Foundation.
Sixteen participants were included in the study. Baseline characteristics are shown in Table 1. Site skin thickness is presented in Tables 2 and 3.
Correlation of US with mRSS
Examining each skin site independently at baseline (17 sites in 16 people = 272 sites), there was a correlation between the US site skin thickness and the component of the mRSS assigned to the corresponding site (Spearman ρ = 0.272, p<0.001). When the 17 sites were summed to produce a cumulative US score for each individual, no correlation was found with the individual’s total mRSS (ρ = 0.076, p = 0.787).
US had at least good (>0.5) intra-rater reliability for both intra- and inter-occasion domains at 13 of 17 sites and moderate (>0.75) reliability at 7 of 17 sites (Table 4). There was excellent intra-rater reliability when a cumulative 17-site US score was obtained (Table 4). Inter-rater reliability of US was at least good (>0.5) at both time points at 6 of 17 sites, and the cumulative US score also had good reliability. Moderate reliability (>0.75) at both time points was found at 2 of 17 sites (face and left thigh), but reliability was not ‘excellent’ at any site at either time point (Table 4).
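The site-level correlation pairs a continuous thickness with an ordinal score of 0 to 3, so tied ranks are common. The following is a minimal sketch of Spearman's ρ with tie-averaged ranks. The function names and the example values are hypothetical and do not reproduce the study data or the reported p-values.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of positions i..j, made 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical site-level pairs (not study data): US thickness (cm) vs mRSS site score.
us_thickness = [0.10, 0.14, 0.12, 0.20, 0.16, 0.11]
mrss_site = [0, 1, 0, 3, 1, 1]
print(round(spearman_rho(us_thickness, mrss_site), 3))
```

The sketch does not handle the case where one variable is constant (for example, every site scored 0), in which ρ is undefined and the final division fails.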
Change in skin thickness over time
Of the six dSSc patients in the study, five had US measurements at the two-year visit. US demonstrated improvements in skin thickness in two of these five patients (40%) (Patient 1: at 12/17 sites, p<0.039; Patient 2: at 16/17 sites, p<0.001). At the two-year visit, dSSc patients had significant skin regression at the abdomen (0.17 to 0.13 cm, p=0.043). The dSSc patients who developed digital ulcers during follow-up (n=3/6, 50%) had less skin regression at the abdomen (0.00 vs -0.01 cm, p=0.019). In the remaining three patients, skin thickness remained stable.
In this study, US had at least good intra-rater reliability at 15 sites and good inter-rater reliability at 7 sites. Six anatomical sites had good intra- and inter-rater reliability. The intra- and inter-rater reliability was at least moderate (>0.75) when the cumulative score was used, which may be because the score aggregates all sites, whereas skin thickness varies slightly with the body site and with the exact point at which the measurement is taken at each site. A previous study by Moore et al using a 22 MHz probe found intra- and inter-rater reliability with ICCs greater than 0.70 at all 17 sites.14 The discrepancy in reliability can be explained by their use of so-called Z-scores of maximum skin thickness relative to a group of healthy controls, whereas we used an average skin thickness score based on 3 measurements in SSc patients only. As far as we are aware, no other studies have measured thickness at all 17 Rodnan sites with US, limiting comparisons. Two studies examining reliability with a similar frequency probe measured only two sites: one examined the proximal and middle phalanx of the second finger13 and demonstrated low intra- and inter-rater variability in US measurements. The other17 examined the finger and forearm and found an inter-rater variability of 1% at the finger and 0.0016% at the forearm.
The sites with the best intra- and inter-rater reliability (i.e. with a moderate ICC) were the face and the thighs, although we observed a difference in laterality whereby the intra-rater reliability at the right thigh was stronger than that seen at the left thigh. Moore et al made similar observations,14 but the findings here are most likely due to chance.
We found a positive and statistically significant correlation between site skin thickness as measured by US and the site component of the mRSS, providing some construct validity for the ability of US to determine skin thickness. However, the correlation between US and the current gold standard (mRSS) was weak. As previously discussed, the US score in this study measured only skin thickness, whereas it has been hypothesised that the mRSS measures skin tethering rather than thickness.11 This becomes further complicated when the cumulative 17-site US score is considered. No relationship was found between the individual’s cumulative US skin thickness score and the mRSS. This may result from the small numbers in this study, but it demonstrates that further work needs to be undertaken to develop an US scoring tool in scleroderma.
US-measured regression of skin thickness over a 2-year period was seen in two of five patients with dSSc, both of whom were on immunosuppressive therapy. Of the three patients whose skin did not regress, two were taking immunosuppressant drugs (mycophenolate alone in one case, and mycophenolate, rituximab and prednisolone in the other) and one was on no treatment. Drawing conclusions on the effect of treatment on skin regression is not possible with such small numbers. The natural history of SSc skin disease is that, in some cases, there is regression without treatment. However, a recently published follow-up study of the Moore cohort14 examined 75 patients after one year at 5 of the original 17 sites by US. Twenty-one demonstrated skin thickness regression, and 35 progression. This supports our previous finding that US is sensitive to changes in skin thickness over time.17
The main limitations of this study include the small and heterogeneous sample of patients with diffuse and limited skin disease, and with differing disease stage and duration. This limits the generalizability of our findings. Skin thickness generally changes in the first 3-5 years of disease3,5,18 and the relatively long disease duration (median 9 years) of our participants may have limited the ability of US to detect changing skin thickness. Future studies with more dSSc patients with early disease would enable better assessment of US-measured skin thickness progression or regression over time, and subsequently evaluate the prognostic utility of US with regards to end-organ complications.
Other studies have examined pixel count as a surrogate for dermal oedema and fibrosis, and elastography as a surrogate for skin fibrosis; these may relate to skin phase, which has been shown to predict disease outcomes in a single study.9 We did not utilise these domains in our study as they are currently not widely accessible to the rheumatology community using standard office-based US equipment. Elastography was not available to us at the start of this study. We focused on the reliability and prognostic significance of US-measured skin thickness, which is likely to have wider clinical application.
In conclusion, we showed that US had good intra- and inter-rater reliability at 6 sites (face, chest, abdomen, both thighs, left leg). The cumulative score also demonstrated at least good intra- and inter-rater reliability. Site skin thickness as measured by US correlated weakly with the site component of the mRSS, but a 17-site cumulative score did not correlate with the validated mRSS. US detected skin regression in 2 of 5 dSSc patients. We conclude that although the mRSS may remain the gold standard for assessing SSc skin disease, the demonstrated reliability of US skin scoring provides a foundation for future work examining the role of US in scoring SSc skin and monitoring progression. The correlation between US scoring and disease severity, PGA or skin biopsy findings may also be topics for future studies.
1. A weak but statistically significant correlation was found between site skin thickness as measured by US and the corresponding component of the mRSS (0-3 units).
2. US had good intra- and inter-rater reliability at the face, chest, abdomen, both thighs and left leg, as did the cumulative US skin thickness score.
3. Larger studies may shed light on US sensitivity to change in skin thickness over time.
Provenance: Externally Reviewed
Funding: The Royal Perth Hospital Research Foundation funded the purchase of the ultrasound equipment.
Ethical approval: Institutional approval obtained (see text)
Declarations: No conflict of interest declared
Acknowledgements: The authors thank members of the Australian Scleroderma Interest Group (ASIG) for establishing and maintaining the national database, Helen Marsden for maintaining and assisting with the access of data from the database for this study, and Warren Raymond for statistical advice.
Corresponding Author: Dr. Pauline Habib, BJC Health, Chatswood, New South Wales 2067, Australia. Email: Paulinehabib@ymail.com