The total score of the Short Physical Performance Battery should be used but not each item separately: a systematic review with meta-analysis of reliability studies
Authors
Eusepi Davide [UniCamillus, International Medical University in Rome, Rome, Italy]
Piscitelli Daniele [Department of Kinesiology, University of Connecticut, Storrs, CT, USA; School of Medicine and Surgery, University of Milano-Bicocca, Monza, Italy]
Ugolini Alessandro [Private Practice, Empoli (FI), Italy]
Graziani Lorenzo [Program in Physical Therapy, University of Florence, Florence, Italy]
Coppari Andrea [Physical and Rehabilitation Medicine Unit, Azienda Sanitaria Territoriale, Jesi (AN), Italy]
Carlizza Alessandra [UniCamillus, International Medical University in Rome, Rome, Italy]
Caselli Serena [Unità Operativa Complessa di Medicina Riabilitativa, Azienda Ospedaliero-Universitaria di Modena, Modena, Italy]
La Porta Fabio [IRCCS Istituto delle Scienze Neurologiche di Bologna, Bologna, Italy]
Paci Matteo [Department of Allied Health Professions, Azienda USL Toscana Centro, Florence, Italy; Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy]
Di Bari Mauro [Department of Experimental and Clinical Medicine, University of Florence, Florence, Italy; Department of Medicine and Geriatrics, Unit of Geriatrics, Azienda Ospedaliero-Universitaria Careggi, Florence, Italy]
Pellicciari Leonardo [IRCCS Istituto delle Scienze Neurologiche di Bologna, Bologna, Italy]
Background and aims
The Short Physical Performance Battery (SPPB) assesses lower limb function and mobility in older adults. It comprises three tests (balance, walking ability, and lower limb strength), each scored on a 5-point Likert scale. The total score is the sum of the item scores and ranges from 0 (lowest performance) to 12 (highest performance) points.
Although there is evidence of its predictive validity for various health outcomes, quantitative data on SPPB reliability are lacking. Therefore, this systematic review with meta-analysis aims to assess the reliability of the SPPB (i.e., intra-rater reliability, inter-rater reliability, and measurement error) in adult patients.
Methods
The study protocol was prospectively registered on PROSPERO (CRD420251003320). PubMed, EMBASE, CINAHL, PsycINFO, and Scopus were searched from inception to March 2025 using a two-part search strategy combining the assessment tool (i.e., SPPB) and the measurement properties (i.e., reliability and measurement error). Studies were included if they assessed SPPB intra- and inter-rater reliability and measurement error in adult patients. Studies reporting alternate SPPB versions (e.g., electronic or remote versions) were excluded. Two independent reviewers conducted the study selection, extracted the data, and assessed the methodological quality (COSMIN Risk of Bias) and the quality of evidence (QoE; GRADE approach). Random-effects meta-analyses were performed when at least two studies assessed the same property of the total score or of the same subscale.
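The abstract does not state which random-effects estimator was used. As a minimal sketch only, the following assumes the common DerSimonian-Laird estimator and takes per-study estimates and their variances as inputs. The function name `dersimonian_laird` and the numbers in the usage lines are hypothetical and are not data from the included studies.

```python
import math

def dersimonian_laird(estimates, variances):
    """Pool study estimates with a DerSimonian-Laird random-effects model.

    Assumption: this estimator is illustrative; the abstract does not
    specify the one actually used. Returns (pooled, standard_error, tau2).
    """
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect (inverse-variance) weights and mean.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q and the between-study variance tau^2 (truncated at 0).
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    standard_error = math.sqrt(1.0 / sum(w_re))
    return pooled, standard_error, tau2

# Hypothetical inputs for illustration only (not taken from the review).
example_estimates = [0.80, 0.90, 0.85]
example_variances = [0.004, 0.002, 0.003]
pooled, se, tau2 = dersimonian_laird(example_estimates, example_variances)
print(f"pooled={pooled:.3f}, SE={se:.3f}, tau2={tau2:.4f}")
```

When the studies agree closely, the estimated between-study variance is truncated to zero and the result coincides with the fixed-effect (inverse-variance) mean.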
Results
Twenty-two studies were included, for a total of 1,730 subjects. For the total score, intra-rater (ICC=0.88) and inter-rater reliability (ICC=0.86) were rated as sufficient (high QoE), while the measurement error (MDC=1.90) was rated as insufficient (high QoE). For the balance subscale, intra-rater (ICC=0.69) and inter-rater reliability (ICC=0.77) were rated as inconsistent (low QoE), while the measurement error (MDC=1.32) was rated as indeterminate (high QoE). For the walking subscale, intra-rater (ICC=0.88) and inter-rater reliability (ICC=0.86) were rated as sufficient (moderate and high QoE, respectively), while the measurement error (MDC=0.51 points) was rated as indeterminate (moderate QoE). For the chair subscale, intra-rater (ICC=0.83) and inter-rater reliability (ICC=0.94) were rated as sufficient (moderate and high QoE, respectively), while the measurement error (MDC=2.14 points, equal to 53.4% of the total score) was rated as indeterminate (high QoE).
Conclusion
The SPPB provides similar results whether it is administered by the same assessor (intra-rater reliability) or by two different assessors (inter-rater reliability). However, the measurement error of each subscale (except for the walking subscale) is higher than that of the total score. Therefore, the total score of the SPPB should be used, but not each item separately.
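One way to read this comparison is to express each reported MDC as a percentage of the maximum attainable score on its scale. The sketch below assumes that this relative reading is the intended one, which the abstract does not state explicitly. The MDC values are those reported in the Results, and the maximum scores follow from the scoring described in the Background (three tests on a 5-point scale, total 0 to 12). Small differences from percentages quoted in the Results may arise because the inputs here are rounded.

```python
# MDC values as reported in the Results (in points).
reported_mdc = {"total": 1.90, "balance": 1.32, "walking": 0.51, "chair": 2.14}

# Maximum attainable scores: 0-4 for each test, 0-12 for the total score.
max_score = {"total": 12, "balance": 4, "walking": 4, "chair": 4}

def relative_mdc(mdc, scale_max):
    """Return the MDC as a percentage of the maximum attainable score."""
    return 100.0 * mdc / scale_max

# Assumption: comparing relative (not absolute) MDC is our reading of the
# conclusion, not a method confirmed by the abstract.
relative = {
    name: relative_mdc(reported_mdc[name], max_score[name])
    for name in reported_mdc
}

for name, pct in sorted(relative.items(), key=lambda item: item[1]):
    print(f"{name}: {pct:.1f}% of maximum score")
```

Under this reading, the walking subscale has the smallest relative MDC, the total score comes next, and the balance and chair subscales have larger relative MDCs than the total score, which matches the pattern described in the conclusion.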
REFERENCES
Gagnier JJ, de Arruda GT, Terwee CB, Mokkink LB; Consensus group. COSMIN reporting guideline for studies on measurement properties of patient‑reported outcome measures: version 2.0. Qual Life Res. 2025 Mar 28. doi: 10.1007/s11136-025-03950-x.
Guralnik JM, Simonsick EM, Ferrucci L, Glynn RJ, Berkman LF, Blazer DG, Scherr PA, Wallace RB. A short physical performance battery assessing lower extremity function: association with self-reported disability and prediction of mortality and nursing home admission. J Gerontol. 1994 Mar;49(2):M85-94. doi: 10.1093/geronj/49.2.m85. PMID: 8126356.
Mokkink LB, Boers M, van der Vleuten CPM, Bouter LM, Alonso J, Patrick DL, de Vet HCW, Terwee CB. COSMIN Risk of Bias tool to assess the quality of studies on reliability or measurement error of outcome measurement instruments: a Delphi study. BMC Med Res Methodol. 2020 Dec 3;20(1):293. doi: 10.1186/s12874-020-01179-5.
Pavasini R, Guralnik J, Brown JC, di Bari M, Cesari M, Landi F, Vaes B, Legrand D, Verghese J, Wang C, Stenholm S, Ferrucci L, Lai JC, Bartes AA, Espaulella J, Ferrer M, Lim JY, Ensrud KE, Cawthon P, Turusheva A, Frolova E, Rolland Y, Lauwers V, Corsonello A, Kirk GD, Ferrari R, Volpato S, Campo G. Short Physical Performance Battery and all-cause mortality: systematic review and meta-analysis. BMC Med. 2016 Dec 22;14(1):215. doi: 10.1186/s12916-016-0763-7.
