Korean J Pain 2020; 33(2): 153-165
Published online April 1, 2020 https://doi.org/10.3344/kjp.2020.33.2.153
Copyright © The Korean Pain Society.
1Complex Diseases & Genome Epidemiology Branch, Division of Epidemiology, School of Public Health, Seoul National University, Seoul, Korea
2Department of Epidemiology, School of Public Health and Institute of Health and Environment, Seoul National University, Seoul, Korea
Correspondence to: Joohon Sung
Complex Diseases & Genome Epidemiology Branch, Division of Epidemiology, School of Public Health, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Korea
Received: October 28, 2019; Revised: December 19, 2019; Accepted: January 1, 2020
This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Background: Well-validated risk prediction models help to identify individuals at high risk of diseases and suggest preventive measures. A recent systematic review reported a lack of validated prediction models for low back pain (LBP). We aimed to develop prediction models to estimate the 8-year risk of developing LBP and its recurrence.
Methods: We conducted a population-based prospective cohort study using data from 435,968 participants in the National Health Insurance Service–National Sample Cohort, enrolled from 2002 to 2010. We used Cox proportional hazards models.
Results: During a median follow-up period of 8.4 years, there were 143,396 (32.9%) first-onset LBP cases. The prediction model of first onset consisted of age, sex, income grade, alcohol consumption, physical exercise, body mass index (BMI), total cholesterol, blood pressure, and medical history of diseases. The model of 5-year recurrence risk comprised age, sex, income grade, BMI, length of prescription, and medical history of diseases. Harrell’s C-statistic was 0.812 (95% confidence interval [CI], 0.804-0.820) and 0.916 (95% CI, 0.907-0.924) in the validation cohorts of the LBP onset and recurrence models, respectively. Age, disc degeneration, and sex conferred the highest risk points for onset, whereas age, spondylolisthesis, and disc degeneration conferred the highest risk for recurrence.
Conclusions: LBP risk prediction models and simplified risk scores have been developed and validated using data from general medical practice. This study also offers an opportunity for external validation and updating of the models by incorporating other risk predictors in other settings, especially in this era of precision medicine.
Keywords: Big Data, Chronic Pain, Intervertebral Disc Degeneration, Low Back Pain, National Health Programs, Proportional Hazards Models, Recurrence, Risk Assessment, Risk Factors, Spondylolisthesis.
Low back pain (LBP) is a condition characterized by pain, muscle tension, discomfort, or stiffness below the costal margin and above the inferior gluteal folds, with or without sciatica. LBP is a common disorder causing disability, severe pain, and prolonged sick leave at personal and social expense. This condition occurs in approximately 60%-80% of people at some point in their lives, with a potential childhood onset, and an estimated 6%-10% of acute LBP patients experience repeated episodes. The annual and point prevalence of LBP approximate 45% and 30%, respectively. Lee et al. reported a 17.1% total prevalence of LBP in Korea, whereas among hypertensive individuals, lifetime prevalence was 34.4%. The reported prevalence of LBP varies substantially depending on the case definition used.
The course of LBP is highly variable, and the most appropriate description of LBP is based on its duration and the quality of symptoms that accompany the pain. LBP is episodic or recurrent, with approximately 36% of individuals who experience an episode of LBP presenting with recurrence within one year. Recurrence is associated with multiple treatments and work-related time loss, which are costly to individuals and to society. LBP is the most important contributor to disability-adjusted life years in Korea, and the disease burden is higher in females [15,16]. Well-validated risk prediction models help to identify individuals at high risk of diseases and suggest preventive measures.
Despite the high disease burden, the structural origin of most back pain episodes is unknown, with a poor correlation between symptoms and structural abnormalities [18,19], and LBP is often considered non-specific. Risk factors associated with LBP include female sex, older age, smoking, psychological stress, depression, education, occupation, income, high body weight, physical inactivity, coronary artery disease (CAD), dyslipidemia, diabetes mellitus, disc degeneration (DD), history of back injury, previous episodes, bone mineral density (BMD) disorders, spinal stenosis, and spondylolisthesis.
A 2018 systematic review by McIntosh et al. reported an absence of validated prediction models for LBP. Therefore, we undertook this study to develop and validate prediction models to estimate the 8-year risk of developing LBP as well as the 5-year recurrence risk.
We developed and validated risk prediction equations based on guidelines stipulated by TRIPOD (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis) protocols.
This population-based prospective study used data from the National Health Insurance Service–National Sample Cohort, with participants enrolled from 2002 to 2010. The cohort comprised members of different professions with different demographic attributes, making it representative of the general Korean population. This computerized database contains longitudinal records of all claims data from anonymised patients, including diagnostic codes, treatment details, monthly insurance premiums, prescriptions, laboratory clinical results, physician visits, and demographic information. The database uses a disease diagnostic coding system based on the Korean Classification of Diseases, Sixth Revision, which is compatible with the International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM). The detailed description of this cohort profile has been published elsewhere.
Participants were assigned randomly to the derivation and validation samples based on the split sample method using a ratio of 2:1. Based on literature reviews and established hypotheses, we extracted data on disease diagnosis, date of diagnosis, prescription, sex, age, insurance premium as a proxy for income grade (socioeconomic status), anthropometric measures, smoking status and alcohol consumption, physical activity, fasting blood glucose, total cholesterol, blood pressure measures, days seeking care (consultations), days of prescription (length of prescription), days of hospitalization and premorbid conditions (diabetes, CAD, hypertension [HTN] and DD, history of back and hip injury, spinal stenosis, BMD disorders, and spondylolisthesis). We used ICD-10-CM codes to extract premorbid conditions such as diabetes (E10-E14), CAD (I20-I25), HTN (I10-I15), intervertebral DD (IVDD) (M50-M518), disorders of bone density and structure (M80-M85), spinal stenosis (M480.0-M480.8), spondylolisthesis (M4310-M4318), and injury of back, hip, and thigh (S130-S139, S330-S3319, and S70-S79, respectively).
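The 2:1 random assignment described above can be illustrated with a minimal sketch. The function name `split_sample`, the fixed seed, and the use of an exact-proportion shuffle are assumptions for illustration only; the source does not describe the actual randomization procedure, and the reported cohort sizes (290,879 and 145,089) are close to, but not exactly, a 2:1 cut.

```python
import random


def split_sample(ids, ratio=(2, 1), seed=0):
    """Randomly assign participant IDs to derivation and validation sets.

    Hypothetical helper: the seed and the exact-proportion cut are
    assumptions, not the procedure used in the study.
    """
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    # Size of the derivation set under the requested ratio (2/3 by default).
    n_derivation = round(len(shuffled) * ratio[0] / sum(ratio))
    return shuffled[:n_derivation], shuffled[n_derivation:]


# 435,968 is the number of participants reported in the article.
derivation, validation = split_sample(range(435_968))
print(len(derivation), len(validation))
```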
Linear variables were assessed as both continuous and categorical predictors; however, to construct simplified risk scores, continuous predictors were categorized into clinically meaningful groups. Body mass index (BMI) was categorised as < 18.5 kg/m2, ≥ 18.5 kg/m2 to 24.9 kg/m2, ≥ 25 kg/m2 to 29.9 kg/m2, and ≥ 30 kg/m2; blood pressure or HTN status was categorised based on diastolic blood pressure (DBP) and systolic blood pressure (SBP) as SBP < 120 mmHg and DBP < 80 mmHg, SBP ≥ 120-139 mmHg or DBP ≥ 80-89 mmHg, SBP ≥ 140-159 mmHg or DBP ≥ 90-99 mmHg, and SBP ≥ 160 mmHg or DBP ≥ 100 mmHg, or medical utilisation due to HTN. Fasting blood glucose was categorized as < 100 mg/dL, 100-125 mg/dL, and ≥ 126 mg/dL, or medical utilisation due to diabetes. Total cholesterol was categorised as < 200 mg/dL, 200-239 mg/dL, and ≥ 240 mg/dL. Smoking was grouped into those who had never smoked, former smokers, and current smokers, and alcohol consumption was categorized into rare drinkers (< 2 times/mo), moderate drinkers (2-3 times/mo), and heavy drinkers (≥ 4 times/mo). Physical activity levels were categorised based on frequency of physical activity per week into low (none), moderately active (1-3 times/wk), and very active (≥ 4 times/wk). Socioeconomic status was categorised based on monthly insurance premiums, on a scale of 100% to proxy income grade, into low (< 30%), medium (30%-60%), and high (> 60%). Baseline age was categorised as < 45 years, 45-54 years, 55-64 years, and ≥ 65 years. We imputed missing data using covariate values measured at the nearest time points.
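The categorisation rules above can be sketched as simple threshold functions. This is an illustrative sketch, not the authors' code: the function names and label strings are hypothetical, values falling between the stated bands (for example a BMI of 24.95) are assigned to the lower band, and medical utilisation due to HTN is assumed to place a person in the highest blood pressure group.

```python
def bmi_category(bmi):
    """Map BMI (kg/m2) to the four groups described in the text."""
    if bmi < 18.5:
        return "<18.5"
    if bmi < 25:
        return "18.5-24.9"
    if bmi < 30:
        return "25-29.9"
    return ">=30"


def age_category(age):
    """Map baseline age (years) to the four groups described in the text."""
    if age < 45:
        return "<45"
    if age < 55:
        return "45-54"
    if age < 65:
        return "55-64"
    return ">=65"


def blood_pressure_category(sbp, dbp, htn_care=False):
    """Map SBP/DBP (mmHg) to the four groups described in the text.

    Assumption: htn_care (medical utilisation due to HTN) is grouped
    with the highest category.
    """
    if htn_care or sbp >= 160 or dbp >= 100:
        return "SBP>=160 or DBP>=100, or HTN care"
    if sbp >= 140 or dbp >= 90:
        return "SBP 140-159 or DBP 90-99"
    if sbp >= 120 or dbp >= 80:
        return "SBP 120-139 or DBP 80-89"
    return "SBP<120 and DBP<80"


# Example: a 52-year-old with BMI 31.2 and blood pressure 118/76 mmHg.
print(age_category(52), bmi_category(31.2), blood_pressure_category(118, 76))
```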
The primary outcome of interest was time to first diagnosis of LBP among participants who were LBP-free at baseline. We used four categories of LBP ICD-10-CM codes, including codes for LBP (M54.5, M54.50, M54.51, M54.52, M54.53, M54.54, M54.55, M54.56, M54.57, M54.58, and M54.59), codes for lumbago with sciatica (M54.4, M54.40, M54.41, M54.42, M54.43, M54.44, M54.45, M54.46, M54.47, M54.48, and M54.49), codes for sciatica (M54.3, M54.30, M54.31, M54.32, M54.33, M54.34, M54.35, M54.36, M54.37, M54.38, and M54.39) and unspecified dorsalgia (M54, M54.9, M54.90, M54.91, M54.92, M54.93, M54.94, M54.95, M54.96, M54.97, M54.98, and M54.99). The above categories have been used in case definitions of LBP in other studies [40-42]. The earliest recorded date of LBP diagnosis was the index date, and participants with a history of LBP at baseline were excluded from the analysis of the primary outcome. Participants with no recorded LBP during follow-up were censored at the last recorded date, death, or the study end date (December 31, 2010). We defined person-years at risk as the difference between the entry date and the right censoring date. The secondary outcome was time to LBP recurrence within 5 years following the index diagnosis in a consecutive cohort of LBP patients. A recurrence was defined as an episode occurring at least 90 days after the index date of LBP diagnosis. For recurrence, we defined person-years at risk as the difference between the initial date of LBP diagnosis and the right censoring date, and LBP patients who never experienced a recurrence were censored at the last recorded date, death, or the end of the 5-year follow-up period.
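The follow-up and recurrence definitions above can be sketched as follows. The helper names, the 365.25-day year, and the choice of the earliest available date as the end of follow-up are assumptions made for illustration; only the study end date and the 90-day gap come from the text.

```python
from datetime import date

STUDY_END = date(2010, 12, 31)  # study end date stated in the text


def person_years(entry, event=None, last_record=None, death=None,
                 study_end=STUDY_END):
    """Years at risk from entry to the LBP event or to right censoring.

    Assumption: follow-up ends at the earliest of the dates supplied.
    """
    candidates = [d for d in (event, last_record, death, study_end)
                  if d is not None]
    end = min(candidates)
    return (end - entry).days / 365.25


def first_recurrence(index_date, later_claims, gap_days=90):
    """Return the first LBP claim at least gap_days after the index date.

    Returns None when no claim qualifies as a recurrence.
    """
    eligible = sorted(d for d in later_claims
                      if (d - index_date).days >= gap_days)
    return eligible[0] if eligible else None


# Hypothetical patient: index diagnosis on 1 Jan 2005, three later claims.
claims = [date(2005, 2, 1), date(2005, 4, 1), date(2005, 6, 1)]
print(first_recurrence(date(2005, 1, 1), claims))
```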
We used the Cox proportional hazards model to assess associations between risk predictors and LBP. We first conducted two analyses: a univariate analysis and a fully adjusted analysis adjusting for age, sex, income grade, smoking status, physical activity, and alcohol consumption. We checked the Cox proportional hazards assumptions graphically and assessed the functional form of covariates for linearity using cumulative martingale residuals, Schoenfeld residual plots, and the Kolmogorov-type supremum test based on 1,000 simulation patterns. We used hierarchical cluster analysis to select the most representative variable for each cluster of correlated variables and assessed estimated coefficients for predictors in the univariate analysis to select representative predictors for model derivation. The models were fitted using the backward selection procedure, and variables were retained if they were significant at α = 0.01. To construct a risk score, the estimated β coefficient for each variable was multiplied by 100 and rounded to the nearest integer, with the total score obtained by summing the scores for each predictor. The same approach was used to derive the models for the primary and secondary outcomes.
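The point assignment described above (β multiplied by 100, rounded, then summed) can be sketched as below. The coefficient values and category names are hypothetical: they were chosen only so that the resulting points match the maximum onset points reported in this article for age, DD, and sex (98, 53, and 32), and they are not the fitted coefficients.

```python
def risk_points(beta):
    """Convert a Cox regression coefficient to integer risk points."""
    return round(beta * 100)


def total_score(betas):
    """Sum the risk points over the predictor categories that apply."""
    return sum(risk_points(b) for b in betas.values())


# Hypothetical coefficients for the categories that apply to one person.
example_betas = {
    "age (highest-risk group)": 0.98,
    "disc degeneration": 0.53,
    "sex (higher-risk group)": 0.32,
}

points = {name: risk_points(b) for name, b in example_betas.items()}
print(points, total_score(example_betas))
```

Note that Python's built-in `round` resolves exact .5 ties to the nearest even integer, which may differ from the rounding convention of other software.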
We assessed calibration using the Hosmer–Lemeshow (H–L) type χ2 statistic as extended for survival data by Nam and D’Agostino. We calculated this statistic by dividing the data into 10 groups (deciles) based on the predicted probabilities, and the average predicted probabilities for the deciles were compared to the actual risk probabilities of LBP. The associated calibration graph was obtained. Model discrimination was evaluated based on Harrell’s C-statistic, a modification of the area under the receiver-operating characteristic (ROC) curve adapted to survival data. We also calculated the positive predictive value (PPV), the negative predictive value (NPV), sensitivity, specificity, and accuracy based on optimal cutoff values determined by Youden’s index. In addition, the Brier score, which simultaneously measures calibration and discrimination, was calculated. All analyses were conducted using SAS ver. 9.4 (SAS Institute Inc., Cary, NC), and R v.3.5.2 (R Foundation, Vienna, Austria). This study was approved by the Institutional Review Board of the Seoul National University (No. E1811/002-009).
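Two of the performance measures named above, Harrell's C-statistic and the Youden-optimal cutoff, can be illustrated with a small pure-Python sketch. The function names and toy data are hypothetical, the Youden calculation assumes a binary outcome at a fixed horizon, and the study itself used SAS and R, so this is not the authors' implementation.

```python
def harrell_c(times, events, risks):
    """Harrell's C: share of comparable pairs ranked correctly.

    A pair (i, j) is comparable when subject i had the event and a
    shorter follow-up time than subject j. Tied risks count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


def youden_cutoff(labels, risks):
    """Cutoff maximising J = sensitivity + specificity - 1.

    A subject is classified high risk when risk >= cutoff. On ties in J
    the lowest cutoff is kept.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(risks)):
        tp = sum(1 for y, r in zip(labels, risks) if y and r >= cutoff)
        tn = sum(1 for y, r in zip(labels, risks) if not y and r < cutoff)
        j = tp / positives + tn / negatives - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Toy data (hypothetical): follow-up years, event flags, predicted risks.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.6, 0.7, 0.2]
c_stat = harrell_c(toy_times, toy_events, toy_risks)
cutoff, j_stat = youden_cutoff([1, 1, 0, 0], toy_risks)
print(c_stat, cutoff, j_stat)
```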
The extracted data consisted of 502,342 participants. We excluded 66,374 participants who had experienced at least one episode of LBP before 1st January 2004. During the median follow-up of 8.4 years (2.0-8.9 yr), there were 95,564 (32.9%) and 47,832 (33.0%) newly diagnosed (first onset) LBP cases among 290,879 and 145,089 participants in the derivation and validation cohorts, respectively. The total number of person-years of follow-up was 3,205,271 years. The mean (standard deviation) of the covariates and the distribution of the baseline characteristics among cohorts are presented in
The covariates assessed in the derivation of the risk prediction model for the primary outcome consisted of age, sex, income grade, smoking status, alcohol consumption, physical activity, BMI, fasting blood glucose/diabetes, blood pressure/HTN, total cholesterol, IVDD, history of back injury, spinal stenosis, history of BMD disorders, and spondylolisthesis.
The Youden’s J statistic suggested a risk probability of ≥ 0.795 and ≥ 0.430 as the optimal cutoff points to define high-risk individuals based on the derived prediction equations for the first onset of LBP and 5-year recurrence models, respectively. These thresholds showed an accuracy of 0.786, PPV of 0.825, and NPV of 0.777, and an accuracy of 0.694, PPV of 0.724, and NPV of 0.617 in the validation cohorts of first onset and 5-year recurrence models, respectively. The details of other model performance measures for the low back pain onset and recurrence prediction models and simplified risk scores are presented in
Based on the parsimonious models, the individualized probability of developing LBP within the years of follow-up (t = 8 yr), or of its recurrence (t = 5 yr), for an individual with covariate values x = (x1, ..., xK) for K risk factors can be estimated using the following equation:
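For a Cox model whose baseline survival refers to an individual with all covariates equal to zero, the standard form of this estimate can be written as below. This is a sketch under that assumption; the symbols $\hat{P}$ and $\beta_k$ (the regression coefficient of the k-th risk factor) are assumed notation.

```latex
\hat{P}(t \mid x) = 1 - S_0(t)^{\exp\left(\sum_{k=1}^{K} \beta_k x_k\right)}
```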
S₀(t) is the baseline survival probability at time t for an individual with all covariates equivalent to zero (0), and the
The observed minimum and maximum sum of risk points were –33 and 249, respectively. The median risk score was 36, while the 25th and 75th percentiles were 7 and 75, respectively. The Youden’s J statistic suggested a risk score of ≥ 103 as the optimal cutoff point to define high-risk individuals based on the simplified risk score. This threshold showed an accuracy of 0.693, NPV of 0.705, and PPV of 0.597 in the validation cohort.
The observed minimum and maximum sum of risk points were –14 and 120, respectively. The median risk score was 23, while the 25th and 75th percentiles were 7 and 45, respectively. The Youden’s J statistic suggested a risk score of ≥ 4 as the optimal cutoff point to define high-risk individuals based on the 5-year recurrence simplified risk score. This threshold showed an accuracy of 0.628, PPV of 0.646, and NPV of 0.499 in the validation cohort.
Case: A 52-year-old female with an income grade of 30%-60%, who is a moderate drinker (2-3 times/mo) with low physical exercise (none), obese (BMI > 30 kg/m2), with normal blood pressure (SBP < 120 mmHg and DB
S₀(t) is the baseline survival probability at time t = 8 yr for an individual with all covariates equivalent to zero, which was estimated by Cox regression analysis.
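How a total point score translates into a predicted probability under the standard Cox form can be sketched as follows. The baseline survival value of 0.80 is purely hypothetical (the estimated baseline survival is not given here), the function names are illustrative, and dividing points by 100 simply reverses the scaling used to build the score; 103 is the high-risk cutoff reported for the onset score.

```python
import math


def predicted_risk(baseline_survival, linear_predictor):
    """Risk by time t: 1 - S0(t) ** exp(sum of beta_k * x_k)."""
    return 1.0 - baseline_survival ** math.exp(linear_predictor)


def risk_from_points(baseline_survival, total_points):
    """Approximate risk from a simplified score (points = 100 * beta)."""
    return predicted_risk(baseline_survival, total_points / 100.0)


HYPOTHETICAL_S0 = 0.80  # assumed value for illustration, not from the study

risk_at_cutoff = risk_from_points(HYPOTHETICAL_S0, 103)
risk_at_zero = risk_from_points(HYPOTHETICAL_S0, 0)
print(round(risk_at_cutoff, 3), round(risk_at_zero, 3))
```

With zero points the estimate reduces to 1 minus the baseline survival, which is a quick sanity check on any implementation.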
Case: A 36-year-old female with an income grade of 30%-60% and low BMI (< 18.5 kg/m2), without a history of IVDD but with a previous diagnosis of spondylolisthesis, who received low back treatment for more than 8 days during the initial LBP episode, can have her probability of 5-year LBP recurrence estimated, based on the point system, as follows:
S₀(t) is the baseline survival probability at time t = 5 yr for an individual with all covariates equivalent to zero, which was estimated by Cox regression analysis. Each beta coefficient was converted to an integer by multiplying by 100 and was tabulated for both outcomes (
This study was based on a large, representative Korean population with data obtained from a well-established national cohort. The prevalence of LBP in this study was comparable with that in previous studies [6-9,45]. Risk prediction equations based on general medical practice data are easily implemented in medical practice, and we believe our results are applicable to the general Korean population. LBP is predictable, and individuals can reduce their risks by modifying lifestyle risk factors and managing associated premorbidities. These equations were derived from a variety of candidate predictors, including demographics, anthropometrics, premorbid conditions, and several clinical measurements which individuals and clinicians are likely to know, which makes them easily applicable.
The prediction model of first-onset LBP consisted of age, sex, income grade, alcohol consumption, physical exercise, BMI, total cholesterol, blood pressure, BMD disorders, DD, and spinal stenosis. The model of the 5-year recurrence risk comprised age, sex, income grade, BMI, spondylolisthesis, DD, and days of prescription. Based on the simplified risk scores, age, DD, and sex conferred the highest risk points for LBP onset, with maximum possible risk points of 98, 53, and 32, respectively, whereas age, spondylolisthesis, and DD conferred the highest risk for recurrence with 51, 34, and 16 risk points, respectively. Low BMI, moderate physical activity, moderate alcohol consumption, and high blood pressure or antihypertensive medication were inversely associated with LBP onset in the multivariate analysis (
The equations showed excellent calibration with good agreement between observed and predicted risk, which is important for decision making in clinical practice. Furthermore, the models showed good discrimination, with Harrell’s C-statistics of 0.812 (95% CI, 0.804-0.820) and 0.916 (95% CI, 0.907-0.924) in the validation cohorts of the LBP onset and 5-year recurrence models, respectively. The equations may be useful in informing clinicians and patients about LBP risks, the prognosis of initial episodes, and prevention strategies. Knowledge of personalized risk can motivate individuals to reduce their risks through appropriate interventions, thereby promoting population health and reducing societal and personal costs. The equations can be used when a clinician counsels individuals after a routine check-up, by providing information regarding their risk profile and an estimated probability of LBP or its recurrence. This can motivate lifestyle modifications and promote adherence to the treatment of premorbidities that are predictive of LBP. Reducing risk factors associated with metabolic syndrome (MetS), proper and regular therapy for individuals with MetS, and management of BMD disorders can reduce LBP risk. Furthermore, lifestyle modification can reduce the LBP risk conferred through MetS components by preventing complications from obesity, HTN, and dyslipidemia. The equations can also improve self-awareness regarding overall health status because some predictors in the models are also predictive of other health outcomes. In addition, since some modifiable risk predictors are predictive of recurrence, chronicity, and disability, the models may help motivate individuals with a recent onset of LBP to adjust their lifestyle and reduce the risk of chronic LBP, recurrence, and associated disability. This can subsequently reduce personal and societal costs associated with LBP.
There is a lack of prospective studies attempting to derive and validate LBP risk models, especially using routinely collected data. Previous studies have developed prediction models from occupational cohorts, among acute LBP patients in relation to developing chronic LBP (CLBP), and based on pain trajectories, among others. These studies included few participants and fewer cases, considered ergonomic and occupation-related variables, and did not incorporate routinely collected medical data. Here, we have developed and validated prediction equations and simplified risk scores to estimate the future risk of LBP and its recurrence among apparently healthy individuals at baseline, in a large cohort using data from general medical practice. This makes our prediction equations more applicable to the general population, and better able to distinguish individuals at risk in medical practice, than these earlier algorithms. In addition, the equations performed well in terms of discrimination and calibration. However, the derived equations cannot substitute for clinical expertise; rather, they augment precision in clinical decision making. We believe that knowledge of a patient's personalized risk and general health status with respect to LBP, combined with expert knowledge from clinicians, will create a much more comprehensive picture than either one alone. The information for predictors in the derived equations can easily be obtained in clinical practice, and the points system is simple to use.
This study has the strengths of representativeness, duration of follow-up, adequate sample size, and lack of recall, respondent, and selection bias. The ICD-10-CM diagnostic codes in the Korean National Health Insurance database were evaluated and found to have good concordance with the actual health status of the individuals, based on medical charts and reports. This study is based on a wide range of risk predictors that can be applied in medical practice, and which individuals are likely to know. However, our study is limited because we did not incorporate psychosocial factors, genetics, and ergonomics-related variables, which are not routinely collected in general practice. In addition, low medical care-seeking behavior has been reported among LBP patients, with care-seeking more common in women, individuals with poor general health, and those with more disabling or more painful episodes. Therefore, some individuals may not have sought medical services for LBP and may have been missed. Nevertheless, the LBP prevalence in this study was comparable with that of a previous study conducted in Korea. Finally, we used the same underlying population for model derivation and validation; thus, careful consideration is necessary in generalizing these results to other populations.
We have developed and validated risk prediction equations and simplified risk scores to estimate LBP risk in a nationwide sample cohort using data from general medical practice. The models showed good discrimination in identifying individuals at risk of developing LBP and its recurrence. To our knowledge, this study is the first nationwide cohort study that has attempted to derive and validate LBP risk prediction models using routinely collected health data. These models will improve individual decision-making, especially motivation for lifestyle modifications, guide physicians in practice, and define groups at high risk for LBP. We recommend further studies to validate and update these prediction models using cohorts from other populations and to incorporate other predictors in other settings.
No potential conflict of interest relevant to this article was reported.
No funding to declare.