THE UNIVERSITY OF BRITISH COLUMBIA
Rationale: Automatic prediction algorithms based on routinely collected administrative health data may be able to identify patients at high risk of hospitalization for acute exacerbations of Chronic Obstructive Pulmonary Disease (COPD).
Methods: We used British Columbia’s administrative health databases (1997–2016) to identify patients with diagnosed COPD. For each patient, we chose a random index date; the assessment window was the six months before that date, and the outcome window was the 60 days after it. The outcome of interest was a hospital admission with COPD as the most responsible diagnosis. We used Lasso-penalized logistic regression for variable selection and applied four machine-learning algorithms for risk prediction (logistic regression, random forest, neural network, and gradient boosting). To evaluate the models, we created a temporal validation dataset using a randomly chosen date at least one year later for patients who were still alive. We assessed model performance with calibration plots and receiver operating characteristic (ROC) curves.
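The two evaluation measures named in the Methods can be illustrated with a minimal sketch. It is not the study's code: the function names `roc_auc` and `calibration_table`, the choice of equal-size risk groups, and the toy data are assumptions made for illustration. The sketch computes the area under the ROC curve as the probability that a randomly chosen hospitalized patient receives a higher predicted risk than a randomly chosen non-hospitalized patient, and tabulates mean predicted risk against observed event rate, which is the information a calibration plot displays.

```python
from typing import List, Sequence, Tuple


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    Tied scores receive their average rank, so a tie between a positive
    and a negative counts as half a concordant pair.
    """
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    n_pos = sum(y_true)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j

    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def calibration_table(
    y_true: Sequence[int], y_score: Sequence[float], n_bins: int = 10
) -> List[Tuple[float, float]]:
    """Return (mean predicted risk, observed event rate) per risk group.

    Patients are sorted by predicted risk and split into n_bins groups of
    roughly equal size; a well-calibrated model gives pairs close to the
    diagonal.
    """
    order = sorted(range(len(y_score)), key=lambda k: y_score[k])
    n = len(order)
    rows = []
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not idx:
            continue  # more bins than patients
        mean_pred = sum(y_score[k] for k in idx) / len(idx)
        observed = sum(y_true[k] for k in idx) / len(idx)
        rows.append((mean_pred, observed))
    return rows


# Toy data (hypothetical, not from the study): 1 = hospitalized within 60 days.
toy_outcome = [0, 0, 1, 1]
toy_risk = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_outcome, toy_risk))  # 0.75
print(calibration_table(toy_outcome, toy_risk, n_bins=2))
```

In practice one would apply these functions to the predicted risks and observed outcomes of the temporal validation dataset rather than the training data, so that the estimates reflect performance on later, unseen time periods.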
Measurements and Main Results: There were 108,433 patients in the training dataset and 113,786 in the validation dataset; of these, 1,126 and 1,136, respectively, were hospitalized for COPD within the 60-day outcome window. The best prediction algorithm (gradient boosting) had an area under the ROC curve of 0.82 (95% CI 0.80–0.83). The predicted risk scores were well calibrated in the validation dataset.
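The counts reported above imply that the outcome is rare, which is worth keeping in mind when interpreting the area under the ROC curve. The short calculation below derives the event rates directly from the reported numbers; the helper name `event_rate` is illustrative only.

```python
def event_rate(events: int, patients: int) -> float:
    """Proportion of patients with a COPD hospitalization in the 60-day window."""
    return events / patients


# Counts as reported in the Results.
train_rate = event_rate(1_126, 108_433)
valid_rate = event_rate(1_136, 113_786)

print(f"training:   {train_rate:.2%}")   # training:   1.04%
print(f"validation: {valid_rate:.2%}")   # validation: 1.00%
```

With roughly one event per hundred patients, discrimination and calibration are best read together, since a model can rank patients well while still assigning risks that are too high or too low in absolute terms.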
Conclusion: Imminent COPD-related hospitalizations can be predicted with good accuracy using routinely collected administrative data. This model could be used to target high-risk patients for therapies that prevent exacerbations.