Development of a Clinical Outcome Prediction Model for Korean Triage and Acuity Scale (KTAS) Level 3 and 4 Patients Using Electronic Nursing Records
Other Titles
Development of a clinical outcome prediction model for emergency patients classified in level 3 and 4 of the Korean Triage and Acuity Scale (KTAS) using electronic nursing records
Authors
신현아
College
Graduate School of Public Health (보건대학원)
Department
Dept. of Nursing (간호학과)
Degree
Doctoral
Issue Date
2025-02
Abstract
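Korean Abstract
Predicting the clinical outcomes of emergency department patients is essential for allocating medical resources efficiently and supporting rapid decision-making. Patients classified as KTAS Level 3 or 4 in particular present with variable clinical conditions that are difficult to predict, so they require more refined prediction tools. Using HPM-ExpertSignals as its conceptual framework, this study developed and evaluated clinical outcome prediction models for adult KTAS Level 3 and 4 patients by integrating triage information with nursing data. It is a retrospective descriptive study using anonymized data extracted from the clinical data warehouse for patients who visited the emergency department of a 2,000-bed tertiary hospital in Seoul between January 1 and December 31, 2023. The study population was adult KTAS Level 3 and 4 patients, and the models predicted five clinical outcomes: ICU admission, general ward admission, transfer, discharge, and death. Model 1, which used general triage information, and Model 2, which added nursing data to the triage information, were each built with multinomial logistic regression, random forest, and gradient boosting algorithms. Model performance was compared using several evaluation measures, including AUROC, AUPRC, sensitivity, specificity, F1 score, and calibration curves. SMOTE and cost-sensitive learning were applied to address class imbalance, and Shapley Additive Explanations (SHAP) were used to identify key predictors and improve model interpretability. Model 2 outperformed Model 1 on every performance measure. In particular, Model 2 with the random forest algorithm recorded an AUROC of 0.964, the highest predictive performance, and was selected as the final model. The key predictors identified were the frequency of oxygen saturation measurement, the route of emergency department arrival, and prior hospitalization history; SHAP analysis visualized the contribution of each variable, strengthening the model's interpretability. SMOTE and cost-sensitive learning partly improved prediction of the minority classes, ICU admission and death, but class imbalance remained a limitation. This study explored the potential and the limits of a model that predicts clinical outcomes for KTAS Level 3 and 4 emergency patients by integrating triage information with nursing data. With its high predictive performance and interpretability, the random forest model showed potential as a clinical decision support tool. Future research should use multicenter data to verify the model's generalizability and explore new approaches to the class imbalance problem.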
Introduction
Emergency department (ED) overcrowding, worsened by the increasing number of critically ill patients, remains a significant challenge. Providing prompt and optimal medical services to all patients in an overcrowded ED is difficult, so early recognition and timely intervention for patients with life-threatening conditions are crucial (Lee et al., 2019). Triage is the decision-making process used to categorize the severity of a patient's condition and allocate limited medical resources efficiently, so that patients presenting with acute conditions of differing severity receive appropriate emergency care within the "golden hour" (Patel et al., 2008). Triage is predominantly conducted by nurses in both domestic and international settings (Göransson et al., 2005; Park et al., 2014). In South Korea, the Korean Triage and Acuity Scale (KTAS), developed from the Canadian Triage and Acuity Scale (CTAS), has been the standard triage tool since 2016. KTAS classifies patients into five levels based on severity and urgency, with recommended intervention times ranging from immediate for Level 1 ("resuscitation") to within two hours for Level 5 ("non-urgent") (Korean Society of Emergency Medicine KTAS Committee, 2021). Although patients in Levels 1 and 2 often receive prompt care, those in Levels 3 and below may experience delayed treatment if under-triaged, potentially leading to worsened outcomes. Conversely, over-triage can result in unnecessary consumption of human and material resources (Ekins & Morphet, 2015). According to 2022 statistics, KTAS Level 1 and 2 patients comprised 1.3% and 5.8% of ED visits, respectively, while Level 3 ("urgent") and Level 4 ("semi-urgent") patients accounted for 43.4% and 39.4%, together representing over 80% of all visits (Emergency Medical Statistics Annual Report, 2023).
This group, while crucial for the efficient utilization of medical resources, poses greater challenges for accurate prediction of clinical outcomes than the higher-acuity patients in Levels 1 and 2. Multiclass outcome modeling provides more detailed predictions than a binary model and can thereby improve resource allocation. While Riordan (2017) reported limited performance with a binary model (AUROC 0.730), Lee et al. (2020) demonstrated the effectiveness of a multiclass approach for predicting diverse clinical outcomes. Nursing records, particularly those within electronic medical record (EMR) systems, have demonstrated their value in predicting clinical deterioration and mortality (Collins et al., 2013). Nurses frequently document vital signs and unstructured clinical observations in response to changes in patient condition, providing critical insights (Collins & Vawdrey, 2012). While numerous studies have developed predictive models for ED patient outcomes, most have focused on triage information or laboratory results, with limited attention to nursing documentation (Brink et al., 2022; Larburu et al., 2023). This study aims to address these gaps by developing and evaluating predictive models for clinical outcomes in KTAS Level 3 and 4 patients, incorporating electronic nursing records alongside traditional triage data. This approach seeks to improve the accuracy of clinical outcome predictions, facilitate efficient resource allocation, and enhance the overall quality of emergency care. Furthermore, by leveraging nursing data as key predictors, this study contributes to the body of evidence supporting the clinical value and scientific foundation of nursing documentation in ED settings.
Conceptual Framework
This study utilizes a modified HPM-ExpertSignals framework to develop and evaluate predictive models for KTAS Level 3 and 4 ED patients, incorporating triage data and nursing records.
The framework emphasizes the role of nursing assessments (e.g., vital signs, consciousness, KTAS level) and nursing record patterns (e.g., frequency of vital sign monitoring, interventions like notifying medical staff or documenting abnormalities) in reflecting patient conditions. Additionally, individual factors (e.g., age, prior hospitalizations) and environmental factors (e.g., arrival mode, waiting time) are integrated. By leveraging electronic nursing records, the study aims to enhance predictive accuracy, optimize ED resource allocation, and improve patient outcomes, highlighting the critical value of nursing documentation in clinical decision-making.
Methods
This study is a retrospective descriptive analysis aimed at investigating the characteristics and patterns of nursing records for KTAS Level 3 and 4 ED patients and their associations with clinical outcomes. The data for this study were collected from patients who visited the ED of a 2,000-bed tertiary hospital in Seoul, South Korea, between January 1 and December 31, 2023. The ED handles over 60,000 patient visits annually, and all data were extracted from the hospital's Clinical Data Warehouse (CDW) system, DARWIN-C. The study included ED visits by patients aged 18 years or older with an initial KTAS level of 3 or 4, excluding visits with canceled registrations, non-treatment visits, patients who left against medical advice or without being seen, dead on arrival (DOA) cases, and those with missing data. Each ED visit was analyzed as a separate event. The potential predictors for clinical outcomes in KTAS Level 3 and 4 patients were identified through a comprehensive literature review and categorized according to the conceptual framework into triage information, nursing data patterns, individual adjustment factors, and environmental adjustment factors.
Triage information included 11 variables collected during the nursing initial assessment, such as vital signs, level of consciousness, chief complaints, initial KTAS classification, pain status, and pain score. Nursing record patterns comprised 12 variables designed to capture clinical observation and intervention activities. Frequency patterns included the number of recorded instances for vital sign measurements, consciousness assessments, and pain evaluations. Intervention-related factors included documentation of physician notifications, abnormal test results, additional test requests, transfers to higher-acuity areas, and specific notes regarding events in the flowsheet. Individual adjustment factors included four variables: sex, age, history of hospital admissions within the past year, and ED visits within the past month. Environmental adjustment factors included six variables: time of ED arrival, day of the week, route of ED arrival, mode of arrival, waiting time for medical evaluation, and the time elapsed from symptom onset to ED arrival. The outcome variable was categorized into five clinical outcomes: general ward admission, ICU admission, transfer to another facility, death, and discharge. Data sources for the analysis variables were identified from the CDW. Nursing data were extracted from tables such as Nursing Initial Assessment – Emergency, Flowsheet Search Items, Patient Nursing Records, and Emergency Patient Location Change History. The findings from the investigation of data sources for each variable were compiled into a variable definition document, which systematically outlined the data sources, extractable data fields, and associated codes for each variable. After the variable definition document had been reviewed by at least two experts, it was further refined with input from CDW specialists.
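The frequency and intervention variables described above could be derived from flowsheet rows along the following lines. This is a minimal sketch and not the study's extraction code: the row layout, the item names, and the function `nursing_pattern_features` are hypothetical, chosen only to show how per-visit counts and 0/1 documentation flags can be built.

```python
from collections import Counter, defaultdict

# Hypothetical flowsheet rows: (visit_id, item), one row per documented entry.
FLOWSHEET_ROWS = [
    ("V001", "SpO2"), ("V001", "SpO2"), ("V001", "BP"),
    ("V001", "physician_notified"),
    ("V002", "BP"), ("V002", "pain_score"),
]

FREQUENCY_ITEMS = ("BP", "SpO2", "pain_score")   # counted per visit
INTERVENTION_ITEMS = ("physician_notified",)     # stored as 0/1 flags


def nursing_pattern_features(rows):
    """Return {visit_id: {feature_name: value}} from flowsheet rows."""
    counts = defaultdict(Counter)
    for visit_id, item in rows:
        counts[visit_id][item] += 1

    features = {}
    for visit_id, item_counts in counts.items():
        # Frequency patterns: how often each item was recorded in the visit.
        row = {f"n_{item}": item_counts[item] for item in FREQUENCY_ITEMS}
        # Intervention factors: whether the event was documented at all.
        row.update({f"any_{item}": int(item_counts[item] > 0)
                    for item in INTERVENTION_ITEMS})
        features[visit_id] = row
    return features


features = nursing_pattern_features(FLOWSHEET_ROWS)
print(features["V001"])
```

Each ED visit is treated as a separate event, matching the unit of analysis stated in the Methods.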
Based on the finalized definitions, the researcher directly extracted the necessary raw data from the CDW according to the defined variables. To ensure data reliability, preprocessing involved integrating and cleaning the dataset, including outlier detection and handling of missing data through imputation or exclusion. As proposed in the conceptual framework, this study developed two models to evaluate the impact of nursing record pattern variables on model performance: one excluding nursing record pattern variables (Model 1) and one including them (Model 2). The performance of these two models was subsequently compared. To prevent overfitting and ensure model reliability, the dataset was randomly split into a training set (80%) and a test set (20%). Class imbalance in the training set was addressed using a combination of One-vs-Rest SMOTE (Synthetic Minority Over-sampling Technique) and cost-sensitive learning. This study employed three machine learning algorithms to develop the predictive models: multinomial logistic regression, random forest, and Extreme Gradient Boosting (XGBoost). The models were trained using stratified 10-fold cross-validation. The performance of the developed models was evaluated through internal validation using the test set. After identifying the optimal model, variable importance was assessed to provide further insights. Data preprocessing, model development, and validation were conducted using Python version 3.10.12. Descriptive statistics were presented as means and standard deviations for continuous variables and frequencies with percentages for categorical variables. Relationships between predictor and outcome variables were analyzed using chi-squared tests for categorical variables and the Kruskal-Wallis test for non-normal continuous variables, with Dunn's test and Bonferroni correction applied for post-hoc analysis.
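The two imbalance techniques named above can be illustrated with a small standard-library sketch. It is not the study's implementation, and the abstract does not specify how the One-vs-Rest variant was configured; the code only shows the core SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours) and one common form of cost-sensitive weighting (inverse class frequency). The function names, the toy ICU points, and the label counts are all hypothetical.

```python
import math
import random
from collections import Counter


def smote_for_class(samples, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        neighbours = sorted(
            (s for s in samples if s is not base),
            key=lambda s: math.dist(base, s),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, n_classes = len(labels), len(counts)
    return {cls: n / (n_classes * c) for cls, c in counts.items()}


# Hypothetical minority-class points: (SpO2 %, pulse rate) for ICU admissions.
icu = [(88.0, 120.0), (90.0, 110.0), (86.0, 130.0)]
new_points = smote_for_class(icu, n_new=4)

# Hypothetical training labels with a rare ICU class.
labels = ["discharge"] * 6 + ["ward"] * 3 + ["icu"] * 1
weights = balanced_class_weights(labels)
print(new_points)
print(weights)
```

Oversampling adds synthetic minority rows before training, while the weights make errors on rare classes cost more during training; the two can be combined, as the study reports doing.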
Hyperparameter optimization was performed using GridSearchCV, and model performance was evaluated using metrics such as AUROC, AUPRC, accuracy, sensitivity, specificity, PPV, NPV, and F1 score. Initial variable importance was assessed using the Gini importance method in the Random Forest model, and SHAP (Shapley Additive Explanations) analysis provided a deeper evaluation, with a summary plot visually displaying the relative importance of variables across outcome classes. Calibration plots were used to assess the alignment between predicted probabilities and actual outcomes, ensuring model reliability and interpretability.
Ethical Considerations
This study was conducted with the approval of the Institutional Review Board (IRB) of Samsung Medical Center, with an exemption granted (IRB No. SMC 2024-07-063).
Results
Between January 1 and December 31, 2023, a total of 50,251 patients visited the ED, resulting in 71,000 ED visit records. Of these, 24,100 records (33.9%) were excluded based on the inclusion and exclusion criteria, leaving 46,900 records (66.1%) from 33,885 ED patients for the final analysis. Exclusions based on inclusion criteria included patients under 18 years of age (n = 10,395) and patients classified as KTAS Levels 1, 2, or 5 (n = 5,373). Exclusions based on exclusion criteria included canceled registrations (n = 6,132), non-treatment-related visits (n = 1,706), patients who left against medical advice (n = 408), patients who left without being seen (n = 24), and records with missing data (n = 62). The proportion of excluded data due to missing values was less than 0.1%. The eligible data were randomly allocated, with 80% assigned to the training set (n = 37,520) and 20% to the test set (n = 9,380). The training set was used for model development, while the test set was employed for internal validation of the model.
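The cohort flow reported above can be checked with a few lines of arithmetic. The counts are taken directly from the text; only the variable names are introduced here for illustration.

```python
TOTAL_VISITS = 71_000

# Exclusion counts as reported in the Results.
EXCLUSIONS = {
    "age under 18": 10_395,
    "KTAS level 1, 2, or 5": 5_373,
    "canceled registration": 6_132,
    "non-treatment visit": 1_706,
    "left against medical advice": 408,
    "left without being seen": 24,
    "missing data": 62,
}

excluded = sum(EXCLUSIONS.values())
eligible = TOTAL_VISITS - excluded

# 80/20 split using integer arithmetic to avoid floating-point rounding.
train_n = eligible * 80 // 100
test_n = eligible - train_n

print(f"excluded: {excluded:,} ({excluded / TOTAL_VISITS:.1%})")
print(f"eligible: {eligible:,} ({eligible / TOTAL_VISITS:.1%})")
print(f"train / test: {train_n:,} / {test_n:,}")
print(f"missing-data share: {EXCLUSIONS['missing data'] / TOTAL_VISITS:.3%}")
```

The totals reproduce the reported 24,100 excluded and 46,900 eligible records, and the 37,520 / 9,380 split.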
Analysis of 46,900 ED visits showed that 67.5% of patients were discharged, 26.9% were admitted to a general ward, 1.7% to the ICU, 3.8% were transferred, and 0.1% resulted in death, with an overall hospital admission rate of 28.6%. Significant differences were observed across clinical outcomes: patients admitted to the ICU and those who died were older, had abnormal vital signs, and had higher ED visit rates. General ward admissions had the longest waiting times, while transferred patients reported the highest pain levels. Most ICU and deceased patients were KTAS Level 3, while discharged patients were evenly distributed between Levels 3 and 4. Severe outcomes were more common in males and ambulance arrivals. The analysis of nursing record patterns revealed significant differences across clinical outcomes. Vital signs, including BP, PR, RR, BT, and SpO₂, were recorded most frequently for deceased patients, while discharged patients had the lowest recording frequencies. Nursing interventions, such as physician notifications and abnormal test documentation, were also more frequent for deceased patients and ICU admissions. Treatment escalation occurred in 48.3% of deceased patients and 39.9% of general ward admissions, compared to 16.3% of discharged patients. Additionally, flowsheet comments were most frequent for deceased patients, emphasizing the importance of nursing documentation in managing severe clinical outcomes. This study evaluated two predictive models: Model 1, which included 21 predictors, and Model 2, which incorporated nursing record data for a total of 33 predictors. Using three machine learning algorithms (multinomial logistic regression, random forest, and gradient boosting), six models were developed and validated. Model 2 consistently outperformed Model 1 across all algorithms, highlighting the positive impact of including nursing data. For Model 1, random forest achieved the highest overall accuracy (72.4%), followed by gradient boosting (63.3%) and logistic regression (60.7%).
However, minority classes such as ICU admissions and transfers showed poor predictive performance across all algorithms, with accuracies below 40%. Model 2 showed substantial improvement, with random forest achieving the highest overall accuracy (79.6%), followed by gradient boosting (74.0%) and logistic regression (68.8%). Notably, Model 2 demonstrated significant gains in minority class predictions, particularly with gradient boosting, which achieved the best performance for ICU admissions (39.6%) and transfers (37.6%). Random forest in Model 2 showed the highest AUROC (0.964) and AUPRC (0.888), with balanced sensitivity (0.796) and specificity (0.932). Gradient boosting also demonstrated enhanced performance in Model 2, achieving an AUROC of 0.940, AUPRC of 0.837, and an F1 score of 0.771. Logistic regression in Model 2 showed modest improvements, with an AUROC of 0.891 and an F1 score of 0.720. Visual analyses of ROC and precision-recall curves highlighted consistent gains with Model 2. Discharge predictions achieved the highest average precision (0.960 in Model 2 vs. 0.905 in Model 1), while general ward admissions also improved significantly (0.713 in Model 2 vs. 0.601 in Model 1). Although ICU admissions and transfers continued to show lower precision, Model 2 demonstrated incremental improvements, underscoring the value of incorporating nursing data for clinical outcome prediction. With the highest AUROC, AUPRC, sensitivity, and specificity, random forest Model 2 showed the best overall performance and was selected as the optimal predictive model. Variable importance analysis using Gini importance identified the most influential predictors in random forest Model 2.
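Gini importance credits a variable with the impurity reduction achieved by the tree splits that use it, accumulated over the forest. The building block is the impurity decrease of a single split, sketched below on a hypothetical node; the labels and the split are invented for illustration and are not study data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Drop in Gini impurity from one split, weighting children by size."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Hypothetical node split on "SpO2 recorded more than twice during the visit".
left = ["discharge"] * 4                        # few measurements
right = ["ward", "ward", "icu", "discharge"]    # frequent measurements
parent = left + right

decrease = impurity_decrease(parent, left, right)
print(round(decrease, 5))
```

A variable that often produces large decreases near the top of many trees ends up with a high importance score, which is how a measurement-frequency variable can rank first.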
The most important variable was the frequency of oxygen saturation measurements, followed by waiting time, pulse rate, time from symptom onset to arrival, systolic and diastolic blood pressure, age, body temperature, prior hospitalizations, and frequency of blood pressure measurements. SHAP analysis provided additional insights into the contribution of each variable to the model's predictions. For ICU admissions, key predictors included prior hospitalizations, initial KTAS levels, and frequency of vital sign measurements, reflecting higher monitoring for severe cases. For general ward admissions, oxygen saturation measurement frequency and prior hospitalizations were most critical. Transfers and deaths were influenced by factors such as oxygen saturation, mode of arrival (e.g., ambulance use), and frequency of vital sign monitoring, indicating the importance of initial severity. Conversely, discharges were associated with lower oxygen saturation measurement frequency, arrival on foot, and less need for higher-acuity care. These findings highlight the nuanced role of nursing data and triage variables in predicting diverse clinical outcomes. The calibration plot for random forest Model 2 showed varying alignment with observed probabilities across classes. Discharge and general ward admissions were better calibrated, though discharges showed overestimation in the 0.2–0.8 range. The ICU admission, transfer, and death classes were poorly calibrated, largely because of class imbalance and data scarcity, highlighting challenges in predicting minority outcomes.
Discussion
This study developed and validated predictive models for clinical outcomes of KTAS Level 3 and 4 adult patients using triage data and nursing records. Analysis of 46,900 ED visits revealed a high proportion of discharge cases (67.5%), followed by general ward admissions (26.9%), transfers (3.8%), ICU admissions (1.7%), and deaths (0.1%).
The findings emphasized the heterogeneity of outcomes within the same KTAS levels, highlighting the limitations of KTAS in accurately predicting clinical outcomes and underscoring the need for advanced predictive models. Additionally, the study confirmed that abnormal vital signs, such as abnormal oxygen saturation, and more frequent monitoring were prominent in severe cases, aligning with prior research on the significance of nursing observations in early detection of clinical deterioration. The inclusion of nursing data improved model performance across all algorithms: Model 2 demonstrated superior accuracy, AUROC, and AUPRC compared to Model 1. The random forest algorithm in Model 2 achieved the best overall performance, with high sensitivity (0.796) and specificity (0.932), identifying outcome classes effectively while limiting false-positive classifications. This result supports the value of incorporating nursing record variables, such as oxygen saturation measurement frequency, alongside arrival mode and prior hospitalizations, which together provide critical clinical context beyond triage information alone. These findings align with previous studies showing that combining nursing data with physiological measures enhances model accuracy. SHAP analysis further confirmed the importance of key predictors, such as prior hospitalizations, initial KTAS levels, oxygen saturation monitoring, and arrival mode, in explaining model predictions. The analysis highlighted how these variables support clinical decision-making, particularly in ICU admission and mortality prediction. For discharge cases, predictors like low oxygen saturation monitoring frequency and arrival on foot indicated a higher likelihood of discharge, suggesting the potential of these models to optimize ED resource allocation. Overall, the study demonstrates the clinical value of integrating nursing data into predictive models, offering actionable insights to improve ED efficiency and patient outcomes.
This study developed and evaluated predictive models for KTAS Level 3 and 4 emergency patients, but several limitations warrant attention in future research. First, reliance on single-institution data limits the external validity and generalizability of the results to diverse clinical settings, highlighting the need for multicenter validation. Second, class imbalance, particularly for minority outcomes such as deaths and ICU admissions, constrained prediction performance despite the application of SMOTE and cost-sensitive learning. Expanding datasets and employing advanced augmentation techniques could address this issue. Third, the lack of standardized nursing documentation posed challenges for data quality and model reliability, underscoring the need for standardized systems. Fourth, the exclusion of nurse-specific factors, such as experience and education, may have reduced the model's explanatory power. Additionally, initial symptoms, a critical indicator of patient status, were not categorized in sufficient detail, limiting predictive precision. Lastly, overlapping variables across outcome classes, such as oxygen saturation measurement frequency, reduced immediate clinical applicability. Adopting rule-based approaches to simplify variable interactions could enhance model interpretability and utility in real-world applications.
Conclusion
This study developed and evaluated predictive models for KTAS Level 3 and 4 emergency patients using triage information and nursing data. The model incorporating nursing data outperformed the triage-only model, with significant improvements in predicting outcomes such as discharge and general ward admission. Key variables, including oxygen saturation measurement frequency and mode of arrival, enhanced clinical decision support, while SHAP analysis improved model interpretability and reliability.
These findings highlight the potential of nursing data to optimize resource allocation and predict clinical outcomes, demonstrating the practical utility of integrating triage and nursing data in predictive modeling.