Improving mortality prediction after radiotherapy with large language model structuring of large-scale unstructured electronic health records

Park, Sangjoon; Wee, Chan Woo; Choi, Seo Hee; Kim, Kyung Hwan; Chang, Jee Suk; Yoon, Hong In; Lee, Ik Jae; Kim, Yong Bae; Cho, Jaeho; Keum, Ki Chang; Lee, Chang Geol; Byun, Hwa Kyung; Koom, Woong Sub

doi:10.1016/j.radonc.2025.111052


Authors: Park, Sangjoon ; Wee, Chan Woo ; Choi, Seo Hee ; Kim, Kyung Hwan ; Chang, Jee Suk ; Yoon, Hong In ; Lee, Ik Jae ; Kim, Yong Bae ; Cho, Jaeho ; Keum, Ki Chang ; Lee, Chang Geol ; Byun, Hwa Kyung ; Koom, Woong Sub

Citation: RADIOTHERAPY AND ONCOLOGY, Vol.211, 2025-10

Article Number: 111052

Journal Title: RADIOTHERAPY AND ONCOLOGY

ISSN: 0167-8140

Issue Date: 2025-10

MeSH: Aged ; Deep Learning ; Electronic Health Records* ; Female ; Humans ; Large Language Models ; Machine Learning ; Male ; Middle Aged ; Neoplasms* / mortality ; Neoplasms* / radiotherapy

Keywords: Large language models ; Electronic health records ; Data structurization ; Radiotherapy ; Survival prediction

Abstract:

Background and purpose: Avoiding unnecessary radiotherapy (RT) in patients with limited life expectancy requires accurate selection. Traditional survival models based on structured data often lack precision. Large language models (LLMs) offer a novel approach to structuring unstructured electronic health record (EHR) data, potentially improving survival predictions by integrating comprehensive clinical information.

Materials and methods: We analyzed structured and unstructured data from 34,276 RT-treated patients at Yonsei Cancer Center. An open-source LLM structured unstructured EHR data using single-shot learning. External validation included 852 patients from Yongin Severance Hospital. We compared the LLM's performance against a domain-specific medical LLM and a smaller variant. Survival prediction models using statistical, machine-learning, and deep-learning approaches incorporated both structured and LLM-structured data.

Results: The open-source LLM structured unstructured EHR data with 87.5% accuracy, outperforming the domain-specific medical LLM (35.8%). Larger LLMs were more effective in structuring clinically relevant features, such as general condition and disease extent, which correlated with survival. Incorporating LLM-structured features improved the deep learning model's C-index from 0.737 to 0.820 (internal validation) and from 0.779 to 0.842 (external validation). Risk stratification was also enhanced, with clearer differentiation among low-, intermediate-, and high-risk groups (p < 0.001). Additionally, models became more interpretable, as key LLM-structured features aligned with statistically significant predictors traditionally identified from structured data.

Conclusion: General-domain LLMs, despite not being fine-tuned for medical data, can effectively structure large-scale unstructured EHRs, significantly improving survival prediction accuracy and model interpretability. The RT-Surv framework highlights the potential of LLMs to enhance clinical decision-making and optimize RT treatment.
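
The abstract reports model quality as a C-index (concordance index). As a reading aid, the following is a minimal sketch of Harrell's C-index for right-censored survival data, assuming the standard pairwise definition; the abstract does not state which implementation the authors used. The function name `concordance_index` and the toy values in the usage lines are hypothetical and are not data from the study.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable patient pairs in which the patient
    who died earlier was assigned the higher predicted risk.

    times       -- observed follow-up time per patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output, higher meaning higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs tied in time are skipped
        # 'a' is the patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the order is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort (illustration only, not study data)
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
perfect = concordance_index(toy_times, toy_events, [0.9, 0.7, 0.2, 0.4])
partial = concordance_index(toy_times, toy_events, [0.9, 0.2, 0.5, 0.4])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported change from 0.737 to 0.820 should be read.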

Full Text: https://www.sciencedirect.com/science/article/pii/S0167814025045566

DOI: 10.1016/j.radonc.2025.111052

Appears in Collections: 1. College of Medicine > Dept. of Radiation Oncology > 1. Journal Papers

URI: https://ir.ymlib.yonsei.ac.kr/handle/22282913/207996
