Improving mortality prediction after radiotherapy with large language model structuring of large-scale unstructured electronic health records

Park, Sangjoon; Wee, Chan Woo; Choi, Seo Hee; Kim, Kyung Hwan; Chang, Jee Suk; Yoon, Hong In; Lee, Ik Jae; Kim, Yong Bae; Cho, Jaeho; Keum, Ki Chang; Lee, Chang Geol; Byun, Hwa Kyung; Koom, Woong Sub

doi:10.1016/j.radonc.2025.111052

DC Field	Value	Language
dc.contributor.author	Park, Sangjoon	-
dc.contributor.author	Wee, Chan Woo	-
dc.contributor.author	Choi, Seo Hee	-
dc.contributor.author	Kim, Kyung Hwan	-
dc.contributor.author	Chang, Jee Suk	-
dc.contributor.author	Yoon, Hong In	-
dc.contributor.author	Lee, Ik Jae	-
dc.contributor.author	Kim, Yong Bae	-
dc.contributor.author	Cho, Jaeho	-
dc.contributor.author	Keum, Ki Chang	-
dc.contributor.author	Lee, Chang Geol	-
dc.contributor.author	Byun, Hwa Kyung	-
dc.contributor.author	Koom, Woong Sub	-
dc.contributor.author	김경환	-
dc.date.accessioned	2025-10-27T05:42:39Z	-
dc.date.available	2025-10-27T05:42:39Z	-
dc.date.created	2025-09-23	-
dc.date.issued	2025-10	-
dc.identifier.issn	0167-8140	-
dc.identifier.uri	https://ir.ymlib.yonsei.ac.kr/handle/22282913/207996	-
dc.description.abstract	Background and purpose: Avoiding unnecessary radiotherapy (RT) in patients with limited life expectancy requires accurate selection. Traditional survival models based on structured data often lack precision. Large language models (LLMs) offer a novel approach to structuring unstructured electronic health record (EHR) data, potentially improving survival predictions by integrating comprehensive clinical information. Materials and methods: We analyzed structured and unstructured data from 34,276 RT-treated patients at Yonsei Cancer Center. An open-source LLM structured unstructured EHR data using single-shot learning. External validation included 852 patients from Yongin Severance Hospital. We compared the LLM's performance against a domain-specific medical LLM and a smaller variant. Survival prediction models using statistical, machine-learning, and deep-learning approaches incorporated both structured and LLM-structured data. Results: The open-source LLM structured unstructured EHR data with 87.5% accuracy, outperforming the domain-specific medical LLM (35.8%). Larger LLMs were more effective in structuring clinically relevant features, such as general condition and disease extent, which correlated with survival. Incorporating LLM-structured features improved the deep learning model's C-index from 0.737 to 0.820 (internal validation) and from 0.779 to 0.842 (external validation). Risk stratification was also enhanced, with clearer differentiation among low-, intermediate-, and high-risk groups (p < 0.001). Additionally, models became more interpretable, as key LLM-structured features aligned with statistically significant predictors traditionally identified from structured data. Conclusion: General-domain LLMs, despite not being fine-tuned for medical data, can effectively structure large-scale unstructured EHRs, significantly improving survival prediction accuracy and model interpretability. The RT-Surv framework highlights the potential of LLMs to enhance clinical decision-making and optimize RT treatment.	-
dc.language	English	-
dc.publisher	Elsevier Scientific Publishers	-
dc.relation.isPartOf	RADIOTHERAPY AND ONCOLOGY	-
dc.subject.MESH	Aged	-
dc.subject.MESH	Deep Learning	-
dc.subject.MESH	Electronic Health Records*	-
dc.subject.MESH	Female	-
dc.subject.MESH	Humans	-
dc.subject.MESH	Large Language Models	-
dc.subject.MESH	Machine Learning	-
dc.subject.MESH	Male	-
dc.subject.MESH	Middle Aged	-
dc.subject.MESH	Neoplasms* / mortality	-
dc.subject.MESH	Neoplasms* / radiotherapy	-
dc.title	Improving mortality prediction after radiotherapy with large language model structuring of large-scale unstructured electronic health records	-
dc.type	Article	-
dc.contributor.googleauthor	Park, Sangjoon	-
dc.contributor.googleauthor	Wee, Chan Woo	-
dc.contributor.googleauthor	Choi, Seo Hee	-
dc.contributor.googleauthor	Kim, Kyung Hwan	-
dc.contributor.googleauthor	Chang, Jee Suk	-
dc.contributor.googleauthor	Yoon, Hong In	-
dc.contributor.googleauthor	Lee, Ik Jae	-
dc.contributor.googleauthor	Kim, Yong Bae	-
dc.contributor.googleauthor	Cho, Jaeho	-
dc.contributor.googleauthor	Keum, Ki Chang	-
dc.contributor.googleauthor	Lee, Chang Geol	-
dc.contributor.googleauthor	Byun, Hwa Kyung	-
dc.contributor.googleauthor	Koom, Woong Sub	-
dc.identifier.doi	10.1016/j.radonc.2025.111052	-
dc.relation.journalcode	J02597	-
dc.identifier.eissn	1879-0887	-
dc.identifier.pmid	40692078	-
dc.identifier.url	https://www.sciencedirect.com/science/article/pii/S0167814025045566	-
dc.subject.keyword	Large language models	-
dc.subject.keyword	Electronic health records	-
dc.subject.keyword	Data structurization	-
dc.subject.keyword	Radiotherapy	-
dc.subject.keyword	Survival prediction	-
dc.contributor.affiliatedAuthor	Park, Sangjoon	-
dc.contributor.affiliatedAuthor	Wee, Chan Woo	-
dc.contributor.affiliatedAuthor	Choi, Seo Hee	-
dc.contributor.affiliatedAuthor	Kim, Kyung Hwan	-
dc.contributor.affiliatedAuthor	Chang, Jee Suk	-
dc.contributor.affiliatedAuthor	Yoon, Hong In	-
dc.contributor.affiliatedAuthor	Lee, Ik Jae	-
dc.contributor.affiliatedAuthor	Kim, Yong Bae	-
dc.contributor.affiliatedAuthor	Cho, Jaeho	-
dc.contributor.affiliatedAuthor	Keum, Ki Chang	-
dc.contributor.affiliatedAuthor	Lee, Chang Geol	-
dc.contributor.affiliatedAuthor	Byun, Hwa Kyung	-
dc.contributor.affiliatedAuthor	Koom, Woong Sub	-
dc.identifier.scopusid	2-s2.0-105011378138	-
dc.identifier.wosid	001542096400001	-
dc.citation.volume	211	-
dc.identifier.bibliographicCitation	RADIOTHERAPY AND ONCOLOGY, Vol.211, 2025-10	-
dc.identifier.rimsid	89626	-
dc.type.rims	ART	-
dc.description.journalClass	1	-
dc.subject.keywordAuthor	Large language models	-
dc.subject.keywordAuthor	Electronic health records	-
dc.subject.keywordAuthor	Data structurization	-
dc.subject.keywordAuthor	Radiotherapy	-
dc.subject.keywordAuthor	Survival prediction	-
dc.subject.keywordPlus	RADIATION	-
dc.type.docType	Article	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalWebOfScienceCategory	Oncology	-
dc.relation.journalWebOfScienceCategory	Radiology, Nuclear Medicine & Medical Imaging	-
dc.relation.journalResearchArea	Oncology	-
dc.relation.journalResearchArea	Radiology, Nuclear Medicine & Medical Imaging	-
dc.identifier.articleno	111052	-

Appears in Collections: 1. College of Medicine (의과대학) > Dept. of Radiation Oncology (방사선종양학교실) > 1. Journal Papers
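
The abstract reports model performance as a C-index (concordance index). As a reading aid, the following is a minimal sketch of Harrell's concordance index on toy data. It is not the authors' implementation: the function name `concordance_index`, the toy arrays, and the simplification of skipping pairs with tied survival times are all assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times  : observed follow-up time per patient
    events : 1 if death was observed, 0 if the patient was censored
    risks  : predicted risk score (higher means earlier death is expected)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Assumption: pairs with tied times are skipped for simplicity.
        # A pair is only comparable if the shorter time ended in an event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # higher risk died first: concordant
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.1, 0.4, 0.7]
toy_c = concordance_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which is the scale on which the reported improvement from 0.737 to 0.820 should be read.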
