Large Language Models for CAD-RADS 2.0 Extraction From Semi-Structured Coronary CT Angiography Reports: A Multi-Institutional Study

Authors
Min, Dabin ; Jin, Kwang Nam ; Bang, SangHeum ; Kim, Moon Young ; Kim, Hack-Lyoung ; Jeong, Won Gi ; Lee, Hye-Jeong ; Beck, Kyongmin Sarah ; Hwang, Sung Ho ; Kim, Eun Young ; Park, Chang Min
Citation
KOREAN JOURNAL OF RADIOLOGY, Vol. 26(9): 817-831, 2025-09
Journal Title
KOREAN JOURNAL OF RADIOLOGY
ISSN
1229-6929
Issue Date
2025-09
Keywords
Coronary CT angiography ; CAD-RADS 2.0 ; Information extraction ; Large language model ; Prompting strategy
Abstract
Objective: To evaluate the accuracy of large language models (LLMs) in extracting Coronary Artery Disease-Reporting and Data System (CAD-RADS) 2.0 components from coronary CT angiography (CCTA) reports, and to assess the impact of prompting strategies.

Materials and Methods: In this multi-institutional study, we collected 319 synthetic, semi-structured CCTA reports from six institutions; synthetic reports were used to protect patient privacy while maintaining clinical relevance. The dataset included 150 reports from a primary institution (100 for instruction development and 50 for internal testing) and 169 reports from five external institutions for external testing. Board-certified radiologists established reference standards following the CAD-RADS 2.0 guidelines for all three components: stenosis severity, plaque burden, and modifiers. Six LLMs (GPT-4, GPT-4o, Claude-3.5-Sonnet, o1-mini, Gemini-1.5-Pro, and DeepSeek-R1-Distill-Qwen-14B) were evaluated using an optimized instruction combined with several prompting strategies, including zero-shot and few-shot prompting, each with or without chain-of-thought (CoT) prompting. Accuracy was measured and compared using McNemar's test.

Results: LLMs demonstrated robust accuracy across all CAD-RADS 2.0 components. The highest stenosis severity accuracy was 0.980 (48/49; Claude-3.5-Sonnet and o1-mini) in internal testing and 0.946 (158/167; GPT-4o and o1-mini) in external testing. Plaque burden extraction was highly accurate, with multiple models achieving perfect accuracy (43/43) in internal testing and an accuracy of 0.993 (137/138; GPT-4o and o1-mini) in external testing. Modifier detection showed consistently high accuracy (≥ 0.990) across most models. One open-source model, DeepSeek-R1-Distill-Qwen-14B, showed relatively low accuracy for stenosis severity: 0.898 (44/49) in internal testing and 0.820 (137/167) in external testing. CoT prompting significantly improved the accuracy of several models, with GPT-4 showing the largest gains: in external testing, its stenosis severity accuracy increased by 0.192 (P < 0.001) and its plaque burden accuracy by 0.152 (P < 0.001).

Conclusion: LLMs demonstrated high accuracy in the automated extraction of CAD-RADS 2.0 components from semi-structured CCTA reports, particularly when used with CoT prompting.
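The accuracy values reported above are proportions of correctly extracted reports (for example, 48/49 ≈ 0.980). McNemar's test compares two prompting conditions applied to the same reports by looking only at discordant pairs, meaning reports extracted correctly under one condition but not the other. The following is a minimal sketch of both ideas, with several assumptions. The abstract does not state which form of McNemar's test was used, so this sketch assumes the exact (binomial) variant. The helper names `accuracy` and `mcnemar_exact` are hypothetical. The paired outcomes are made-up illustrative data, not study results.

```python
from math import comb


def accuracy(correct: int, total: int) -> float:
    """Fraction of evaluable reports whose extracted value matches the reference."""
    return correct / total


def mcnemar_exact(cond_a: list[bool], cond_b: list[bool]) -> float:
    """Two-sided exact McNemar test on paired correct/incorrect outcomes.

    cond_a[i] and cond_b[i] say whether report i was extracted correctly
    under two prompting conditions (e.g. without and with CoT).
    Assumption: exact binomial form, not the chi-square approximation.
    """
    if len(cond_a) != len(cond_b):
        raise ValueError("outcomes must be paired report by report")
    # Discordant pairs: correct under exactly one of the two conditions.
    b = sum(1 for x, y in zip(cond_a, cond_b) if x and not y)
    c = sum(1 for x, y in zip(cond_a, cond_b) if y and not x)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    # Under H0 each discordant pair falls either way with probability 0.5,
    # so min(b, c) ~ Binomial(n, 0.5); double the lower tail and cap at 1.
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Figure reported in the abstract: 48 of 49 internal test reports correct.
print(f"{accuracy(48, 49):.3f}")  # 0.980

# Hypothetical paired outcomes for 10 reports (illustrative only, not study data).
without_cot = [True, False, False, True, False, True, False, True, True, False]
with_cot = [True, True, True, True, True, True, False, True, True, True]
print(mcnemar_exact(without_cot, with_cot))  # 0.125
```

In this toy example, all four discordant reports favor CoT, which gives an exact two-sided P value of 2 × (1/16) = 0.125. The exact form is often preferred when discordant pairs are few, which is typical when accuracies are close to 1.0. A chi-square approximation with continuity correction is a common alternative.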
DOI
10.3348/kjr.2025.0293
Yonsei Authors
Lee, Hye Jeong (이혜정), ORCID: https://orcid.org/0000-0003-4349-9174
URI
https://ir.ymlib.yonsei.ac.kr/handle/22282913/208053