Evaluating diagnostic accuracy of large language models in neuroradiology cases using image inputs from JAMA neurology and JAMA clinical challenges
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Albaqshi, Ahmed | - |
| dc.contributor.author | Ko, Ji Su | - |
| dc.contributor.author | Suh, Chong Hyun | - |
| dc.contributor.author | Suh, Pae Sun | - |
| dc.contributor.author | Shim, Woo Hyun | - |
| dc.contributor.author | Heo, Hwon | - |
| dc.contributor.author | Woo, Chang-Yun | - |
| dc.contributor.author | Park, Hyungjun | - |
| dc.date.accessioned | 2026-01-20T02:39:39Z | - |
| dc.date.available | 2026-01-20T02:39:39Z | - |
| dc.date.created | 2026-01-14 | - |
| dc.date.issued | 2025-11 | - |
| dc.identifier.uri | https://ir.ymlib.yonsei.ac.kr/handle/22282913/210001 | - |
| dc.description.abstract | This study assesses the diagnostic performance of six LLMs (GPT-4V, GPT-4o, Gemini 1.5 Pro, Gemini 1.5 Flash, Claude 3.0, and Claude 3.5) on complex neurology cases from JAMA Neurology and JAMA, focusing on their image interpretation abilities. We selected 56 radiology cases from JAMA Neurology and JAMA (May 2015 to April 2024), rephrasing the text and reshuffling the multiple-choice answer options. Each LLM processed four input types: original quiz with images, rephrased text with images, rephrased text only, and images only. Model performance was compared with three neuroradiologists, and consistency was assessed across five repetitions using Fleiss' kappa. In the image-only condition, LLMs answered six specific questions regarding modality, sequence, contrast, plane, anatomical location, and pathologic location, and their accuracy was evaluated. Claude 3.5 achieved the highest accuracy (80.4%) with the original quiz text and images. Accuracy using the rephrased quiz text with images ranged from 62.5% (35/56) to 76.8% (43/56), and accuracy using the rephrased quiz text only ranged from 51.8% (29/56) to 76.8% (43/56). LLMs performed on par with first-year fellows (71.4% [40/56]) but surpassed junior faculty (51.8% [29/56]) and second-year fellows (48.2% [27/56]). All LLMs showed high consistency across the five repetitions (Fleiss' kappa, 0.860-1.000). In image-only tasks, LLM accuracy in identifying pathologic locations ranged from 21.5% (28/130) to 63.1% (82/130). LLMs exhibit strong diagnostic performance with clinical text, yet their ability to interpret complex radiologic images independently is limited. Further refinement in image analysis is essential for these models to integrate fully into radiologic workflows. | - |
| dc.language | English | - |
| dc.publisher | Nature Publishing Group | - |
| dc.relation.isPartOf | SCIENTIFIC REPORTS | - |
| dc.subject.MESH | Humans | - |
| dc.subject.MESH | Jamaica | - |
| dc.subject.MESH | Language* | - |
| dc.subject.MESH | Large Language Models | - |
| dc.subject.MESH | Neuroimaging* / methods | - |
| dc.subject.MESH | Neurology* | - |
| dc.title | Evaluating diagnostic accuracy of large language models in neuroradiology cases using image inputs from JAMA neurology and JAMA clinical challenges | - |
| dc.type | Article | - |
| dc.contributor.googleauthor | Albaqshi, Ahmed | - |
| dc.contributor.googleauthor | Ko, Ji Su | - |
| dc.contributor.googleauthor | Suh, Chong Hyun | - |
| dc.contributor.googleauthor | Suh, Pae Sun | - |
| dc.contributor.googleauthor | Shim, Woo Hyun | - |
| dc.contributor.googleauthor | Heo, Hwon | - |
| dc.contributor.googleauthor | Woo, Chang-Yun | - |
| dc.contributor.googleauthor | Park, Hyungjun | - |
| dc.identifier.doi | 10.1038/s41598-025-06458-z | - |
| dc.relation.journalcode | J02646 | - |
| dc.identifier.eissn | 2045-2322 | - |
| dc.identifier.pmid | 41309648 | - |
| dc.subject.keyword | Artificial intelligence | - |
| dc.subject.keyword | Deep learning | - |
| dc.subject.keyword | Image interpretation | - |
| dc.subject.keyword | Computer-assisted | - |
| dc.subject.keyword | Neuroimaging | - |
| dc.contributor.affiliatedAuthor | Suh, Pae Sun | - |
| dc.identifier.scopusid | 2-s2.0-105023653878 | - |
| dc.identifier.wosid | 001630276300001 | - |
| dc.citation.volume | 15 | - |
| dc.citation.number | 1 | - |
| dc.identifier.bibliographicCitation | SCIENTIFIC REPORTS, Vol.15(1), 2025-11 | - |
| dc.identifier.rimsid | 90963 | - |
| dc.type.rims | ART | - |
| dc.description.journalClass | 1 | - |
| dc.subject.keywordAuthor | Artificial intelligence | - |
| dc.subject.keywordAuthor | Deep learning | - |
| dc.subject.keywordAuthor | Image interpretation | - |
| dc.subject.keywordAuthor | Computer-assisted | - |
| dc.subject.keywordAuthor | Neuroimaging | - |
| dc.type.docType | Article | - |
| dc.description.isOpenAccess | Y | - |
| dc.description.journalRegisteredClass | scie | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.relation.journalWebOfScienceCategory | Multidisciplinary Sciences | - |
| dc.relation.journalResearchArea | Science & Technology - Other Topics | - |
| dc.identifier.articleno | 43027 | - |
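The abstract above reports two summary statistics for the LLM evaluation: per-condition accuracy (correct answers out of 56 cases) and Fleiss' kappa for answer consistency across five repetitions of each model. The sketch below is illustrative only, not the authors' code; the answer labels, case counts, and data layout are assumptions made for the example.

```python
"""Minimal sketch of the two summary statistics described in the abstract:
per-condition accuracy and Fleiss' kappa across five repeated runs.
Illustrative assumptions: answers are choice labels ('A'-'E'), one list of
answers per repetition, all repetitions cover the same ordered cases."""

from collections import Counter

def accuracy(predicted: list[str], truth: list[str]) -> float:
    """Fraction of cases answered correctly (e.g. 43/56 ~ 0.768)."""
    correct = sum(p == t for p, t in zip(predicted, truth))
    return correct / len(truth)

def fleiss_kappa(runs: list[list[str]]) -> float:
    """Fleiss' kappa, treating each repetition as one 'rater'.

    `runs` is a list of repetitions; each repetition is the list of the
    model's chosen answers for the same ordered set of cases."""
    n_reps = len(runs)        # ratings per case (five repetitions)
    n_items = len(runs[0])    # number of cases
    categories = sorted({a for run in runs for a in run})

    # n_ij: how many repetitions chose category j for case i
    counts = [Counter(run[i] for run in runs) for i in range(n_items)]

    # observed agreement per case, averaged over cases
    p_bar = sum(
        (sum(c[cat] ** 2 for cat in categories) - n_reps)
        / (n_reps * (n_reps - 1))
        for c in counts
    ) / n_items

    # chance agreement from the marginal category proportions
    p_j = [sum(c[cat] for c in counts) / (n_items * n_reps) for cat in categories]
    p_e = sum(p ** 2 for p in p_j)

    return (p_bar - p_e) / (1 - p_e)

# Example: perfectly consistent answers over five repetitions give kappa = 1.0
runs = [["A", "C", "B"] * 2 for _ in range(5)]
print(round(fleiss_kappa(runs), 3))  # 1.0
```

In this framing, kappa near 1.0 (as in the reported 0.860-1.000 range) means the model gave nearly identical answers on every repetition, not that the answers were correct; accuracy and consistency are computed separately.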