Leveraging multimodal large language model chatbots in oral radiology: a comprehensive evaluation using questions from a Korean dental university

Authors
 Jeong, Hui  ;  Jeon, Kug Jin  ;  Lee, Chena  ;  Choi, Yoon Joo  ;  Jo, Gyu-Dong  ;  Han, Sang-Sun 
Citation
 DENTOMAXILLOFACIAL RADIOLOGY, 2025-12 
Journal Title
DENTOMAXILLOFACIAL RADIOLOGY
ISSN
 0250-832X 
Issue Date
2025-12
Keywords
oral radiology ; multimodal large language model ; accuracy ; answer consistency ; hallucination
Abstract
Objectives: This study aimed to conduct a comprehensive evaluation of general-purpose multimodal large language model (LLM) chatbots in oral radiology.
Methods: Ninety text- and image-based oral radiology questions from a Korean dental university were extracted and categorized into six educational content areas and two question types. ChatGPT-4o and Gemini 2.0 Flash were evaluated on the following items: accuracy, with group differences across the six content areas (using Fisher's exact test with Bonferroni correction, P < .0167); answer consistency across ten repeated outputs (evaluated as the mean agreement and Fleiss' kappa coefficient); and hallucination (evaluated as the mean of the 5-point Global Quality Score assigned by two oral radiologists).
Results: The multimodal LLM chatbots (ChatGPT-4o and Gemini 2.0 Flash) achieved excellent performance on text-based questions, with over 80% accuracy, but showed limited performance on image-based tasks, with accuracy under 30%. Additionally, image-based tasks exhibited high response variability, and hallucinations that provided incorrect information were frequently observed. These findings suggest that AI chatbots are not yet suitable for reliable use in oral radiology.
Conclusions: This study provides timely insights into the capabilities and limitations of general-purpose multimodal LLM chatbots in oral radiology and will serve as a foundation for safer and more effective applications of AI chatbots in the oral radiology field in the future.
Advances in knowledge: This is the first study to comprehensively assess multimodal LLM chatbots in oral radiology. It provides key insights into the performance benchmarks for AI chatbots in oral radiology, promoting the responsible and transparent integration of AI into dental education.
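The answer-consistency measures named in the Methods (mean agreement and Fleiss' kappa over ten repeated outputs) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the answer labels, and the definition of mean agreement used here (the share of repeated outputs that match the most frequent answer, averaged over questions) are assumptions made for illustration only.

```python
from collections import Counter

def counts_from_answers(answers, categories):
    """Turn per-question answer lists into a questions x categories count table.

    answers: one list per question, holding the answer from each repeated run.
    categories: all possible answer labels (hypothetical, e.g. "A".."E").
    """
    table = []
    for runs in answers:
        tally = Counter(runs)
        table.append([tally.get(c, 0) for c in categories])
    return table

def mean_agreement(table):
    """Assumed definition: average share of runs that gave the modal answer."""
    return sum(max(row) / sum(row) for row in table) / len(table)

def fleiss_kappa(table):
    """Fleiss' kappa for a table whose rows all sum to the same number of runs."""
    n_items = len(table)
    n_runs = sum(table[0])
    if any(sum(row) != n_runs for row in table):
        raise ValueError("every question needs the same number of runs")
    n_cats = len(table[0])
    # Proportion of all answers that fall in each category.
    p_cat = [sum(row[j] for row in table) / (n_items * n_runs)
             for j in range(n_cats)]
    # Observed agreement per question, then averaged.
    p_item = [(sum(c * c for c in row) - n_runs) / (n_runs * (n_runs - 1))
              for row in table]
    p_obs = sum(p_item) / n_items
    # Agreement expected by chance.
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1.0:
        return 1.0  # every run gave the same single answer; kappa is undefined
    return (p_obs - p_exp) / (1.0 - p_exp)

if __name__ == "__main__":
    # Made-up data: 3 questions, 10 repeated runs each.
    answers = [
        ["A"] * 10,
        ["B"] * 7 + ["C"] * 3,
        ["A"] * 4 + ["B"] * 3 + ["D"] * 3,
    ]
    table = counts_from_answers(answers, ["A", "B", "C", "D", "E"])
    print("mean agreement:", mean_agreement(table))
    print("Fleiss' kappa:", fleiss_kappa(table))
```

In this sketch each repeated output is treated as one "rater", so a question answered identically in all ten runs contributes full agreement, while answers scattered across options pull both measures down.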
Full Text
https://academic.oup.com/dmfr/advance-article/doi/10.1093/dmfr/twaf083/8378392
DOI
10.1093/dmfr/twaf083
Appears in Collections:
2. College of Dentistry (치과대학) > Dept. of Oral and Maxillofacial Radiology (영상치의학교실) > 1. Journal Papers
Yonsei Authors
Lee, Chena (이채나), ORCID: https://orcid.org/0000-0002-8943-4192
Jeon, Kug Jin (전국진), ORCID: https://orcid.org/0000-0002-5862-2975
Jo, Gyu-Dong (조규동)
Choi, Yoon Joo (최윤주), ORCID: https://orcid.org/0000-0001-9225-3889
Han, Sang Sun (한상선), ORCID: https://orcid.org/0000-0003-1775-7862
URI
https://ir.ymlib.yonsei.ac.kr/handle/22282913/210159
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
