Performance of Multimodal Generative AI Models in Addressing Complex Dental Inquiries With Text, Images, and Analytical Data

Mai, Hang-Nga; Lee, Du-Hyeong; Kaenploy, Jekita; Kim, Jong-Eun; Cho, Seok-Hwan

doi:10.1111/jerd.70064


DC Field	Value	Language
dc.contributor.author	Mai, Hang-Nga	-
dc.contributor.author	Lee, Du-Hyeong	-
dc.contributor.author	Kaenploy, Jekita	-
dc.contributor.author	Kim, Jong-Eun	-
dc.contributor.author	Cho, Seok-Hwan	-
dc.date.accessioned	2025-12-23T06:52:33Z	-
dc.date.available	2025-12-23T06:52:33Z	-
dc.date.created	2025-12-11	-
dc.date.issued	2025-11	-
dc.identifier.issn	1496-4155	-
dc.identifier.uri	https://ir.ymlib.yonsei.ac.kr/handle/22282913/209592	-
dc.description.abstract	Objective: Multimodal large language models (LLMs) have the potential to transform dental learning and decision-making by addressing multimodal dental inquiries that integrate text, images, and analytical data. The purpose of this study was to evaluate the performance of various multimodal LLMs in responding to multimodal dental queries and to identify factors influencing their performance. Materials and Methods: Four multimodal LLMs (ChatGPT-4V, Claude 3 Sonnet, Microsoft 365 Copilot 2024, and Google Gemini 1.5 Pro) were evaluated based on their correct answers and passing margin for the Integrated National Board Dental Examination (INBDE) and the Advanced Dental Admission Test (ADAT). Descriptive statistics, χ² tests, Cohen's kappa, Kruskal-Wallis tests, and Mann-Whitney U tests were used to analyze the performance across different question types, independent inputs, and picture types (α = 0.05). Results: Claude 3 Sonnet outperformed the other models in both INBDE and ADAT exams, achieving the highest accuracy, followed by ChatGPT-4V, Microsoft 365 Copilot 2024, and Google Gemini 1.5 Pro. χ² tests revealed significant differences between chatbots in the ADAT exam, but not in the INBDE. Cohen's kappa showed weak to moderate model agreement for INBDE and stronger agreement for ADAT, with the highest agreement between Claude 3 Sonnet and ChatGPT-4V (κ = 0.757) and the lowest between Google Gemini 1.5 Pro and Microsoft 365 Copilot 2024 (κ = 0.059). Model performance was influenced by question type (theoretical and clinical), with common errors including misinterpreting clinical scenarios, visual data difficulties, and dental terminology ambiguities. Conclusion: Multimodal LLMs show potential in answering multimodal dental inquiries, though performance varies across models, with challenges in interpreting clinical scenarios, visual data, and terminology ambiguity. Clinical Significance: Large language models can be applied not only to memorization-type but also interpretation and problem-solving cognitive questions in dentistry. To maximize the utility of these artificial intelligence models, users need both an understanding of their differences and the ability to manage complex clinical data.	-
dc.language	English	-
dc.publisher	Wiley-Blackwell	-
dc.relation.isPartOf	JOURNAL OF ESTHETIC AND RESTORATIVE DENTISTRY	-
dc.title	Performance of Multimodal Generative AI Models in Addressing Complex Dental Inquiries With Text, Images, and Analytical Data	-
dc.type	Article	-
dc.contributor.googleauthor	Mai, Hang-Nga	-
dc.contributor.googleauthor	Lee, Du-Hyeong	-
dc.contributor.googleauthor	Kaenploy, Jekita	-
dc.contributor.googleauthor	Kim, Jong-Eun	-
dc.contributor.googleauthor	Cho, Seok-Hwan	-
dc.identifier.doi	10.1111/jerd.70064	-
dc.relation.journalcode	J04183	-
dc.identifier.eissn	1708-8240	-
dc.identifier.pmid	41287924	-
dc.subject.keyword	dental inquiry	-
dc.subject.keyword	exam	-
dc.subject.keyword	generative artificial intelligence	-
dc.subject.keyword	large language model	-
dc.subject.keyword	performance	-
dc.contributor.affiliatedAuthor	Kim, Jong-Eun	-
dc.identifier.scopusid	2-s2.0-105022833310	-
dc.identifier.wosid	001621978300001	-
dc.identifier.bibliographicCitation	JOURNAL OF ESTHETIC AND RESTORATIVE DENTISTRY, 2025-11	-
dc.identifier.rimsid	90229	-
dc.type.rims	ART	-
dc.description.journalClass	1	-
dc.subject.keywordAuthor	dental inquiry	-
dc.subject.keywordAuthor	exam	-
dc.subject.keywordAuthor	generative artificial intelligence	-
dc.subject.keywordAuthor	large language model	-
dc.subject.keywordAuthor	performance	-
dc.type.docType	Article; Early Access	-
dc.description.isOpenAccess	Y	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalWebOfScienceCategory	Dentistry, Oral Surgery & Medicine	-
dc.relation.journalResearchArea	Dentistry, Oral Surgery & Medicine	-
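
The abstract reports pairwise agreement between models as Cohen's kappa. As a rough illustration of what that statistic measures, here is a minimal Python sketch. The function name `cohens_kappa` and the two answer vectors are hypothetical and are not taken from the study, and the record does not say which software the authors used.

```python
from collections import Counter


def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items where both raters give the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, computed from each rater's marginal label counts.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in set(count_a) | set(count_b)) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch returns 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical data (NOT from the study): 1 = answered correctly, 0 = incorrectly.
model_a = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
model_b = [1, 1, 0, 0, 1, 0, 1, 1, 1, 1]

kappa = cohens_kappa(model_a, model_b)
print(round(kappa, 3))
```

Kappa discounts the agreement two models would reach by chance given how often each one is correct, which is why two models with similar accuracy can still show low agreement.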

Appears in Collections: 2. College of Dentistry (치과대학) > Dept. of Prosthodontics (보철과학교실) > 1. Journal Papers
