Key Measures for Evaluating Diagnostic Accuracy in Multi-Class Classification: An Overview and Simulation-Based Comparison

Authors
 Ryu, Leeha  ;  Han, Kyunghwa  ;  Jung, Inkyung  ;  Park, Yae Won  ;  Ahn, Sung Soo  ;  Hwang, Dosik 
Citation
 KOREAN JOURNAL OF RADIOLOGY, Vol.27(4) : 344-355, 2026-04 
Journal Title
KOREAN JOURNAL OF RADIOLOGY
ISSN
 1229-6929 
Issue Date
2026-04
MeSH
Artificial Intelligence* ; Brain Neoplasms* / diagnostic imaging ; Brain Neoplasms* / pathology ; Computer Simulation ; Glioma / diagnostic imaging ; Glioma / pathology ; Humans ; ROC Curve
Keywords
Multiclass classification ; Polytomous outcome prediction ; Accuracy ; Performance ; Metrics ; Measure ; Index
Abstract
Recent advancements in artificial intelligence have led to increased interest in predictive modeling across various domains, including medicine. Although numerous metrics have been established for binary classification, the growing adoption of multi-class systems necessitates robust evaluation measures. However, comprehensive simulation studies comparing the performance of existing multi-class metrics under diverse data conditions remain limited. In this study, we first provide a concise overview of commonly used accuracy metrics for multi-class classification. Then, we report a simulation study that systematically evaluates several diagnostic accuracy measures under a wide range of scenarios, including three- and five-class settings, balanced and imbalanced sample sizes, and different distributional assumptions for predictors. We assessed each metric's performance in terms of bias and 95% confidence interval coverage. Under balanced conditions, most metrics demonstrated stable and unbiased performance, closely approximating the true values. However, under imbalanced conditions, greater bias was observed, with the M-index and polytomous discrimination index exhibiting comparatively more stable performance across various scenarios. The micro-averaged receiver operating characteristic curve area consistently showed higher bias under class imbalance. Finally, we applied these metrics to a glioma tumor grading task using external datasets. This study provides a systematic comparison of commonly used metrics and offers practical guidance for selecting appropriate measures in multi-class classification tasks.
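Two of the measures named in the abstract can be illustrated with a short sketch. The code below is not taken from the article. It assumes the M-index follows the usual Hand and Till pairwise definition and that the micro-averaged area pools all one-vs-rest (label, score) pairs into a single binary problem. The function names `binary_auc`, `m_index`, and `micro_auc` are hypothetical, class labels are assumed to be integers 0..c-1 matching the columns of the probability matrix, and the polytomous discrimination index is not implemented here.

```python
from itertools import combinations

import numpy as np


def binary_auc(pos_scores, neg_scores):
    """AUC as P(pos > neg) + 0.5 * P(pos == neg), over all pos/neg pairs."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


def m_index(y, prob):
    """Pairwise M-index (assumed Hand and Till form).

    For each class pair (i, j), average the AUC of p_i separating i from j
    and the AUC of p_j separating j from i, then average over all pairs.
    """
    y = np.asarray(y)
    prob = np.asarray(prob, dtype=float)
    pair_values = []
    for i, j in combinations(range(prob.shape[1]), 2):
        a_i_given_j = binary_auc(prob[y == i, i], prob[y == j, i])
        a_j_given_i = binary_auc(prob[y == j, j], prob[y == i, j])
        pair_values.append(0.5 * (a_i_given_j + a_j_given_i))
    return float(np.mean(pair_values))


def micro_auc(y, prob):
    """Micro-averaged AUC: pool every (one-hot label, score) cell."""
    y = np.asarray(y, dtype=int)
    prob = np.asarray(prob, dtype=float)
    onehot = np.eye(prob.shape[1], dtype=bool)[y]
    scores = prob.ravel()
    labels = onehot.ravel()
    return binary_auc(scores[labels], scores[~labels])


# Illustrative, made-up predictions for an imbalanced three-class problem.
y_demo = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2])
prob_demo = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.8, 0.1, 0.1],
    [0.5, 0.3, 0.2],
    [0.6, 0.2, 0.2],
    [0.4, 0.4, 0.2],
    [0.5, 0.4, 0.1],
    [0.3, 0.5, 0.2],
    [0.5, 0.2, 0.3],
])
print("M-index:", m_index(y_demo, prob_demo))
print("micro-averaged AUC:", micro_auc(y_demo, prob_demo))
```

Because the pairwise measure weights every class pair equally while the pooled measure is dominated by the largest class, the two can diverge when class sizes are unequal, which is the kind of contrast the simulation study examines.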
DOI
10.3348/kjr.2025.1447
Appears in Collections:
1. College of Medicine (의과대학) > Dept. of Radiology (영상의학교실) > 1. Journal Papers
1. College of Medicine (의과대학) > Dept. of Biomedical Systems Informatics (의생명시스템정보학교실) > 1. Journal Papers
Yonsei Authors
Park, Yae Won(박예원) ORCID: https://orcid.org/0000-0001-8907-5401
Ahn, Sung Soo(안성수) ORCID: https://orcid.org/0000-0002-0503-5558
Jung, Inkyung(정인경) ORCID: https://orcid.org/0000-0003-3780-3213
Han, Kyung Hwa(한경화)
URI
https://ir.ymlib.yonsei.ac.kr/handle/22282913/211784