Deep learning model for diagnosing gastric mucosal lesions using endoscopic images: development, validation, and method comparison
Authors
Joon Yeul Nam ; Hyung Jin Chung ; Kyu Sung Choi ; Hyuk Lee ; Tae Jun Kim ; Hosim Soh ; Eun Ae Kang ; Soo-Jeong Cho ; Jong Chul Ye ; Jong Pil Im ; Sang Gyun Kim ; Joo Sung Kim ; Hyunsoo Chung ; Jeong-Hoon Lee
MeSH terms
Area Under Curve ; Artificial Intelligence* ; Deep Learning* ; Humans ; Neural Networks, Computer ; ROC Curve
Abstract
Background and aims: Endoscopic differential diagnoses of gastric mucosal lesions (benign gastric ulcer, early gastric cancer [EGC], and advanced gastric cancer) remain challenging. We aimed to develop and validate convolutional neural network-based artificial intelligence (AI) models: lesion detection, differential diagnosis (AI-DDx), and invasion depth (AI-ID; pT1a vs pT1b among EGC) models.
Methods: This study included 1366 consecutive patients with gastric mucosal lesions from 2 referral centers in Korea. One representative endoscopic image from each patient was used. Histologic diagnoses were set as the criterion standard. Performance of the AI-DDx (training/internal/external validation set, 1009/112/245) was compared with visual diagnoses by independent endoscopists (stratified by novice [<1 year of experience], intermediate [2-3 years of experience], and expert [>5 years of experience]), and performance of the AI-ID (training/internal/external validation set, 620/68/155) was compared with EUS results.
Results: The AI-DDx showed good diagnostic performance for both internal (area under the receiver operating characteristic curve [AUROC] = .86) and external validation (AUROC = .86). In the external validation set, the performance of the AI-DDx was better than that of novice (AUROC = .82, P = .01) and intermediate endoscopists (AUROC = .84, P = .02) but was comparable with that of experts (AUROC = .89, P = .12). The AI-ID showed fair performance in both internal (AUROC = .78) and external validation sets (AUROC = .73), significantly better than EUS performed by experts (internal validation, AUROC = .62; external validation, AUROC = .56; both P < .001).
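For readers unfamiliar with the metric, the sketch below illustrates what an AUROC value measures in a binary setting such as pT1a vs pT1b: the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as one half. It is not the authors' evaluation code. The function name `auroc` and the toy labels and scores are hypothetical, and the abstract does not state how AUROC was aggregated for the three-class AI-DDx task, so only the binary case is shown.

```python
def auroc(labels, scores):
    """Binary AUROC via pairwise comparison (Mann-Whitney U formulation).

    labels: iterable of 0/1 ground-truth values (1 = positive class).
    scores: iterable of model scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = auroc(toy_labels, toy_scores)
print(f"toy AUROC = {toy_auc:.3f}")
```

On this scale, .5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the context for reading the reported values between .56 and .89.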
Conclusions: The AI-DDx was comparable with experts and outperformed novice and intermediate endoscopists for the differential diagnosis of gastric mucosal lesions. The AI-ID performed better than EUS for evaluation of invasion depth.