Adult ; Algorithms ; Artificial Intelligence* ; Breast Neoplasms* / diagnostic imaging ; Deep Learning ; Female ; Humans ; Mammography* / methods ; Middle Aged ; Radiographic Image Interpretation, Computer-Assisted / methods ; Reproducibility of Results ; Republic of Korea / epidemiology ; Retrospective Studies ; Sensitivity and Specificity* ; Time Factors ; United States
Keywords
Artificial Intelligence ; Breast ; Breast Cancer ; Computer-Aided Detection ; Computer-Aided Diagnosis (CAD) ; Digital Breast Tomosynthesis ; Screening ; Tomosynthesis
Abstract
Purpose To develop an artificial intelligence (AI) model for the diagnosis of breast cancer on digital breast tomosynthesis (DBT) images and to investigate whether it could improve diagnostic accuracy and reduce radiologist reading time. Materials and Methods A deep learning AI algorithm was developed and validated for DBT with retrospectively collected examinations (January 2010 to December 2021) from 14 institutions in the United States and South Korea. A multicenter reader study was performed to compare the performance of 15 radiologists (seven breast specialists, eight general radiologists) in interpreting DBT examinations in 258 women (mean age, 56 years ± 13.41 [SD]), including 65 cancer cases, with and without the use of AI. Area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and reading time were evaluated. Results The AUC for stand-alone AI performance was 0.93 (95% CI: 0.92, 0.94). With AI, radiologists' AUC improved from 0.90 (95% CI: 0.86, 0.93) to 0.92 (95% CI: 0.88, 0.96) (P = .003) in the reader study. AI showed higher specificity (89.64% [95% CI: 85.34%, 93.94%]) than radiologists (77.34% [95% CI: 75.82%, 78.87%]) (P < .001). When reading with AI, radiologists' sensitivity increased from 85.44% (95% CI: 83.22%, 87.65%) to 87.69% (95% CI: 85.63%, 89.75%) (P = .04), with no evidence of a difference in specificity. Reading time decreased from 54.41 seconds (95% CI: 52.56, 56.27) without AI to 48.52 seconds (95% CI: 46.79, 50.25) with AI (P < .001). Interreader agreement measured by Fleiss κ increased from 0.59 to 0.62. Conclusion The AI model showed better diagnostic accuracy than radiologists in breast cancer detection, as well as reduced reading times. The concurrent use of AI in DBT interpretation could improve both accuracy and efficiency.