Purpose: To compare the diagnostic outcomes of simulated interpretation schemes for screening mammography, in which either AI-CAD or a second human reader is added to a single human reader, using a consecutive screening sample.
Materials and Methods: Between January 2018 and January 2019, 2385 digital mammograms from 2385 consecutive women (mean age: 50.0±9.5 years) were included. Because single reading is routine in our practice, the original interpretation reports served as the single-reading data. To simulate double reading, a second reader independently reviewed the screening mammograms with access to the interpretation reports. To simulate single reading with AI-CAD, one of the first readers re-evaluated the mammograms that had positive AI-CAD results. Ground truth (cancer, benign, or absence of abnormality) was established by histopathologic diagnosis or at least 1 year of follow-up.
Results: Among the 2385 mammograms, 6 (0.3%) were cancers, 32 (1.3%) were biopsy-confirmed benign, and 2347 (98.4%) were negative examinations. The recall rate of reader 1+AI-CAD, 2.6% (95% confidence interval [CI]: 2.0-3.3), was significantly higher than that of reader 1, 2.4% (95% CI: 1.7-3.0) (p=0.008), and significantly lower than that of reader 1+2, 3.1% (95% CI: 2.4-3.8) (p=0.010). Specificity and accuracy were significantly higher for reader 1 than for either reader 1+2 or reader 1+AI-CAD (all p<0.05). Reader 1+AI-CAD had significantly higher specificity (97.6% vs. 97.1%) and accuracy (97.5% vs. 97.0%) than reader 1+2 (p=0.010). A high proportion of the false-positive findings detected by AI-CAD were distortions, whereas calcifications were the most common cause of false-positive findings detected by the readers.
Conclusion: Adding a reader, whether AI-CAD or a second human reader, results in a higher recall rate and significantly lower specificity and accuracy than a single human reader. Compared with double reading by two human readers, adding AI-CAD to a single reader yielded a significantly lower recall rate and significantly higher specificity and accuracy.
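
As a reader aid, the case-mix percentages reported in the Results can be reproduced from the stated counts with the minimal sketch below. The helper name `proportion_with_wald_ci` is hypothetical, and the normal-approximation (Wald) interval is an assumption made for illustration only: the abstract does not state which confidence interval method the authors used, and no recall or specificity counts are given, so only the case composition is computed.

```python
import math


def proportion_with_wald_ci(events, total, z=1.96):
    """Return (proportion, lower, upper) for events/total.

    Uses the normal-approximation (Wald) interval, which is an assumption
    here; the abstract does not name the method behind its 95% CIs.
    """
    if total <= 0 or not 0 <= events <= total:
        raise ValueError("need 0 <= events <= total and total > 0")
    p = events / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    # Clamp to [0, 1] because the Wald interval can overshoot for rare events.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Counts taken directly from the Results paragraph.
TOTAL_MAMMOGRAMS = 2385
composition = {
    "cancer": 6,
    "biopsy-confirmed benign": 32,
    "negative": 2347,
}

# The three categories should account for every examination.
assert sum(composition.values()) == TOTAL_MAMMOGRAMS

for label, count in composition.items():
    p, lower, upper = proportion_with_wald_ci(count, TOTAL_MAMMOGRAMS)
    print(f"{label}: {count} ({100 * p:.1f}%), "
          f"Wald 95% CI {100 * lower:.2f}-{100 * upper:.2f}%")
```

The point estimates round to the 0.3%, 1.3%, and 98.4% given in the Results. With only 6 cancers, the Wald interval is a rough approximation, and an exact or Wilson interval would usually be preferred for such a rare outcome.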