Purpose: To develop a hierarchical continual arthropathy classification model for multiple joints that can be updated continuously for large-scale studies of various anatomical structures.
Materials and methods: This study included a total of 1371 radiographs of knee, elbow, ankle, shoulder, and hip joints from three tertiary hospitals. For model development, 934 radiographs of the knee, elbow, ankle, and shoulder were collected from Sinchon Severance Hospital between July 1 and December 31, 2022. For external validation, 125 hip radiographs were collected from Yongin Severance Hospital between January 1 and December 31, 2022, and 312 knee radiographs were collected from Gangnam Severance Hospital between January 1 and June 30, 2023. The Hierarchical Dynamically Expandable Representation (Hi-DER) model was trained stepwise on the four development joints using five-fold cross-validation. Arthropathy classification was evaluated at three hierarchical levels: normal versus abnormal (L1), low-grade versus high-grade (L2), and specific grade (L3). The model's performance was compared with the grading predictions of two other AI models and three radiologists. For model explainability, gradient-weighted class activation mapping (Grad-CAM) and progressive erasing plus progressive restoration (PEPPR) were employed.
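The three evaluation levels can be illustrated with a minimal sketch that derives L1 and L2 labels from a specific grade. The grading scale is an assumption made only for illustration: five ordinal grades (0 to 4), with grade 0 treated as normal, grades 1 and 2 as low-grade, and grades 3 and 4 as high-grade. The function names and the threshold `LOW_GRADE_MAX` are hypothetical and are not taken from the study.

```python
# Assumption: five ordinal grades 0-4; 0 = normal, 1-2 = low-grade, 3-4 = high-grade.
# These thresholds are hypothetical and not taken from the study.
LOW_GRADE_MAX = 2


def hierarchical_labels(grade: int) -> dict:
    """Map one specific grade (L3) to labels at all three hierarchical levels."""
    if not 0 <= grade <= 4:
        raise ValueError(f"grade must be in 0..4, got {grade}")
    if grade == 0:
        l1, l2 = "normal", "normal"
    elif grade <= LOW_GRADE_MAX:
        l1, l2 = "abnormal", "low-grade"
    else:
        l1, l2 = "abnormal", "high-grade"
    return {"L1": l1, "L2": l2, "L3": grade}


def is_consistent(pred: dict) -> bool:
    """Check that a predicted (L1, L2, L3) triple agrees with the hierarchy."""
    return pred == hierarchical_labels(pred["L3"])
```

Under these assumptions, `hierarchical_labels(3)` yields an abnormal, high-grade label, and `is_consistent` flags a prediction whose coarse levels contradict its specific grade.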
Results: The model achieved a weighted average AUC of 0.994 (95% CI: 0.985, 0.999) for L1, 0.980 (95% CI: 0.958, 0.996) for L2, and 0.973 (95% CI: 0.943, 0.993) for L3. The model maintained an AUC above 0.800 with 70% of the input regions erased. During external validation on hip joints, the model achieved a weighted average AUC of 0.978 (95% CI: 0.952, 0.996) for L1, 0.977 (95% CI: 0.946, 0.996) for L2, and 0.971 (95% CI: 0.934, 0.996) for L3. On the external knee data, the model achieved a weighted average AUC of 0.934 (95% CI: 0.904, 0.958) for L1, 0.929 (95% CI: 0.900, 0.954) for L2, and 0.857 (95% CI: 0.816, 0.894) for L3.
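A weighted average AUC with a 95% CI, as reported here, can be computed in several ways. The abstract does not state which procedure was used, so the following is only a sketch under two assumptions: the average is a one-vs-rest AUC weighted by class support, and the interval is a percentile bootstrap. All function names and parameters (`n_boot`, `alpha`, `seed`) are illustrative.

```python
import random


def binary_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined when one class is absent
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weighted_ovr_auc(y_true, probs, n_classes):
    """One-vs-rest AUC per class, averaged with weights equal to class support."""
    total, weight = 0.0, 0
    for c in range(n_classes):
        labels = [1 if y == c else 0 for y in y_true]
        auc = binary_auc(labels, [p[c] for p in probs])
        if auc is not None:
            support = sum(labels)
            total += auc * support
            weight += support
    return total / weight if weight else None


def bootstrap_ci(y_true, probs, n_classes, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the weighted AUC (assumed procedure)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        auc = weighted_ovr_auc([y_true[i] for i in idx],
                               [probs[i] for i in idx], n_classes)
        if auc is not None:  # skip resamples where no AUC is defined
            stats.append(auc)
    if not stats:
        raise ValueError("no valid bootstrap resamples")
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lower, upper
```

Here `y_true` is a list of integer class labels and `probs` is a list of per-class probability vectors, one per radiograph. The pairwise AUC is quadratic in sample size, which is acceptable for a few hundred cases but would call for a rank-based implementation on larger sets.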
Conclusion: The Hi-DER model may improve the efficiency of arthropathy diagnosis by accurately classifying arthropathy grades across multiple joints, potentially enabling earlier treatment.