Breast cancer ; Recurrence ; Metastasis ; Microbiome ; Machine learning
Abstract
Recurrence and metastasis of breast cancer (RMBC) have a decisive impact on patient survival, necessitating reliable biomarkers for its early prediction. This study used machine learning to evaluate blood microbiome profiles as predictive biomarkers of RMBC. A retrospective predictive analysis was conducted on 288 participants, including 96 patients with breast cancer and 192 healthy controls. After 7 years of follow-up, patients were classified into disease-free survival (DFS, n = 88) and RMBC (n = 8) groups. Blood microbiome composition was analysed using 16S rRNA sequencing, followed by quality control. The Synthetic Minority Oversampling Technique (SMOTE) was employed to address class imbalance. Eleven machine learning models were trained using leave-one-out cross-validation (LOOCV) and k-fold cross-validation, and evaluated based on the area under the receiver operating characteristic curve (AUROC), recall, precision, F1-score, and Matthews correlation coefficient (MCC). Alpha diversity was significantly lower in DFS and RMBC groups than in the healthy control group (p < 0.05), while beta diversity analysis revealed distinct clustering. The random forest achieved an AUROC of 0.94, a recall of 0.81, an F1-score of 0.83, and an MCC of 0.88. Enterobacter, Bacteroides, Klebsiella, and Bifidobacterium were among the key microbial genera predicting RMBC in the top five models. Blood microbiome profiling shows potential as a non-invasive RMBC biomarker. Machine learning effectively distinguished RMBC, warranting further validation.