Radiomics machine learning study with a small sample size: Single random training-test set split may lead to unreliable results

Chansik An; Yae Won Park; Sung Soo Ahn; Kyunghwa Han; Hwiyoung Kim; Seung-Koo Lee

doi:10.1371/journal.pone.0256152

DC Field	Value	Language
dc.contributor.author	김휘영	-
dc.contributor.author	박예원	-
dc.contributor.author	안성수	-
dc.contributor.author	이승구	-
dc.contributor.author	한경화	-
dc.date.accessioned	2021-09-29T02:21:42Z	-
dc.date.available	2021-09-29T02:21:42Z	-
dc.date.issued	2021-08	-
dc.identifier.uri	https://ir.ymlib.yonsei.ac.kr/handle/22282913/184850	-
dc.description.abstract	This study aims to determine how randomly splitting a dataset into training and test sets affects the estimated performance of a machine learning model and its gap from the test performance under different conditions, using real-world brain tumor radiomics data. We conducted two classification tasks of different difficulty levels with magnetic resonance imaging (MRI) radiomics features: (1) "simple" task, glioblastomas [n = 109] vs. brain metastasis [n = 58] and (2) "difficult" task, low- [n = 163] vs. high-grade [n = 95] meningiomas. Additionally, two undersampled datasets were created by randomly sampling 50% from these datasets. We performed random training-test set splitting for each dataset repeatedly to create 1,000 different training-test set pairs. For each dataset pair, the least absolute shrinkage and selection operator model was trained and evaluated using various validation methods in the training set, and tested in the test set, using the area under the curve (AUC) as an evaluation metric. The AUCs in training and testing varied among different training-test set pairs, especially with the undersampled datasets and the difficult task. The mean (±standard deviation) AUC difference between training and testing was 0.039 (±0.032) for the simple task without undersampling and 0.092 (±0.071) for the difficult task with undersampling. In a training-test set pair with the difficult task without undersampling, for example, the AUC was high in training but much lower in testing (0.882 and 0.667, respectively); in another dataset pair with the same task, however, the AUC was low in training but much higher in testing (0.709 and 0.911, respectively). When the AUC discrepancy between training and testing, that is, the generalization gap, was large, none of the validation methods helped sufficiently reduce the generalization gap. Our results suggest that machine learning after a single random training-test set split may lead to unreliable results in radiomics studies, especially with small sample sizes.	-
dc.description.statementOfResponsibility	open	-
dc.format	application/pdf	-
dc.language	English	-
dc.publisher	Public Library of Science	-
dc.relation.isPartOf	PLOS ONE	-
dc.rights	CC BY-NC-ND 2.0 KR	-
dc.title	Radiomics machine learning study with a small sample size: Single random training-test set split may lead to unreliable results	-
dc.type	Article	-
dc.contributor.college	College of Medicine (의과대학)	-
dc.contributor.department	Dept. of Biomedical Systems Informatics (의생명시스템정보학교실)	-
dc.contributor.googleauthor	Chansik An	-
dc.contributor.googleauthor	Yae Won Park	-
dc.contributor.googleauthor	Sung Soo Ahn	-
dc.contributor.googleauthor	Kyunghwa Han	-
dc.contributor.googleauthor	Hwiyoung Kim	-
dc.contributor.googleauthor	Seung-Koo Lee	-
dc.identifier.doi	10.1371/journal.pone.0256152	-
dc.contributor.localId	A05971	-
dc.contributor.localId	A05330	-
dc.contributor.localId	A02234	-
dc.contributor.localId	A02912	-
dc.relation.journalcode	J02540	-
dc.identifier.eissn	1932-6203	-
dc.identifier.pmid	34383858	-
dc.contributor.alternativeName	Kim, Hwiyoung	-
dc.contributor.affiliatedAuthor	김휘영	-
dc.contributor.affiliatedAuthor	박예원	-
dc.contributor.affiliatedAuthor	안성수	-
dc.contributor.affiliatedAuthor	이승구	-
dc.citation.volume	16	-
dc.citation.number	8	-
dc.citation.startPage	e0256152	-
dc.identifier.bibliographicCitation	PLOS ONE, Vol.16(8) : e0256152, 2021-08	-
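
The repeated-splitting procedure described in the abstract can be sketched roughly as follows. This is only an illustration under several assumptions, not the authors' code: the data are synthetic Gaussian features (only the class sizes echo the "simple" task), a class-mean-difference linear score stands in for the LASSO model, the training AUC is the apparent (resubstitution) AUC rather than one of the validation methods the abstract mentions, the split is stratified by class, and the gap is taken as the absolute AUC difference. All function names and parameters (`make_data`, `fit_mean_difference`, `gap_over_splits`, `test_fraction`, `shift`) are hypothetical.

```python
import random
import statistics


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_data(n_pos, n_neg, n_features, shift, rng):
    """Synthetic stand-in data: Gaussian noise, positives shifted on up to 3 features."""
    X, y = [], []
    for label, count in ((1, n_pos), (0, n_neg)):
        for _ in range(count):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            if label == 1:
                for j in range(min(3, n_features)):
                    row[j] += shift
            X.append(row)
            y.append(label)
    return X, y


def fit_mean_difference(X, y):
    """Stand-in linear model (NOT LASSO): per-feature difference of class means."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    return [
        sum(p[j] for p in pos) / len(pos) - sum(n[j] for n in neg) / len(neg)
        for j in range(len(X[0]))
    ]


def score(weights, X):
    """Linear score for each row of X."""
    return [sum(w * v for w, v in zip(weights, x)) for x in X]


def gap_over_splits(X, y, n_splits=200, test_fraction=0.3, seed=0):
    """Absolute training-vs-test AUC difference for each random stratified split."""
    rng = random.Random(seed)
    pos_idx = [i for i, label in enumerate(y) if label == 1]
    neg_idx = [i for i, label in enumerate(y) if label == 0]
    n_pos_test = max(1, round(len(pos_idx) * test_fraction))
    n_neg_test = max(1, round(len(neg_idx) * test_fraction))
    gaps = []
    for _ in range(n_splits):
        # Shuffle each class separately so both sets contain both classes.
        rng.shuffle(pos_idx)
        rng.shuffle(neg_idx)
        test = pos_idx[:n_pos_test] + neg_idx[:n_neg_test]
        train = pos_idx[n_pos_test:] + neg_idx[n_neg_test:]
        X_train, y_train = [X[i] for i in train], [y[i] for i in train]
        X_test, y_test = [X[i] for i in test], [y[i] for i in test]
        weights = fit_mean_difference(X_train, y_train)
        train_auc = auc(score(weights, X_train), y_train)
        test_auc = auc(score(weights, X_test), y_test)
        gaps.append(abs(train_auc - test_auc))
    return gaps


if __name__ == "__main__":
    # Class sizes mirror the "simple" task; the features themselves are synthetic.
    X, y = make_data(109, 58, n_features=20, shift=0.5, rng=random.Random(1))
    gaps = gap_over_splits(X, y, n_splits=200, seed=2)
    print("mean gap:", statistics.mean(gaps), "SD:", statistics.stdev(gaps))
```

Running the same function on a random 50% subsample of `X` and `y` would mimic the undersampling condition; the spread of `gaps` across splits, rather than any single value, is the quantity of interest.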

Appears in Collections: 1. College of Medicine (의과대학) > Dept. of Neurosurgery (신경외과학교실) > 1. Journal Papers
1. College of Medicine (의과대학) > Dept. of Radiology (영상의학교실) > 1. Journal Papers
