Machine learning model for diagnostic method prediction in parasitic disease using clinical information
DC Field | Value | Language |
---|---|---|
dc.date.accessioned | 2023-02-10T00:49:07Z | - |
dc.date.available | 2023-02-10T00:49:07Z | - |
dc.date.issued | 2021-12 | - |
dc.identifier.issn | 0957-4174 | - |
dc.identifier.uri | https://ir.ymlib.yonsei.ac.kr/handle/22282913/192392 | - |
dc.description.abstract | Diagnosing a parasitic disease is difficult in clinical practice. In this study, we constructed a machine learning model for diagnosis prediction using patient information. First, we predicted whether a patient has a parasitic disease. Next, for patients with a parasitic disease, we predicted the appropriate diagnostic method among six types (biopsy, endoscopy, microscopy, molecular, radiology, and serology). To build the datasets, we extracted patient information from PubMed abstracts published from 1956 to 2019. We used two datasets: one for predicting parasite-infected patients (N = 8748) and one for predicting the diagnosis method (N = 3780). We compared four machine learning models: support vector machine, random forest, multi-layered perceptron, and gradient boosting. To address data imbalance, the synthetic minority over-sampling technique (SMOTE) and TomekLinks were used. On the parasite-infected patient dataset, random forest, random forest with SMOTE, gradient boosting, gradient boosting with SMOTE, and gradient boosting with TomekLinks performed best (AUC: 79%). On the diagnosis method dataset, gradient boosting with SMOTE was the best model (AUC: 87%). For per-class prediction, gradient boosting performed best for biopsy (AUC: 88%). Gradient boosting with SMOTE performed best for endoscopy (AUC: 94%), molecular (AUC: 90%), and radiology (AUC: 88%). Random forest performed best for microscopy (AUC: 82%) and serology (AUC: 85%). We calculated feature importance using gradient boosting; age had the highest importance. In conclusion, this study demonstrated that gradient boosting with SMOTE can predict a parasitic disease and serve as a promising diagnostic tool in both binary and multi-class classification schemes. | - |
dc.description.statementOfResponsibility | restriction | - |
dc.language | English | - |
dc.publisher | Pergamon | - |
dc.relation.isPartOf | EXPERT SYSTEMS WITH APPLICATIONS | - |
dc.rights | CC BY-NC-ND 2.0 KR | - |
dc.title | Machine learning model for diagnostic method prediction in parasitic disease using clinical information | - |
dc.type | Article | - |
dc.contributor.college | College of Medicine (의과대학) | - |
dc.contributor.department | Dept. of Pharmacology (약리학교실) | - |
dc.contributor.googleauthor | You Won Lee | - |
dc.contributor.googleauthor | Jae Woo Choi | - |
dc.contributor.googleauthor | Eun-Hee Shin | - |
dc.identifier.doi | 10.1016/j.eswa.2021.115658 | - |
dc.relation.journalcode | J00885 | - |
dc.identifier.url | https://www.sciencedirect.com/science/article/pii/S0957417421010496 | - |
dc.subject.keyword | Machine learning | - |
dc.subject.keyword | Parasite | - |
dc.subject.keyword | Diagnosis | - |
dc.subject.keyword | Multi-classification | - |
dc.subject.keyword | Binary-classification | - |
dc.citation.volume | 185 | - |
dc.citation.startPage | 115658 | - |
dc.identifier.bibliographicCitation | EXPERT SYSTEMS WITH APPLICATIONS, Vol.185 : 115658, 2021-12 | - |
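
The abstract names two techniques that are easier to picture in code: synthetic minority over-sampling and AUC as the evaluation metric. The block below is a minimal standard-library sketch of both, not the authors' implementation; this record does not state which software was used. The function names `smote` and `auc`, the parameter `k`, and the toy data are hypothetical and chosen only for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points from minority-class samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    (Illustrative sketch; names and defaults are assumptions.)
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both classes")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
toy_minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
toy_synthetic = smote(toy_minority, n_new=4, k=2)
print(len(toy_synthetic))  # 4
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

Because every synthetic point is an interpolation between two existing minority samples, the over-sampled class stays within the region those samples already cover. The pairwise AUC is quadratic in the number of samples, which is acceptable for a sketch but not for datasets of the size reported in the abstract.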