A standardized analytics pipeline for reliable and rapid development and validation of prediction models using observational health data

DC Field | Value | Language
dc.contributor.author | 유승찬 | -
dc.date.accessioned | 2022-11-24T00:59:40Z | -
dc.date.available | 2022-11-24T00:59:40Z | -
dc.date.issued | 2021-11 | -
dc.identifier.issn | 0169-2607 | -
dc.identifier.uri | https://ir.ymlib.yonsei.ac.kr/handle/22282913/191117 | -
dc.description.abstract | Background and objective: As a response to the ongoing COVID-19 pandemic, several prediction models in the existing literature were rapidly developed, with the aim of providing evidence-based guidance. However, none of these COVID-19 prediction models have been found to be reliable. Models are commonly assessed to have a risk of bias, often due to insufficient reporting, use of non-representative data, and lack of large-scale external validation. In this paper, we present the Observational Health Data Sciences and Informatics (OHDSI) analytics pipeline for patient-level prediction modeling as a standardized approach for rapid yet reliable development and validation of prediction models. We demonstrate how our analytics pipeline and open-source software tools can be used to answer important prediction questions while limiting potential causes of bias (e.g., by validating phenotypes, specifying the target population, performing large-scale external validation, and publicly providing all analytical source code). Methods: We show step-by-step how to implement the analytics pipeline for the question: 'In patients hospitalized with COVID-19, what is the risk of death 0 to 30 days after hospitalization?'. We develop models using six different machine learning methods in a USA claims database containing over 20,000 COVID-19 hospitalizations and externally validate the models using data containing over 45,000 COVID-19 hospitalizations from South Korea, Spain, and the USA. Results: Our open-source software tools enabled us to efficiently go end-to-end from problem design to reliable model development and evaluation. When predicting death in patients hospitalized with COVID-19, AdaBoost, random forest, gradient boosting machine, and decision tree yielded similar or lower internal and external validation discrimination performance compared to L1-regularized logistic regression, whereas the MLP neural network consistently resulted in lower discrimination. L1-regularized logistic regression models were well calibrated. Conclusion: Our results show that following the OHDSI analytics pipeline for patient-level prediction modeling can enable the rapid development towards reliable prediction models. The OHDSI software tools and pipeline are open source and available to researchers from all around the world. | -
dc.description.statementOfResponsibility | open | -
dc.language | English | -
dc.publisher | Elsevier Scientific Publishers | -
dc.relation.isPartOf | COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE | -
dc.rights | CC BY-NC-ND 2.0 KR | -
dc.subject.MESH | COVID-19* | -
dc.subject.MESH | Humans | -
dc.subject.MESH | Logistic Models | -
dc.subject.MESH | Machine Learning | -
dc.subject.MESH | Pandemics* | -
dc.subject.MESH | SARS-CoV-2 | -
dc.title | A standardized analytics pipeline for reliable and rapid development and validation of prediction models using observational health data | -
dc.type | Article | -
dc.contributor.college | College of Medicine (의과대학) | -
dc.contributor.department | Dept. of Biomedical Systems Informatics (의생명시스템정보학교실) | -
dc.contributor.googleauthor | Sara Khalid | -
dc.contributor.googleauthor | Cynthia Yang | -
dc.contributor.googleauthor | Clair Blacketer | -
dc.contributor.googleauthor | Talita Duarte-Salles | -
dc.contributor.googleauthor | Sergio Fernández-Bertolín | -
dc.contributor.googleauthor | Chungsoo Kim | -
dc.contributor.googleauthor | Rae Woong Park | -
dc.contributor.googleauthor | Jimyung Park | -
dc.contributor.googleauthor | Martijn J Schuemie | -
dc.contributor.googleauthor | Anthony G Sena | -
dc.contributor.googleauthor | Marc A Suchard | -
dc.contributor.googleauthor | Seng Chan You | -
dc.contributor.googleauthor | Peter R Rijnbeek | -
dc.contributor.googleauthor | Jenna M Reps | -
dc.identifier.doi | 10.1016/j.cmpb.2021.106394 | -
dc.contributor.localId | A02478 | -
dc.relation.journalcode | J00637 | -
dc.identifier.eissn | 1872-7565 | -
dc.identifier.pmid | 34560604 | -
dc.subject.keyword | COVID-19 | -
dc.subject.keyword | Data harmonization | -
dc.subject.keyword | Data quality control | -
dc.subject.keyword | Distributed data network | -
dc.subject.keyword | Machine learning | -
dc.subject.keyword | Risk prediction | -
dc.contributor.alternativeName | You, Seng Chan | -
dc.contributor.affiliatedAuthor | 유승찬 | -
dc.citation.volume | 211 | -
dc.citation.startPage | 106394 | -
dc.identifier.bibliographicCitation | COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, Vol.211 : 106394, 2021-11 | -
Appears in Collections:
1. College of Medicine (의과대학) > Dept. of Biomedical Systems Informatics (의생명시스템정보학교실) > 1. Journal Papers
