Machine learning prediction of incidence of Alzheimer's disease using large-scale administrative health data

Park, Ji Hwan; Cho, Han Eol; Kim, Jong Hun; Wall, Melanie M.; Stern, Yaakov; Lim, Hyunsun; Yoo, Shinjae; Kim, Hyoung Seop; Cha, Jiook

doi:10.1038/s41746-020-0256-0


DC Field	Value	Language
dc.contributor.author	Park, Ji Hwan	-
dc.contributor.author	Cho, Han Eol	-
dc.contributor.author	Kim, Jong Hun	-
dc.contributor.author	Wall, Melanie M.	-
dc.contributor.author	Stern, Yaakov	-
dc.contributor.author	Lim, Hyunsun	-
dc.contributor.author	Yoo, Shinjae	-
dc.contributor.author	Kim, Hyoung Seop	-
dc.contributor.author	Cha, Jiook	-
dc.date.accessioned	2020-12-01T17:03:10Z	-
dc.date.available	2020-12-01T17:03:10Z	-
dc.date.created	2021-03-19	-
dc.date.issued	2020-03	-
dc.identifier.issn	2398-6352	-
dc.identifier.uri	https://ir.ymlib.yonsei.ac.kr/handle/22282913/180100	-
dc.description.abstract	Nationwide population-based cohorts provide a new opportunity to build automated risk prediction models based on individuals' history of health and healthcare, beyond existing risk prediction models. We tested whether machine learning models can predict future incidence of Alzheimer's disease (AD) using large-scale administrative health data. From the Korean National Health Insurance Service database between 2002 and 2010, we obtained de-identified health data on elders above 65 years (N = 40,736) containing 4,894 unique clinical features, including ICD-10 codes, medication codes, laboratory values, history of personal and family illness, and socio-demographics. To define incident AD we considered two operational definitions: "definite AD" with diagnostic codes and dementia medication (n = 614) and "probable AD" with only diagnosis (n = 2026). We trained and validated random forest, support vector machine, and logistic regression models to predict incident AD in 1, 2, 3, and 4 subsequent years. For predicting future incidence of AD in balanced samples (bootstrapping), the machine learning models showed reasonable performance in 1-year prediction, with AUC of 0.775 and 0.759 based on "definite AD" and "probable AD" outcomes, respectively; in 2-year, 0.730 and 0.693; in 3-year, 0.677 and 0.644; in 4-year, 0.725 and 0.683. The results were similar when the entire (unbalanced) samples were used. Important clinical features selected in logistic regression included hemoglobin level, age, and urine protein level. This study may shed light on the utility of data-driven machine learning models based on large-scale administrative health data in AD risk prediction, which may enable better selection of individuals at risk for AD in clinical trials or early detection in clinical settings.	-
dc.description.statementOfResponsibility	open	-
dc.language	English	-
dc.publisher	Nature Publishing Group	-
dc.relation.isPartOf	NPJ DIGITAL MEDICINE(Nature partner journals digital medicine Digital medicine)	-
dc.rights	CC BY-NC-ND 2.0 KR	-
dc.title	Machine learning prediction of incidence of Alzheimer's disease using large-scale administrative health data	-
dc.type	Article	-
dc.contributor.college	College of Medicine (의과대학)	-
dc.contributor.department	Dept. of Rehabilitation Medicine (재활의학교실)	-
dc.contributor.googleauthor	Park, Ji Hwan	-
dc.contributor.googleauthor	Cho, Han Eol	-
dc.contributor.googleauthor	Kim, Jong Hun	-
dc.contributor.googleauthor	Wall, Melanie M.	-
dc.contributor.googleauthor	Stern, Yaakov	-
dc.contributor.googleauthor	Lim, Hyunsun	-
dc.contributor.googleauthor	Yoo, Shinjae	-
dc.contributor.googleauthor	Kim, Hyoung Seop	-
dc.contributor.googleauthor	Cha, Jiook	-
dc.identifier.doi	10.1038/s41746-020-0256-0	-
dc.relation.journalcode	J03796	-
dc.identifier.eissn	2398-6352	-
dc.subject.keyword	Alzheimer's disease	-
dc.subject.keyword	Predictive markers	-
dc.contributor.alternativeName	Cho, Han Eol	-
dc.contributor.affiliatedAuthor	Cho, Han Eol	-
dc.identifier.scopusid	2-s2.0-85088412971	-
dc.identifier.wosid	000522364200001	-
dc.citation.volume	3	-
dc.citation.number	1	-
dc.identifier.bibliographicCitation	NPJ DIGITAL MEDICINE(Nature partner journals digital medicine Digital medicine), Vol.3(1), 2020-03	-
dc.identifier.rimsid	70147	-
dc.type.rims	ART	-
dc.description.journalClass	1	-
dc.description.journalClass	1	-
dc.subject.keywordAuthor	Alzheimer's disease	-
dc.subject.keywordAuthor	Predictive markers	-
dc.subject.keywordPlus	DEMENTIA RISK	-
dc.subject.keywordPlus	COGNITIVE DEFICITS	-
dc.subject.keywordPlus	OLDER PERSONS	-
dc.subject.keywordPlus	POPULATION	-
dc.subject.keywordPlus	DYSFUNCTION	-
dc.subject.keywordPlus	MODELS	-
dc.subject.keywordPlus	ANEMIA	-
dc.subject.keywordPlus	SAMPLE	-
dc.subject.keywordPlus	COHORT	-
dc.type.docType	Article	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalWebOfScienceCategory	Health Care Sciences & Services	-
dc.relation.journalWebOfScienceCategory	Medical Informatics	-
dc.relation.journalResearchArea	Health Care Sciences & Services	-
dc.relation.journalResearchArea	Medical Informatics	-
dc.identifier.articleno	46	-
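
The abstract reports AUC on balanced samples. As a rough illustration of those two ideas only, the sketch below balances a labelled sample and computes AUC as the probability that a randomly chosen positive case outscores a randomly chosen negative one. It is not the authors' pipeline: the function names, the choice of undersampling the majority class, and the toy data are all assumptions made for this example, and the record itself says only "balanced samples (bootstrapping)".

```python
import random


def balanced_sample(labels, rng):
    """Return indices with equal numbers of positives (1) and negatives (0).

    Hypothetical helper: undersamples the larger class without replacement.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    return rng.sample(pos, n) + rng.sample(neg, n)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented for illustration): 1 = incident case, 0 = no case.
toy_labels = [1, 0, 0, 0, 1, 0]
toy_scores = [0.8, 0.1, 0.4, 0.3, 0.35, 0.2]

idx = balanced_sample(toy_labels, random.Random(0))
balanced_labels = [toy_labels[i] for i in idx]
balanced_scores = [toy_scores[i] for i in idx]
print(len(idx), auc(balanced_labels, balanced_scores))
```

The pairwise form is quadratic in sample size, so for a cohort of the size described in the abstract a rank-based or library implementation would be the practical choice.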

Appears in Collections: 1. College of Medicine (의과대학) > Dept. of Rehabilitation Medicine (재활의학교실) > 1. Journal Papers
