Local Differential Privacy in the Medical Domain to Protect Sensitive Information: Algorithm Development and Real-World Validation

DC Field | Value | Language
dc.contributor.author | 박유랑 | -
dc.contributor.author | 성민동 | -
dc.contributor.author | 차동철 | -
dc.date.accessioned | 2022-05-09T16:51:20Z | -
dc.date.available | 2022-05-09T16:51:20Z | -
dc.date.issued | 2021-11 | -
dc.identifier.uri | https://ir.ymlib.yonsei.ac.kr/handle/22282913/188245 | -
dc.description.abstract | Background: Privacy is of increasing interest in the present big data era, particularly the privacy of medical data. Specifically, differential privacy has emerged as the standard method for preservation of privacy during data analysis and publishing. Objective: Using machine learning techniques, we applied differential privacy to medical data with diverse parameters and checked the feasibility of our algorithms with synthetic data as well as the balance between data privacy and utility. Methods: All data were normalized to a range between -1 and 1, and the bounded Laplacian method was applied to prevent the generation of out-of-bound values after applying the differential privacy algorithm. To preserve the cardinality of the categorical variables, we performed postprocessing via discretization. The algorithm was evaluated using both synthetic and real-world data (from the eICU Collaborative Research Database). We evaluated the difference between the original data and the perturbed data using misclassification rates for categorical data and the mean squared error for continuous data. Further, we compared the performance of classification models that predict in-hospital mortality using real-world data. Results: The misclassification rate of categorical variables ranged between 0.49 and 0.85 when the value of ε was 0.1, and it converged to 0 as ε increased. When ε was between 10^2 and 10^3, the misclassification rate rapidly dropped to 0. Similarly, the mean squared error of the continuous variables decreased as ε increased. The performance of the model developed from perturbed data converged to that of the model developed from original data as ε increased. In particular, the accuracy of a random forest model developed from the original data was 0.801, whereas the accuracy of the model developed from perturbed data ranged from 0.757 at ε = 10^-1 to 0.81 at ε = 10^4. Conclusions: We applied local differential privacy to medical domain data, which are diverse and high dimensional. Higher noise may offer enhanced privacy, but it simultaneously hinders utility. We should choose an appropriate degree of noise for data perturbation to balance privacy and utility depending on specific situations. | -
dc.description.statementOfResponsibility | open | -
dc.language | English | -
dc.publisher | JMIR Publications | -
dc.relation.isPartOf | JMIR MEDICAL INFORMATICS | -
dc.rights | CC BY-NC-ND 2.0 KR | -
dc.title | Local Differential Privacy in the Medical Domain to Protect Sensitive Information: Algorithm Development and Real-World Validation | -
dc.type | Article | -
dc.contributor.college | College of Medicine (의과대학) | -
dc.contributor.department | Dept. of Biomedical Systems Informatics (의생명시스템정보학교실) | -
dc.contributor.googleauthor | MinDong Sung | -
dc.contributor.googleauthor | Dongchul Cha | -
dc.contributor.googleauthor | Yu Rang Park | -
dc.identifier.doi | 10.2196/26914 | -
dc.contributor.localId | A05624 | -
dc.contributor.localId | A05923 | -
dc.contributor.localId | A05871 | -
dc.relation.journalcode | J03664 | -
dc.identifier.eissn | 2291-9694 | -
dc.identifier.pmid | 34747711 | -
dc.subject.keyword | algorithm | -
dc.subject.keyword | big data | -
dc.subject.keyword | development | -
dc.subject.keyword | differential privacy | -
dc.subject.keyword | electronic health record | -
dc.subject.keyword | feasibility | -
dc.subject.keyword | machine learning | -
dc.subject.keyword | medical data | -
dc.subject.keyword | medical informatics | -
dc.subject.keyword | privacy | -
dc.subject.keyword | privacy-preserving | -
dc.subject.keyword | synthetic data | -
dc.subject.keyword | validation | -
dc.contributor.alternativeName | Park, Yu Rang | -
dc.contributor.affiliatedAuthor | 박유랑 | -
dc.contributor.affiliatedAuthor | 성민동 | -
dc.contributor.affiliatedAuthor | 차동철 | -
dc.citation.volume | 9 | -
dc.citation.number | 11 | -
dc.citation.startPage | e26914 | -
dc.identifier.bibliographicCitation | JMIR MEDICAL INFORMATICS, Vol.9(11) : e26914, 2021-11 | -
Appears in Collections:
1. College of Medicine (의과대학) > Dept. of Biomedical Systems Informatics (의생명시스템정보학교실) > 1. Journal Papers
1. College of Medicine (의과대학) > Dept. of Internal Medicine (내과학교실) > 1. Journal Papers
1. College of Medicine (의과대학) > Dept. of Otorhinolaryngology (이비인후과학교실) > 1. Journal Papers
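
The Methods portion of the abstract names three steps: normalization to the range -1 to 1, a bounded Laplacian perturbation that avoids out-of-bound values, and discretization to preserve the cardinality of categorical variables. The following Python sketch illustrates one possible reading of those steps and is not the authors' implementation. The function names, the rejection-sampling approach to bounding, and the choice of noise scale (range width divided by ε) are all assumptions made for illustration. The abstract does not specify how the noise is calibrated, and a bounded Laplace mechanism with a formal privacy guarantee may require a different scale.

```python
import random


def normalize(x, lo, hi):
    """Linearly map x from [lo, hi] onto [-1, 1]."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def bounded_laplace(value, epsilon, rng, lower=-1.0, upper=1.0):
    """Add Laplace noise to value, redrawing until the result is in bounds.

    Assumption: the sensitivity is taken to be the width of the range, so the
    noise scale is (upper - lower) / epsilon. The source does not state this.
    """
    scale = (upper - lower) / epsilon
    while True:
        noisy = value + laplace_noise(scale, rng)
        if lower <= noisy <= upper:
            return noisy


def discretize(noisy, levels):
    """Snap a noisy value to the nearest allowed (normalized) category level."""
    return min(levels, key=lambda level: abs(level - noisy))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical continuous variable: an age of 65 on a 0-100 scale.
    age = normalize(65, 0, 100)
    # Hypothetical three-level categorical variable, encoded as -1, 0, 1.
    levels = [-1.0, 0.0, 1.0]
    for eps in (0.1, 10.0, 1000.0):
        noisy_age = bounded_laplace(age, eps, rng)
        noisy_cat = discretize(bounded_laplace(0.0, eps, rng), levels)
        print(eps, round(noisy_age, 3), noisy_cat)
```

Under these assumptions a small ε gives a wide noise distribution, so perturbed values are spread across the whole range, while a large ε leaves values close to the originals. This matches the direction of the trade-off reported in the Results.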
