221 440

Cited 0 times in

Evaluation of the Privacy Risks of Personal Health Identifiers and Quasi-Identifiers in a Distributed Research Network: Development and Validation Study

 SeHee Oh  ;  MinDong Sung  ;  Yumie Rhee  ;  Namki Hong  ;  Yu Rang Park 
 JMIR MEDICAL INFORMATICS, Vol.9(5) : e24940, 2021-05 
Journal Title
Issue Date
Observational Medical Outcomes Partnership common data model ; distributed research network ; personal health identifier ; privacy risk quantification ; quasi-identifier
Background: Privacy should be protected in medical data that include patient information. A distributed research network (DRN) is one of the challenges in privacy protection and in the encouragement of multi-institutional clinical research. A DRN standardizes multi-institutional data into a common structure and terminology called a common data model (CDM), and it only shares analysis results. It is necessary to measure how a DRN protects patient information privacy even without sharing data in practice.

Objective: This study aimed to quantify the privacy risk of a DRN by comparing different deidentification levels focusing on personal health identifiers (PHIs) and quasi-identifiers (QIs).

Methods: We detected PHIs and QIs in an Observational Medical Outcomes Partnership (OMOP) CDM as threatening privacy, based on 18 Health Insurance Portability and Accountability Act of 1996 (HIPPA) identifiers and previous studies. To compare the privacy risk according to the different privacy policies, we generated limited and safe harbor data sets based on 16 PHIs and 12 QIs as threatening privacy from the Synthetic Public Use File 5 Percent (SynPUF5PCT) data set, which is a public data set of the OMOP CDM. With minimum cell size and equivalence class methods, we measured the privacy risk reduction with a trust differential gap obtained by comparing the two data sets. We also measured the gap in randomly sampled records from the two data sets to adjust the number of PHI or QI records.

Results: The gaps averaged 31.448% and 73.798% for PHIs and QIs, respectively, with a minimum cell size of one, which represents a unique record in a data set. Among PHIs, the national provider identifier had the highest gap of 71.236% (71.244% and 0.007% in the limited and safe harbor data sets, respectively). The maximum size of the equivalence class, which has the largest size of an indistinguishable set of records, averaged 771. In 1000 random samples of PHIs, Device_exposure_start_date had the highest gap of 33.730% (87.705% and 53.975% in the data sets). Among QIs, Death had the highest gap of 99.212% (99.997% and 0.784% in the data sets). In 1000, 10,000, and 100,000 random samples of QIs, Device_treatment had the highest gaps of 12.980% (99.980% and 87.000% in the data sets), 60.118% (99.831% and 39.713%), and 93.597% (98.805% and 5.207%), respectively, and in 1 million random samples, Death had the highest gap of 99.063% (99.998% and 0.934% in the data sets).

Conclusions: In this study, we verified and quantified the privacy risk of PHIs and QIs in the DRN. Although this study used limited PHIs and QIs for verification, the privacy limitations found in this study could be used as a quality measurement index for deidentification of multi-institutional collaboration research, thereby increasing DRN safety.
Files in This Item:
T202103021.pdf Download
Appears in Collections:
1. College of Medicine (의과대학) > Dept. of Biomedical Systems Informatics (의생명시스템정보학교실) > 1. Journal Papers
1. College of Medicine (의과대학) > Dept. of Internal Medicine (내과학교실) > 1. Journal Papers
Yonsei Authors
Park, Yu Rang(박유랑) ORCID logo https://orcid.org/0000-0002-4210-2094
Sung, MinDong(성민동) ORCID logo https://orcid.org/0000-0002-5217-8877
Rhee, Yumie(이유미) ORCID logo https://orcid.org/0000-0003-4227-5638
Hong, Nam Ki(홍남기) ORCID logo https://orcid.org/0000-0002-8246-1956
사서에게 알리기


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.