Implementing Vertical Federated Learning Using Autoencoders: Practical Application, Generalizability, and Utility Study

Dongchul Cha; MinDong Sung; Yu-Rang Park

doi:10.2196/26598

YUHSpace

DC Field	Value	Language
dc.contributor.author	박유랑	-
dc.contributor.author	성민동	-
dc.contributor.author	차동철	-
dc.date.accessioned	2021-09-29T01:35:38Z	-
dc.date.available	2021-09-29T01:35:38Z	-
dc.date.issued	2021-06	-
dc.identifier.uri	https://ir.ymlib.yonsei.ac.kr/handle/22282913/184445	-
dc.description.abstract	Background: Machine learning (ML) is now widely deployed in our everyday lives. Building robust ML models requires a massive amount of training data. Traditional ML algorithms require the training data to be centralized, which raises privacy and data governance issues. Federated learning (FL) is an approach to overcome this issue. We focused on applying FL to vertically partitioned data, in which an individual's record is scattered among different sites. Objective: The aim of this study was to perform FL on vertically partitioned data to achieve performance comparable to that of centralized models without exposing the raw data. Methods: We used three datasets (Adult income, Schwannoma, and eICU) and vertically divided each dataset into different pieces. Following the vertical division, an overcomplete autoencoder-based model was trained at each site. After training, each site's data were transformed into latent data, which were aggregated for training. A tabular neural network model with categorical embedding was used for this training. A centralized model served as the baseline and was compared with the FL model in terms of accuracy and area under the receiver operating characteristic curve (AUROC). Results: The autoencoder-based network successfully transformed the original data into latent representations with no domain knowledge applied. These transformed data differed from the original data in feature space and data distribution, indicating appropriate data security. The loss of performance was minimal when using an overcomplete autoencoder: accuracy loss was 1.2%, 8.89%, and 1.23%, and AUROC loss was 1.1%, 0%, and 1.12% in the Adult income, Schwannoma, and eICU datasets, respectively. Conclusions: We proposed an autoencoder-based ML model for vertically incomplete data. Because our model is based on unsupervised learning, no domain-specific knowledge is required at individual sites. Where direct data sharing is not possible, our approach may be a practical solution that enables both data protection and the building of a robust model.	-
dc.description.statementOfResponsibility	open	-
dc.format	application/pdf	-
dc.language	English	-
dc.publisher	JMIR Publications	-
dc.relation.isPartOf	JMIR MEDICAL INFORMATICS	-
dc.rights	CC BY-NC-ND 2.0 KR	-
dc.title	Implementing Vertical Federated Learning Using Autoencoders: Practical Application, Generalizability, and Utility Study	-
dc.type	Article	-
dc.contributor.college	College of Medicine (의과대학)	-
dc.contributor.department	Dept. of Biomedical Systems Informatics (의생명시스템정보학교실)	-
dc.contributor.googleauthor	Dongchul Cha	-
dc.contributor.googleauthor	MinDong Sung	-
dc.contributor.googleauthor	Yu-Rang Park	-
dc.identifier.doi	10.2196/26598	-
dc.contributor.localId	A05624	-
dc.contributor.localId	A05923	-
dc.relation.journalcode	J03664	-
dc.identifier.eissn	2291-9694	-
dc.identifier.pmid	34106083	-
dc.subject.keyword	coding	-
dc.subject.keyword	data	-
dc.subject.keyword	data sharing	-
dc.subject.keyword	dataset	-
dc.subject.keyword	federated learning	-
dc.subject.keyword	machine learning	-
dc.subject.keyword	model	-
dc.subject.keyword	performance	-
dc.subject.keyword	privacy	-
dc.subject.keyword	protection	-
dc.subject.keyword	security	-
dc.subject.keyword	training	-
dc.subject.keyword	unsupervised learning	-
dc.subject.keyword	vertically incomplete data	-
dc.contributor.alternativeName	Park, Yu Rang	-
dc.contributor.affiliatedAuthor	박유랑	-
dc.contributor.affiliatedAuthor	성민동	-
dc.citation.volume	9	-
dc.citation.number	6	-
dc.citation.startPage	e26598	-
dc.identifier.bibliographicCitation	JMIR MEDICAL INFORMATICS, Vol.9(6) : e26598, 2021-06	-
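The pipeline summarized in the abstract can be sketched as follows: a vertical split of the columns, one overcomplete autoencoder per site, and aggregation of the latent outputs. This is a minimal, hypothetical NumPy illustration, not the authors' implementation. The names `vertical_split`, `OvercompleteAutoencoder`, and `encode_sites` are invented for this sketch. So are the following choices: the tanh encoder, the squared-error loss, full-batch gradient descent, and the 2x latent width. The sketch also assumes that rows are already aligned by a shared record ID across sites. It omits the downstream tabular neural network with categorical embedding.

```python
import numpy as np


def vertical_split(X, column_groups):
    """Split a (n_rows, n_features) array column-wise; each group is one hypothetical site."""
    return [X[:, cols] for cols in column_groups]


class OvercompleteAutoencoder:
    """Hypothetical one-hidden-layer autoencoder whose latent width exceeds its input width."""

    def __init__(self, n_in, n_latent, seed=0):
        if n_latent <= n_in:
            raise ValueError("an overcomplete autoencoder needs n_latent > n_in")
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, size=(n_in, n_latent))
        self.b1 = np.zeros(n_latent)
        self.W2 = rng.normal(0.0, 0.1, size=(n_latent, n_in))
        self.b2 = np.zeros(n_in)

    def encode(self, X):
        # tanh encoder; in this sketch its output is the latent data a site would share
        return np.tanh(X @ self.W1 + self.b1)

    def reconstruct(self, X):
        # linear decoder maps latent codes back to the site's input space
        return self.encode(X) @ self.W2 + self.b2

    def mse(self, X):
        return float(np.mean((self.reconstruct(X) - X) ** 2))

    def fit(self, X, epochs=500, lr=0.05):
        # full-batch gradient descent on 0.5 * squared reconstruction error, averaged over rows
        n = X.shape[0]
        for _ in range(epochs):
            H = self.encode(X)
            err = H @ self.W2 + self.b2 - X
            grad_W2 = H.T @ err / n
            grad_b2 = err.mean(axis=0)
            delta = (err @ self.W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
            grad_W1 = X.T @ delta / n
            grad_b1 = delta.mean(axis=0)
            self.W1 -= lr * grad_W1
            self.b1 -= lr * grad_b1
            self.W2 -= lr * grad_W2
            self.b2 -= lr * grad_b2
        return self


def encode_sites(site_blocks, latent_factor=2, epochs=500):
    """Train one autoencoder per site, then concatenate the latent blocks column-wise."""
    latents = []
    for i, block in enumerate(site_blocks):
        n_in = block.shape[1]
        ae = OvercompleteAutoencoder(n_in, latent_factor * n_in, seed=i).fit(block, epochs=epochs)
        latents.append(ae.encode(block))
    # assumes row k of every block refers to the same individual
    return np.hstack(latents)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(200, 6))
    blocks = vertical_split(X, [[0, 1, 2], [3, 4], [5]])
    Z = encode_sites(blocks)
    print(Z.shape)  # (200, 12)
```

Making the latent width larger than the input width is what "overcomplete" means here. The `np.hstack` step stands in for the aggregation described in the abstract. In this sketch, only the encoded blocks leave each site, while the raw columns and the autoencoder weights stay local.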

Appears in Collections:
1. College of Medicine (의과대학) > Dept. of Biomedical Systems Informatics (의생명시스템정보학교실) > 1. Journal Papers
1. College of Medicine (의과대학) > Dept. of Internal Medicine (내과학교실) > 1. Journal Papers
1. College of Medicine (의과대학) > Dept. of Otorhinolaryngology (이비인후과학교실) > 1. Journal Papers
