Optimal selection of resampling methods for imbalanced data with high complexity

Authors
 Annie Kim  ;  Inkyung Jung 
Citation
 PLOS ONE, Vol.18(7) : e0288540, 2023-07 
Journal Title
PLOS ONE
Issue Date
2023-07
MeSH
Algorithms* ; Computer Simulation ; Sample Size
Abstract
Class imbalance is a major problem in classification, wherein the decision boundary is easily biased toward the majority class. Data-level resampling is one possible solution to this problem. However, several studies have shown that resampling methods can deteriorate classification performance. This is because of the overgeneralization problem, which occurs when samples produced by an oversampling technique, which should lie in the minority-class domain, are instead introduced into the majority-class domain. This study shows that the overgeneralization problem is aggravated in complex data settings and introduces two alternative approaches to mitigate it. The first approach incorporates a filtering method into oversampling. The second approach applies undersampling. The main objective of this study is to provide guidance on selecting optimal resampling methods for imbalanced and complex datasets to improve classification performance. Simulation studies and real data analyses were performed to compare the resampling results in various scenarios with different complexities, imbalances, and sample sizes. For noncomplex datasets, undersampling was found to be optimal. For complex datasets, however, applying a filtering method to delete misallocated examples was optimal. In conclusion, this study can aid researchers in selecting the optimal method for resampling complex datasets.
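
The two mitigation approaches named in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the specific algorithms, so everything here is an assumption made for illustration: the function names are hypothetical, the oversampler is a simplified SMOTE-style interpolation, and the filter is a plain nearest-neighbor rule that deletes synthetic points lying closer to the majority class. It is not the authors' implementation.

```python
import random
from math import dist

def random_undersample(majority, minority, rng):
    """Drop majority points at random until both classes have equal size."""
    kept = rng.sample(majority, k=min(len(majority), len(minority)))
    return kept, list(minority)

def interpolate_oversample(minority, n_new, rng):
    """SMOTE-style sketch: place each synthetic point on the segment between
    a random minority point and its nearest minority neighbor."""
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        b = min((p for p in minority if p is not a), key=lambda p: dist(a, p))
        t = rng.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic

def filter_misallocated(synthetic, majority, minority):
    """Assumed filter: keep a synthetic point only if its nearest original
    point belongs to the minority class."""
    kept = []
    for s in synthetic:
        d_min = min(dist(s, p) for p in minority)
        d_maj = min(dist(s, p) for p in majority)
        if d_min <= d_maj:
            kept.append(s)
    return kept

# Toy imbalanced data (90 vs. 10) with overlapping classes.
rng = random.Random(0)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(90)]
minority = [(rng.gauss(1.5, 1), rng.gauss(1.5, 1)) for _ in range(10)]

# Approach 2 in the abstract: undersampling.
maj_u, min_u = random_undersample(majority, minority, rng)

# Approach 1 in the abstract: oversampling followed by filtering.
synthetic = interpolate_oversample(minority, len(majority) - len(minority), rng)
filtered = filter_misallocated(synthetic, majority, minority)

print(len(maj_u), len(min_u), len(synthetic), len(filtered))
```

In this sketch, the synthetic points removed by the filter are the ones that would otherwise push minority-labeled examples into the majority-class region, which is the overgeneralization effect the abstract describes.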

Copyright: © 2023 Kim, Jung. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Files in This Item:
T202304454.pdf
DOI
10.1371/journal.pone.0288540
Appears in Collections:
1. College of Medicine (의과대학) > Dept. of Biomedical Systems Informatics (의생명시스템정보학교실) > 1. Journal Papers
Yonsei Authors
Jung, Inkyung(정인경) ORCID: https://orcid.org/0000-0003-3780-3213
URI
https://ir.ymlib.yonsei.ac.kr/handle/22282913/196046

