Modified k-NN approaches for multi class classification

Authors
 이현영 
Issue Date
2015
Description
College of Medicine / Doctoral degree
Abstract
Multi-class classification suffers from several problems that reduce performance and are difficult to isolate, and many researchers have tried to address them. In terms of informatization speed, high dimensionality and applicability to data, a non-parametric approach is more suitable for multi-class classification. In this study, the k-nearest neighbor (k-NN) learning algorithm was used, and we sought to further improve its performance in settings with a high tie probability, small data size and unequal class distribution. Furthermore, we attempted to clarify disease susceptibility with multi-labeling. We therefore propose two modifications: a weighted similarity, which accounts for each predictor's strength (PS) using mutual information between the true class and the predictors, and a distance-weighted voting system, which accounts for the individual distance (ID) of each of the k nearest neighbors; together they allow for a distance ratio. Regarding disease susceptibility, we introduce a pending region for multiple labeling. Gower's distance was applied to k-NN. The proposed methods were compared with support vector machine (SVM) and linear discriminant analysis (LDA).
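The abstract does not give the exact formulas for PS or ID, so the following is only a minimal sketch of the general idea under stated assumptions: predictor weights are taken as the mutual information between a discrete predictor and the class, distances are a weighted Gower distance, and each neighbor votes with the inverse of its distance. The function names and the inverse-distance rule are illustrative choices, not the dissertation's definitions.

```python
import math
from collections import Counter, defaultdict


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def gower_distance(a, b, ranges, weights=None):
    """Weighted Gower distance between two mixed-type records.

    ranges[i] is the observed range of numeric feature i, or None if
    feature i is categorical. weights[i] is an assumed per-predictor
    weight (for example, a mutual-information score).
    """
    if weights is None:
        weights = [1.0] * len(a)
    total = 0.0
    for x, y, r, w in zip(a, b, ranges, weights):
        if r is None:
            d = 0.0 if x == y else 1.0        # categorical: simple mismatch
        else:
            d = abs(x - y) / r if r > 0 else 0.0  # numeric: range-scaled
        total += w * d
    return total / sum(weights)


def weighted_knn_predict(train_X, train_y, query, k, ranges,
                         weights=None, eps=1e-9):
    """Predict a class by inverse-distance voting among k nearest records."""
    dists = sorted(
        ((gower_distance(x, query, ranges, weights), label)
         for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    votes = defaultdict(float)
    for d, label in dists[:k]:
        votes[label] += 1.0 / (d + eps)  # closer neighbors count more
    return max(votes, key=votes.get)
```

With k = 2 and one neighbor from each class, plain majority voting would tie, whereas the distance-weighted vote favors the class of the closer neighbor. This is the kind of tie reduction the abstract attributes to ID.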

Sixty-four simulation sets were constructed to cover several conditions, including sample size, combinations of coefficients, correlation strengths, inequality of class distribution and number of predictors. The CREDOS (Clinical Research Center for Dementia of South Korea) study data set was used to evaluate a pending region for clarifying disease susceptibility. The proposed methods, i.e., PS and ID, improved k-NN performance and obtained better results than SVM and LDA. Furthermore, ID markedly reduced the probability of tie instances, reducing the gap between accuracy and recall. In the CREDOS study data set, k-NN with PS+ID outperformed SVM and LDA. With pending regions of 0.0%, 2.5%, 5.0%, 7.5% and 10.0%, recall showed marked elevation, which did not exceed 0.40. As a result, we obtained five labeling sets, namely AD, MCI+AD, MCI, SMI+(MCI or AD) and SMI, that reflected disease susceptibility to AD. The disease susceptibility showed a significant association with true disease and with other clinical assessments that were not included in the classification model.
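How the pending region is defined is not stated in the abstract. One plausible reading, sketched here purely as an assumption, is that a case receives two labels when the normalized vote shares of its two leading classes differ by no more than the pending-region width. The function name, the `scores` dictionary and the margin rule are all hypothetical.

```python
def label_with_pending_region(scores, margin):
    """Return one label, or two when the top classes are too close to call.

    scores: dict mapping class label -> normalized vote share (assumed input).
    margin: pending-region width as a fraction, e.g. 0.05 for 5.0%.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (top_label, top_score), (second_label, second_score) = ranked[0], ranked[1]
    if top_score - second_score <= margin:
        # Inside the pending region: keep both candidate labels.
        return {top_label, second_label}
    return {top_label}
```

Under this reading, a margin of 0.0 gives a single label except on exact ties, and widening the margin moves borderline cases into combined labels such as MCI+AD.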

The modified k-NN, i.e., weighted similarity and the distance-weighted voting system, can improve multi-class classification ability, and it showed results comparable to those of LDA and SVM. Introducing pending regions may help in detecting disease susceptibility and may offer clues about disease progression.
Appears in Collections:
1. College of Medicine (의과대학) > Others (기타) > 3. Dissertation
URI
https://ir.ymlib.yonsei.ac.kr/handle/22282913/148736
