Random Forests; screening test; prediction model of cataracts; accuracy; sensitivity
Abstract
Cataract is the main cause of blindness and visual impairment; age-related cataract in particular accounts for about half of the 32 million cases of blindness worldwide. As life expectancy rises and the elderly population grows, the number of cataract cases is increasing as well, creating a serious economic and social problem nationwide. However, the incidence of cataract can be reduced dramatically through early diagnosis and prevention. In this study, we developed a prediction model of cataracts for early diagnosis using hospital data on 3,237 subjects who first received a screening test and later visited a medical center for cataract check-ups between 1994 and 2005. To develop the prediction model, we used random forests and compared its predictive performance with that of other common classification models: logistic regression, discriminant analysis, decision tree, naive Bayes, and two popular ensemble models, bagging and arcing. The random forests model achieved an accuracy of 67.16% and a sensitivity of 72.28%, and the main factors included in the model were age, diabetes, WBC, platelet, triglyceride, and BMI, among others. The results showed that the model could correctly predict the presence of cataract in about 70% of cases using screening test results alone, without any information from a direct eye examination by an ophthalmologist. We expect that our model may contribute to the diagnosis of cataract and help prevent it in its early stages.
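The two performance measures reported above, accuracy and sensitivity, can be illustrated with a minimal sketch. It assumes the standard definitions (accuracy is the share of all subjects classified correctly; sensitivity is the share of true cataract cases that the model flags). The function names and the ten labels below are hypothetical and chosen only for illustration; they are not data or code from the study.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FN, FP, TN for binary labels (1 = cataract, 0 = no cataract)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cataract, predicted cataract
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cataract, missed by the model
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # no cataract, flagged anyway
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # no cataract, predicted none
    return tp, fn, fp, tn


def accuracy(tp, fn, fp, tn):
    """Fraction of all subjects that were classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)


def sensitivity(tp, fn):
    """Fraction of true cataract cases that the model detected."""
    return tp / (tp + fn)


# Made-up labels for illustration only; NOT data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

tp, fn, fp, tn = confusion_counts(y_true, y_pred)
print(f"accuracy    = {accuracy(tp, fn, fp, tn):.2%}")
print(f"sensitivity = {sensitivity(tp, fn):.2%}")
```

In a screening setting, sensitivity is often the more relevant of the two, since a missed cataract case (a false negative) means a subject is not referred for an eye examination.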