Data Attribute Selection with Information Gain to Improve Credit Approval Classification Performance using K-Nearest Neighbor Algorithm
Abstract
Credit is a common form of modern economic behavior. In practice, credit means either borrowing a sum of money or purchasing goods and paying for them in installments within an agreed timeframe. Unfavorable economic conditions combined with high consumer needs lead many people to buy goods on credit. Unfortunately, these needs are not always matched by the ability to make payments according to the initial agreement. When payments are disrupted in this way, the result is known as "bad credit". This research uses two datasets: the public credit card dataset from the UCI repository and a private credit approval dataset from a local bank. The information gain algorithm is used to calculate a weight for each attribute. The results show that the attributes carry different weights and that not all attributes influence the classification result. For example, attribute A1 in the UCI dataset and the loan type attribute in the local dataset both have an information gain weight of 0 (zero). Classification with the K-Nearest Neighbors algorithm shows an increase in classification performance of 7.53% for the UCI dataset and 3.26% for the local dataset after feature selection on both datasets.
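To illustrate how an attribute can receive an information gain weight of zero, the following is a minimal sketch of the standard entropy-based information gain calculation for categorical attributes. The toy records, the attribute names (`branch`, `income`), and the helper functions are hypothetical and are not taken from the UCI or local datasets; the sketch also assumes attributes are categorical and that selection simply keeps every attribute whose gain is greater than zero, which may differ from the procedure used in the study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on an attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_attributes(columns, labels, threshold=0.0):
    """Return the names of attributes whose gain exceeds the threshold (assumed rule)."""
    return [name for name, values in columns.items()
            if information_gain(values, labels) > threshold]


# Hypothetical toy data: four applications with a class label each.
labels = ["approve", "approve", "reject", "reject"]
columns = {
    "branch": ["x", "x", "x", "x"],           # same value everywhere: tells us nothing
    "income": ["high", "high", "low", "low"],  # separates the two classes perfectly
}

gains = {name: information_gain(values, labels) for name, values in columns.items()}
selected = select_attributes(columns, labels)
print(gains)
print(selected)
```

In this toy case the constant attribute carries no information about the class, so its gain is zero and it is dropped before the remaining attributes are passed to a classifier such as K-Nearest Neighbors.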