ANALYZING THE IMPACT OF FEATURE SELECTION USING INFORMATION GAIN FOR AIRLINES' CUSTOMER SATISFACTION
DOI: https://doi.org/10.24191/mjoc.v9i1.24163
Keywords: Airline Customer Satisfaction, J48, Naïve Bayes, Feature Selection, Information Gain
Abstract
Feature selection has become a focus of research in many fields that use machine learning and data mining because it can make classifiers more cost-effective, faster, and more accurate. This paper shows the impact of feature selection using a filter method, Information Gain. The impact is analyzed through the accuracy of two classifiers, J48 and Naïve Bayes, on the Airline Customer Satisfaction datasets, comparing results with and without Information Gain. After Information Gain was applied, J48 achieved accuracy improvements of 0.33% and 0.29% under 10-fold and 20-fold cross-validation, respectively, compared with Naïve Bayes. Most of J48's precision and F1-score values with Information Gain also improved under both evaluation methods compared with Naïve Bayes. In conclusion, J48 appears to be the more sensitive of the two classifiers to feature selection and showed greater improvements than Naïve Bayes.
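A minimal sketch of how an Information Gain filter ranks features, written in plain Python. It is not the authors' implementation or tool chain, and it rests on stated assumptions: every feature is already categorical (numeric attributes would need discretization first), the toy rows and their column names (travel_class, wifi_rating, gate) are hypothetical and not drawn from the Airline Customer Satisfaction datasets, and the zero threshold for keeping a feature is an illustrative choice. Each feature X is scored as IG(Y; X) = H(Y) − Σ_v P(X = v) · H(Y | X = v), where H is Shannon entropy in bits.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy H(Y), in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum over v of P(X=v) * H(Y | X=v), for a categorical X."""
    n = len(labels)
    groups = defaultdict(list)  # feature value -> class labels of rows with that value
    for x, y in zip(feature_values, labels):
        groups[x].append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, target, threshold=0.0):
    """Filter-style selection: score every feature, sort by IG, keep IG > threshold."""
    labels = [row[target] for row in rows]
    features = [name for name in rows[0] if name != target]
    scores = {f: information_gain([row[f] for row in rows], labels) for f in features}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    selected = [f for f, score in ranked if score > threshold]
    return ranked, selected


# Hypothetical toy rows; NOT taken from the Airline Customer Satisfaction datasets.
TOY_ROWS = [
    {"travel_class": "Business", "wifi_rating": "high", "gate": "A", "satisfaction": "satisfied"},
    {"travel_class": "Business", "wifi_rating": "low", "gate": "B", "satisfaction": "satisfied"},
    {"travel_class": "Eco", "wifi_rating": "low", "gate": "A", "satisfaction": "dissatisfied"},
    {"travel_class": "Eco", "wifi_rating": "low", "gate": "B", "satisfaction": "dissatisfied"},
]

if __name__ == "__main__":
    ranked, selected = rank_features(TOY_ROWS, target="satisfaction")
    for name, score in ranked:
        print(f"{name}: {score:.4f}")
    print("selected:", selected)
```

Running it prints travel_class: 1.0000, wifi_rating: 0.3113 and gate: 0.0000, then selects ['travel_class', 'wifi_rating']. In this toy example travel_class separates the two classes perfectly, wifi_rating is partly informative, and gate says nothing about the label, so the filter drops gate before a classifier such as J48 or Naïve Bayes would be trained on the remaining features.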
License
Copyright (c) 2025 Farah Aqilah Bohani, Farah Syazwani Mohamed Rashid, Yuzi Mahmud, Sitti Rachmawati Yahya

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.




