PREDICTION OF EMPLOYEE PROMOTION USING HYBRID SAMPLING METHOD WITH MACHINE LEARNING ARCHITECTURE

Authors

  • Shahidan Shafie, School of Management, Universiti Sains Malaysia, 11800 Minden, Pulau Pinang.
  • Soek Peng Ooi, School of Management, Universiti Sains Malaysia, 11800 Minden, Pulau Pinang.
  • Khai Wah Khaw, School of Management, Universiti Sains Malaysia, 11800 Minden, Pulau Pinang.

DOI:

https://doi.org/10.24191/mjoc.v8i1.18456

Keywords:

Employee Promotion Prediction, Hybrid Sampling, Imbalanced Data, Machine Learning

Abstract

Employee promotion plays an important role in an organization. It helps to inspire employees to grow and develop their skills, thereby increasing employee loyalty and reducing the turnover rate. This study predicts employee job promotion from employee promotion data using a hybrid sampling method with machine learning. The purpose of this study is to accelerate the promotion process and to identify the important features that should be considered when promoting an employee. Eight machine learning algorithms were used: Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbors, Support Vector Machine, Naïve Bayes, Adaptive Boosting Classifier, and Extreme Gradient Boosting. Comparing these eight algorithms makes it possible to identify the most suitable model for predicting employee promotion. Additionally, two hybrid sampling methods, the Synthetic Minority Oversampling Technique combined with the Edited Nearest Neighbour rule (SMOTE+ENN) and the Synthetic Minority Oversampling Technique combined with Tomek Links (SMOTE+Tomek), were adopted to treat the imbalanced dataset. For feature selection, the Recursive Feature Elimination method with a Random Forest Classifier (RFE-RFC), the Explained Variance Ratio method with Principal Component Analysis (EVR-PCA), and the Rank Feature Importance method with an Extra Trees Classifier (RFI-ECT) were applied. The top 5, 8, and 12 features selected by RFI-ECT were used to train the machine learning algorithms, and each model was evaluated by precision, recall, and F1-score. In conclusion, the top five features ranked by RFI-ECT are region, department, previous year rating, KPIs met above 80%, and awards won. The results suggest that SMOTE+ENN with Extreme Gradient Boosting trained on eight features gives the best-performing model in this study.
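As an illustration of the workflow described in the abstract, the sketch below ranks features with an Extra Trees Classifier, keeps the top eight, rebalances the training split with SMOTE+ENN, and scores an Extreme Gradient Boosting model with precision, recall, and F1-score. This is a minimal sketch under stated assumptions, not the authors' exact pipeline: the file name employee_promotion.csv, the target column is_promoted, and all hyperparameter values are hypothetical, and the dataset is assumed to be numerically encoded already.

import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score
from imblearn.combine import SMOTEENN          # hybrid over-/under-sampling
from xgboost import XGBClassifier

# Load a promotion dataset whose categorical columns are already label-encoded
# (the file name and target column below are assumptions for illustration only).
df = pd.read_csv("employee_promotion.csv")
X, y = df.drop(columns="is_promoted"), df["is_promoted"]

# Rank feature importance with an Extra Trees Classifier and keep the top 8 features.
ranker = ExtraTreesClassifier(n_estimators=200, random_state=42).fit(X, y)
top8 = X.columns[np.argsort(ranker.feature_importances_)[::-1][:8]]
X = X[top8]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Treat the class imbalance on the training split only: SMOTE oversampling
# followed by Edited Nearest Neighbour cleaning.
X_res, y_res = SMOTEENN(random_state=42).fit_resample(X_train, y_train)

# Fit Extreme Gradient Boosting on the rebalanced data and evaluate on the
# untouched test split.
model = XGBClassifier(n_estimators=300, eval_metric="logloss", random_state=42)
model.fit(X_res, y_res)

y_pred = model.predict(X_test)
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("f1-score :", f1_score(y_test, y_pred))

Swapping SMOTEENN for SMOTETomek (also available in imblearn.combine) would give the second hybrid sampling scheme considered in the study.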

Published

2023-04-10

How to Cite

Shafie, S., Ooi, S. P., & Khaw, K. W. (2023). PREDICTION OF EMPLOYEE PROMOTION USING HYBRID SAMPLING METHOD WITH MACHINE LEARNING ARCHITECTURE. Malaysian Journal of Computing, 8(1), 1264–1286. https://doi.org/10.24191/mjoc.v8i1.18456