IMPACT OF FEATURE STANDARDIZATION ON HEART DISEASE PREDICTION: A COMPARATIVE ANALYSIS OF LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINE MODELS
DOI:
https://doi.org/10.24191/mjoc.v10i2.6835
Keywords:
Cardiovascular Diseases, Feature Standardization, Heart Disease, Logistic Regression, Machine Learning Model, Support Vector Machine
Abstract
Cardiovascular diseases are among the leading causes of global mortality. Heart disease, in particular, remains a major contributor to this burden, highlighting the need for effective predictive models that enable early detection. This study investigates the impact of feature standardization using StandardScaler on the performance of two widely used machine learning models, Logistic Regression (LR) and Support Vector Machine (SVM), for predicting heart disease. The research uses a dataset of patients' demographic and clinical attributes and compares models trained on raw data with models trained on standardized data, using accuracy, precision, recall, and F1-score as performance metrics. Results indicate that feature standardization improves the performance of both models, though to different degrees. LR showed a clear gain in macro F1-score on the testing set, rising from 0.82 without standardization to 0.87 with standardization. SVM outperformed LR on raw data and improved only marginally after standardization, with its macro F1-score increasing from 0.85 to 0.86. These findings highlight the importance of data pre-processing and demonstrate how feature scaling can improve machine learning models for heart disease prediction. This research contributes to the growing field of predictive healthcare, offering useful insights for clinicians seeking reliable early detection tools for cardiovascular conditions.
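The two quantities at the center of this comparison, standardized features and the macro F1-score, can be illustrated with a minimal standard-library sketch. It is not the authors' code: the function names and the toy (age, cholesterol) values are hypothetical. It assumes the usual z-score definition, with the mean and population standard deviation estimated on the training split only, which is the behavior StandardScaler is documented to have.

```python
from statistics import fmean, pstdev


def fit_standardizer(rows):
    """Return per-column means and standard deviations from training rows."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Population standard deviation (ddof=0); a constant column gets a
    # scale of 1.0 so that the division below never hits zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(rows, means, stds):
    """Apply z = (x - mean) / std column by column."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return fmean(scores)


# Hypothetical toy values (age, cholesterol); not taken from the study's dataset.
train = [[40.0, 200.0], [50.0, 240.0], [60.0, 280.0]]
test = [[50.0, 240.0], [70.0, 320.0]]

# Statistics come from the training split and are reused on the test split.
means, stds = fit_standardizer(train)
train_std = standardize(train, means, stds)
test_std = standardize(test, means, stds)
```

Fitting the scaler on the training split and reusing its statistics on the test split keeps test information out of the pre-processing step. After scaling, both columns share a comparable range, which matters for models such as LR and SVM whose fitted parameters depend on feature magnitudes.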
License
Copyright (c) 2025 Norsyela Muhammad Noor Mathivanan, Eric Foo Zhi Xian, Debbie Foo Yong Xi, Chua Hiang Kiat

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.




