Modelling Hypertension among Adults in South Africa through SMOTE-Based Balanced Data with Machine Learning Approaches

Muhammad Hoque; Rafiul Hoque


Original Article Open Access


Annals of Medicine and Medical Sciences, Volume 05 (2026), Version 03, March 2, 2026, pp. 273–281

Abstract

Objectives: This study aimed to develop machine learning (ML) prediction models for hypertension using nationally representative health and demographic data on South African adults, while addressing class imbalance and improving model interpretability.

Materials and Methods: This cross-sectional analytical study used secondary data from Wave 5 of the National Income Dynamics Study (NIDS), covering 21,181 adult respondents across all nine provinces of South Africa. Several ML algorithms were trained and tested. Model performance was evaluated using accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (ROC AUC). Feature importance was examined with SHapley Additive exPlanations (SHAP) to improve interpretability.

Results: Without the Synthetic Minority Over-sampling Technique (SMOTE), ensemble models achieved moderate accuracy (73–75%) but poor sensitivity for hypertensive cases. With SMOTE, performance improved markedly, and the Gradient Boosting, LightGBM, and CatBoost ensemble models achieved perfect classification (accuracy, precision, recall, and ROC AUC = 1.00). Random Forest offered the best trade-off between accuracy, stability, and interpretability. SHAP analysis identified age, body mass index (BMI), and waist circumference as the most influential predictors, followed by sex and lifestyle factors such as smoking and exercise.

Conclusion: Incorporating SMOTE into ensemble ML algorithms significantly improves the accuracy and sensitivity of hypertension prediction models for South Africa. The study highlights the value of interpretable, data-driven methods for supporting early identification, targeted interventions, and data-informed public health decisions in resource-constrained settings.
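To make the oversampling step and the reported metrics concrete, the following is a minimal standard-library sketch of the SMOTE idea (interpolating between a minority-class point and one of its nearest minority-class neighbours), together with accuracy, precision, recall, and F1 computed from a confusion matrix. It is not the study's code: the function names `smote` and `binary_metrics`, the two toy features (age, BMI), and all values are hypothetical and chosen only for illustration. The sketch oversamples the training rows only, which is the usual way to keep synthetic points out of the evaluation data; the abstract does not state how the study partitioned its data.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        # Place the new row at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1, treating label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical toy training rows: [age in years, BMI]. Not taken from NIDS.
train_majority = [[25, 21.0], [31, 23.5], [28, 22.1], [40, 24.0], [35, 26.2], [22, 20.4]]
train_minority = [[58, 31.0], [63, 29.5]]

# Oversample the minority class until both classes have the same number of rows.
new_points = smote(train_minority, n_new=len(train_majority) - len(train_minority))
balanced_minority = train_minority + new_points

# Hypothetical predictions from a model that misses most positive cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

In this toy example the accuracy is 0.625 while recall is only one third, the same pattern of acceptable accuracy with poor sensitivity that the Results describe for the unbalanced data. In practice a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would normally be preferred over a hand-written version.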

Keywords: Hypertension; Machine Learning; Predictive Modelling; Risk Factors; Algorithms; Epidemiology

