Air Pollution and Cardiovascular Mortality: An ML-Based Early Prediction Framework using Random Forest
Abstract
Air pollution is responsible for an estimated 8.3 million premature deaths annually, of which more than half involve cardiovascular pathways. Fine particulate matter (PM2.5), the most cardiotoxic ambient pollutant, penetrates alveolar tissue, enters systemic circulation, triggers oxidative stress and pro-inflammatory cytokine cascades (IL-6, TNF-α, CRP), destabilises atherosclerotic plaques, and precipitates acute myocardial infarction and cerebrovascular accidents. Despite overwhelming epidemiological evidence, clinical decision-support tools still rarely incorporate real-time environmental exposure data, creating a systematic blind spot in risk stratification. This paper presents a comprehensive dual-stream Machine Learning (ML) framework that fuses real-time ambient air quality telemetry, sourced from CPCB monitoring stations and WHO guidelines, with patient-level clinical parameters drawn from Electronic Health Records (EHR) to predict cardiovascular mortality risk at the individual level. A Random Forest ensemble classifier was selected as the primary model following rigorous benchmarking against four baseline algorithms (Logistic Regression, SVM, Decision Tree, and Neural Network). Experimental evaluation on a composite dataset of 2,500 subjects achieved an accuracy of 88.3%, precision of 87.1%, recall of 86.4%, and F1-score of 86.7%, outperforming all baselines. SMOTE oversampling corrected class imbalance, and 5-fold stratified cross-validation ensured robust generalisation. High-pollution exposure (≥20 years) was associated with a 68% cumulative increase in CVD mortality risk above baseline. PM2.5 concentration (Gini importance 0.31), systolic blood pressure (0.22), and patient age (0.18) were identified as the three most critical predictive dimensions.
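As a reading aid, the four reported metrics can be related through the standard confusion-matrix definitions. The short Python sketch below is not the authors' code: the `ConfusionCounts` container and the example counts are hypothetical and chosen only for illustration. The one thing it takes from the abstract is the reported precision (87.1%) and recall (86.4%), which it uses to check that the reported F1-score is their harmonic mean.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical container, not from the paper)."""
    tp: int  # high-risk subjects correctly flagged
    fp: int  # low-risk subjects wrongly flagged
    fn: int  # high-risk subjects missed
    tn: int  # low-risk subjects correctly cleared


def accuracy(c: ConfusionCounts) -> float:
    """Share of all subjects classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def precision(c: ConfusionCounts) -> float:
    """Share of flagged subjects who are truly high-risk."""
    return c.tp / (c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    """Share of truly high-risk subjects who were flagged."""
    return c.tp / (c.tp + c.fn)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)


# Consistency check using the precision and recall reported in the abstract.
implied_f1 = f1_score(0.871, 0.864)
print(f"F1 implied by reported precision and recall: {implied_f1:.3f}")  # 0.867

# Illustrative counts only; these are made up and are NOT the study's data.
example = ConfusionCounts(tp=80, fp=20, fn=10, tn=90)
print(f"accuracy={accuracy(example):.3f}, "
      f"precision={precision(example):.3f}, "
      f"recall={recall(example):.3f}, "
      f"F1={f1_score(precision(example), recall(example)):.3f}")
```

The implied value of 0.867 agrees with the 86.7% F1-score stated in the abstract, so the three figures are mutually consistent. Accuracy cannot be cross-checked the same way, because it also depends on the class balance of the evaluation set, which the abstract does not give.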