Date of Award

29-1-2025

Document Type

Thesis

School

School of Computing

Programme

Ph.D.-Doctor of Philosophy

First Advisor

Dr. K. Kannan

Keywords

Soft Sets, Parameter Reductions, Machine Learning, Neural Networks

Abstract

Cardiovascular diseases (CVDs) are the leading cause of mortality worldwide, and India reports a significantly high death rate due to its large population base and the increasing prevalence of non-communicable diseases. National statistics indicate that 20–27% of deaths in India are attributed to CVDs, with the proportion steadily rising over the years. Recognizing the urgency of early detection and risk prevention, the World Health Organization (WHO) introduced “The Global Action Plan for the Prevention and Control of Non-Communicable Diseases (2013–2020),” emphasizing early identification, risk reduction, and timely treatment. In this context, decision-making applications have gained importance across domains, especially in healthcare, where effective and timely decisions can prevent premature deaths.

This research focuses on developing a decision-making algorithm for identifying significant risk factors associated with CVDs using a hybrid approach combining soft sets and machine learning. A soft-set-based parameter reduction algorithm is proposed to identify essential parameters influencing cardiovascular risk. The algorithm represents patient data as soft sets, constructs a map matrix, and performs parameter reduction with a computational complexity of O(nf + 2^f). Real-world data consisting of nine clinical features collected from diagnostic laboratories in Kumbakonam, Tamil Nadu, are processed using this approach, with triglycerides emerging as a key factor across all reductions.
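To illustrate what a soft-set parameter reduction involves, the following is a minimal Python sketch. It is not the algorithm proposed in the thesis. It assumes a Boolean-valued soft set stored as a table (one row per object, one 0/1 entry per parameter) and uses the common textbook notion of a normal parameter reduction: a subset of parameters is dispensable when its row sum is the same for every object, so removing it shifts every choice value by the same constant and leaves the ranking of objects unchanged. The table, the parameter names `e1` to `e4`, and the function names are invented for illustration. The search simply enumerates all parameter subsets, so it is exponential in the number of parameters and does not attain the O(nf + 2^f) bound stated above.

```python
from itertools import combinations

# Hypothetical Boolean-valued soft set: each row is one object,
# each key is one parameter (1 = the parameter holds for that object).
PARAMS = ["e1", "e2", "e3", "e4"]
TABLE = [
    {"e1": 1, "e2": 0, "e3": 1, "e4": 0},
    {"e1": 1, "e2": 1, "e3": 0, "e4": 1},
    {"e1": 0, "e2": 1, "e3": 1, "e4": 0},
    {"e1": 0, "e2": 0, "e3": 0, "e4": 1},
]


def choice_values(table, params):
    """Choice value of an object = number of parameters that hold for it."""
    return [sum(row[p] for p in params) for row in table]


def dispensable_subsets(table, params):
    """Return every non-empty proper subset of params whose row sum
    is identical for all objects (naive enumeration of subsets)."""
    found = []
    for size in range(1, len(params)):
        for subset in combinations(params, size):
            sums = {sum(row[p] for p in subset) for row in table}
            if len(sums) == 1:
                found.append(subset)
    return found


def normal_reductions(table, params):
    """Remove a largest dispensable subset; return the remaining parameters.
    If nothing is dispensable, the full parameter set is returned."""
    subsets = dispensable_subsets(table, params)
    if not subsets:
        return [tuple(params)]
    largest = max(len(s) for s in subsets)
    return [
        tuple(p for p in params if p not in s)
        for s in subsets
        if len(s) == largest
    ]


if __name__ == "__main__":
    print("choice values:", choice_values(TABLE, PARAMS))
    print("dispensable:", dispensable_subsets(TABLE, PARAMS))
    print("reductions:", normal_reductions(TABLE, PARAMS))
```

In this toy table, `e3` and `e4` always sum to 1, so dropping both lowers every choice value by exactly one and the relative ordering of the objects is preserved.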

Subsequently, various machine learning classifiers, including SVM, KNN, LDA, Decision Tree, Random Forest, Naïve Bayes, CART, and Logistic Regression, are applied to develop predictive models. Among these, Random Forest achieves the highest accuracy of 69.23%. Clustering techniques such as k-means, PAM, hierarchical clustering, and fuzzy clustering are also used to analyze patient risk groups, supported by validation methods including the Hopkins statistic, Dunn’s index, silhouette analysis, PCA, and model-based clustering.

A hybrid integration of soft-set-based reductions with machine learning further improves prediction accuracy, with Random Forest achieving 88.46%. Ensemble learning methods (bagging, boosting, and stacking) identify an efficient parameter subset consisting of Gender, SugarPP, Creatinine, Total Cholesterol, HDL, and LDL, yielding 93% accuracy. Across all evaluations, Total Cholesterol and LDL consistently emerge as predominant risk factors. The proposed framework demonstrates strong potential for enhancing early diagnosis and supporting clinical decision-making in cardiovascular healthcare.
