Author ORCID Identifier
https://orcid.org/0000-0002-5957-1970
Date of Award
9-12-2024
Document Type
Thesis
School
Srinivasa Ramanujan Centre
Programme
Ph.D. - Doctor of Philosophy
First Advisor
Dr. D. Narasimhan
Keywords
Air Quality Index Forecasting, Outdoor Air Pollution, Seasonal Imputation, Parallel Ensemble Model, Feature Selection
Abstract
Forecasting ambient air quality is essential for environmental sustainability and public health, especially in heavily populated countries such as China, India, and the United States, where air pollution remains a serious concern. Traditional forecasting models often struggle to represent air quality data accurately because such data exhibit complex patterns and nonlinear interactions between meteorological and air quality variables. To address these challenges and improve forecast performance, this research proposes a comprehensive strategy that integrates parallel heterogeneous ensemble modeling with Bayesian optimization.
The study begins with a seasonal machine learning–based imputation technique (SeasonalMLImpute) designed to handle missing data in meteorological and air quality parameters. This method is evaluated against conventional imputation approaches such as MissForest, k-nearest neighbours (KNN), and median imputation. The comparison highlights the ability of SeasonalMLImpute to better capture seasonal variations, thereby improving the overall quality and reliability of the dataset.
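The abstract does not describe the internals of SeasonalMLImpute, so the following is only a minimal sketch of the general idea of season-aware imputation, not the thesis method. It assumes Northern Hemisphere meteorological seasons and fills each gap with the mean of the k nearest fully observed rows drawn from the same season. The names `seasonal_knn_impute` and `SEASON_OF_MONTH`, and the toy data, are hypothetical.

```python
import math
from statistics import mean

# Hypothetical season mapping (Northern Hemisphere meteorological seasons).
# The thesis may define seasons differently, e.g. monsoon periods for India.
SEASON_OF_MONTH = {12: "DJF", 1: "DJF", 2: "DJF",
                   3: "MAM", 4: "MAM", 5: "MAM",
                   6: "JJA", 7: "JJA", 8: "JJA",
                   9: "SON", 10: "SON", 11: "SON"}


def seasonal_knn_impute(rows, months, k=2):
    """Fill None entries with the mean of the k nearest same-season rows.

    rows   -- list of feature lists; None marks a missing value
    months -- month number (1-12) of each row
    Distances use only the columns observed in the row being imputed.
    """
    seasons = [SEASON_OF_MONTH[m] for m in months]
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        observed = [j for j, v in enumerate(row) if v is not None]
        # Donors: fully observed rows from the same season only.
        donors = [r for r, s in zip(rows, seasons)
                  if s == seasons[i] and None not in r]
        donors.sort(key=lambda r: math.dist([row[j] for j in observed],
                                            [r[j] for j in observed]))
        nearest = donors[:k]
        for j in missing:
            if nearest:
                filled[i][j] = mean(r[j] for r in nearest)
            else:
                # No same-season donor: fall back to the global column mean.
                filled[i][j] = mean(r[j] for r in rows if r[j] is not None)
    return filled


# Toy data with two generic features; one value is missing in winter, one in summer.
rows = [[10.0, 50.0], [12.0, 54.0], [11.0, None],
        [11.2, 90.0], [30.0, 120.0], [31.0, None]]
months = [1, 1, 2, 7, 7, 8]
print(seasonal_knn_impute(rows, months))  # row 2 -> 52.0, row 5 -> 105.0
```

In the toy data, the summer row `[11.2, 90.0]` is the closest row to `[11.0, None]` on the first feature, yet it is not used as a donor because it belongs to a different season; a season-agnostic KNN imputer would have pulled the winter estimate toward 90.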
Next, a novel Weight Guided Feature Selection (WGFS) algorithm is introduced to identify the most influential meteorological and air quality variables for predicting the Air Quality Index (AQI). The performance of WGFS is assessed against existing feature selection techniques, including sequential forward selection and sequential backward elimination. The comparison demonstrates that WGFS enhances both model interpretability and prediction accuracy by selecting more relevant features.
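The weighting scheme behind WGFS is not given in the abstract, so it is not reproduced here. For reference, the sketch below shows sequential forward selection, one of the baselines named above, as a wrapper method. The scoring model (a leave-one-out 1-nearest-neighbour regressor), the function names, and the toy features `pm25` and `temp` are assumptions chosen only to keep the example dependency-free.

```python
import math


def loo_knn_score(X, y, cols, k=1):
    """Negative leave-one-out MSE of k-NN regression using feature names `cols`."""
    n = len(y)
    sq_err = 0.0
    for i in range(n):
        others = [m for m in range(n) if m != i]
        # Stable sort, so ties keep the original row order.
        others.sort(key=lambda m: math.dist([X[i][c] for c in cols],
                                            [X[m][c] for c in cols]))
        pred = sum(y[m] for m in others[:k]) / k
        sq_err += (pred - y[i]) ** 2
    return -sq_err / n


def sequential_forward_selection(features, score, max_features):
    """Greedily add the feature that most improves `score`; stop when none helps."""
    selected, remaining = [], list(features)
    best = float("-inf")
    while remaining and len(selected) < max_features:
        cand, cand_score = max(((f, score(selected + [f])) for f in remaining),
                               key=lambda t: t[1])
        if cand_score <= best:
            break  # no remaining feature improves the wrapper score
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected, best


# Hypothetical toy data: the target depends on pm25 only; temp is noise.
X = [{"pm25": 10.0, "temp": 5.0}, {"pm25": 20.0, "temp": 40.0},
     {"pm25": 30.0, "temp": 12.0}, {"pm25": 40.0, "temp": 33.0},
     {"pm25": 50.0, "temp": 20.0}]
y = [2 * row["pm25"] for row in X]
print(sequential_forward_selection(["pm25", "temp"],
                                   lambda cols: loo_knn_score(X, y, cols), 2))
# -> (['pm25'], -400.0)
```

Sequential backward elimination is the mirror image: start from all features and greedily drop the one whose removal most improves the score.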
After the most significant features are identified, the Parallel Heterogeneous Weighted Average Voting Ensemble (PH-WAVE) model is employed for AQI prediction. This ensemble combines multiple base learners to capture the complex relationships between meteorological conditions and air quality measures. By integrating diverse prediction models, each focusing on different aspects of air quality variability, and executing them in parallel, PH-WAVE offers improved scalability, computational efficiency, and forecasting precision.
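The abstract does not state how PH-WAVE derives its voting weights. The sketch below assumes one common choice, weights inversely proportional to each base learner's validation mean squared error, and fits three deliberately simple, hypothetical one-feature base learners concurrently with `concurrent.futures.ThreadPoolExecutor`. In practice the base learners would be full regression models.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


# Hypothetical one-feature base learners: each fit_* returns a predict(x) callable.
def fit_mean(X, y):
    m = mean(y)
    return lambda x: m


def fit_linear(X, y):
    xb, yb = mean(X), mean(y)
    sxx = sum((x - xb) ** 2 for x in X)
    slope = sum((x - xb) * (t - yb) for x, t in zip(X, y)) / sxx if sxx else 0.0
    intercept = yb - slope * xb
    return lambda x: intercept + slope * x


def fit_1nn(X, y):
    pairs = list(zip(X, y))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def fit_weighted_voting(fitters, X_tr, y_tr, X_val, y_val, eps=1e-12):
    """Fit base learners concurrently and weight them by inverse validation MSE."""
    with ThreadPoolExecutor() as pool:
        models = list(pool.map(lambda fit: fit(X_tr, y_tr), fitters))
    mses = [mean((m(x) - t) ** 2 for x, t in zip(X_val, y_val)) for m in models]
    inv = [1.0 / (e + eps) for e in mses]  # eps guards against a zero MSE
    total = sum(inv)
    weights = [v / total for v in inv]

    def predict(x):
        # Weighted average of the base learners' predictions.
        return sum(w * m(x) for w, m in zip(weights, models))

    return predict, weights


predict, weights = fit_weighted_voting(
    [fit_mean, fit_linear, fit_1nn],
    X_tr=[1.0, 2.0, 3.0, 4.0, 5.0], y_tr=[2.0, 4.0, 6.0, 8.0, 10.0],
    X_val=[6.0, 7.0], y_val=[12.5, 13.5])
print([round(w, 3) for w in weights], round(predict(8.0), 2))
```

Threads keep the example portable; for CPU-bound learners written in pure Python, a `ProcessPoolExecutor` with picklable, module-level fitting functions is the usual way to obtain real parallel speed-ups.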
Finally, Bayesian optimization is applied to fine-tune the hyperparameters of the heterogeneous base models, yielding the optimized ensemble model (BOPH-WAVE). By systematically exploring the hyperparameter space, Bayesian optimization enhances the predictive performance of the ensemble. The optimized model is validated on real-world air quality datasets from China, India, and the United States. Compared with traditional forecasting techniques and existing ensemble methods, it achieves substantially higher forecast accuracy, lower prediction errors, and shorter computation times. The model also exhibits strong robustness to environmental fluctuations, making it adaptable to diverse forecasting scenarios and geographic regions.
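The abstract does not specify which surrogate model or acquisition function BOPH-WAVE uses. A common setup, given here only as an assumed illustration, models the validation error f(λ) of a hyperparameter configuration λ with a Gaussian process whose posterior mean and standard deviation are μ(λ) and σ(λ), and chooses the next configuration by maximizing expected improvement over the best error observed so far, f*:

```latex
\[
\mathrm{EI}(\lambda) =
\begin{cases}
\bigl(f^{*} - \mu(\lambda)\bigr)\,\Phi(Z) + \sigma(\lambda)\,\phi(Z), & \sigma(\lambda) > 0,\\[4pt]
0, & \sigma(\lambda) = 0,
\end{cases}
\qquad
Z = \frac{f^{*} - \mu(\lambda)}{\sigma(\lambda)},
\]
\[
\lambda_{t+1} = \arg\max_{\lambda}\, \mathrm{EI}(\lambda),
\]
```

where Φ and φ are the standard normal CDF and PDF. A large σ(λ) rewards exploring uncertain regions, while a low μ(λ) rewards exploiting configurations predicted to perform well; this trade-off is what separates Bayesian optimization from grid or random search.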
Recommended Citation
M, Vanitha Ms, "Enhancement of Ambient Air Quality Index Forecasting using Optimized Ensemble Model" (2024). Theses and Dissertations. 146.
https://knowledgeconnect.sastra.edu/theses/146