Author ORCID Identifier

https://orcid.org/0000-0002-5957-1970

Date of Award

9-12-2024

Document Type

Thesis

School

Srinivasa Ramanujan Centre

Programme

Ph.D. (Doctor of Philosophy)

First Advisor

Dr. D. Narasimhan

Keywords

Air Quality Index Forecasting, Outdoor Air Pollution, Seasonal Imputation, Parallel Ensemble Model, Feature Selection

Abstract

Forecasting ambient air quality is essential for environmental sustainability and public health, especially in heavily populated regions such as China, India, and the United States where air pollution remains a serious concern. Traditional forecasting models often struggle to accurately represent air quality data because of its complex patterns and nonlinear interactions. To address these challenges and improve forecast performance, this research proposes a comprehensive strategy that integrates parallel heterogeneous ensemble modeling with Bayesian optimization.

The study begins with a seasonal machine learning–based imputation technique (SeasonalMLImpute) designed to handle missing data in meteorological and air quality parameters. This method is evaluated against conventional imputation approaches such as MissForest, k-nearest neighbours (KNN), and median imputation. The comparison highlights the ability of SeasonalMLImpute to better capture seasonal variations, thereby improving the overall quality and reliability of the dataset.
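The idea of season-aware imputation can be illustrated with a minimal sketch. This is not the SeasonalMLImpute method itself, whose details are not given in this abstract; it swaps in a plain per-season median as a stand-in for the machine learning model. The function name `seasonal_median_impute`, the month-to-season mapping, and the sample PM2.5 values are all assumptions made for illustration.

```python
from statistics import median

# Assumed mapping of calendar months to meteorological seasons (illustrative only).
SEASON_OF_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_median_impute(months, values):
    """Replace None entries with the median of observed values from the same season.

    Falls back to the overall median when a season has no observed values.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    overall = median(observed)

    # Group the observed values by season.
    by_season = {}
    for m, v in zip(months, values):
        if v is not None:
            by_season.setdefault(SEASON_OF_MONTH[m], []).append(v)

    filled = []
    for m, v in zip(months, values):
        if v is None:
            season_vals = by_season.get(SEASON_OF_MONTH[m])
            v = median(season_vals) if season_vals else overall
        filled.append(v)
    return filled


# Hypothetical PM2.5 readings with gaps in winter and summer.
months = [1, 1, 2, 7, 7, 8]
pm25 = [180.0, None, 200.0, 40.0, None, 60.0]
filled = seasonal_median_impute(months, pm25)
```

In this toy series a single global median would fill both gaps with the same value, whereas the seasonal version fills the winter gap from winter readings and the summer gap from summer readings. That is the kind of seasonal variation the paragraph above says conventional imputers tend to miss.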

Next, a novel Weight Guided Feature Selection (WGFS) algorithm is introduced to identify the most influential meteorological and air quality variables for predicting the Air Quality Index (AQI). The performance of WGFS is assessed against existing feature selection techniques, including sequential forward selection and sequential backward elimination. The comparison demonstrates that WGFS enhances both model interpretability and prediction accuracy by selecting more relevant features.

After identifying the most significant features, the Parallel Heterogeneous Weighted Average Voting Ensemble (PH-WAVE) model is employed for AQI prediction. This ensemble model combines multiple base learners to capture the complex relationships between meteorological conditions and air quality measures. By integrating diverse prediction models—each focusing on different aspects of air quality variability—and using parallel processing, PH-WAVE offers improved scalability, computational efficiency, and forecasting precision.
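A weighted-average voting ensemble with parallel fitting can be sketched as follows. This is an illustrative sketch under stated assumptions, not the PH-WAVE implementation: the three toy base learners, the inverse-validation-error weighting rule, the use of a thread pool, and the sample data are all assumptions, since the abstract does not specify them.

```python
from concurrent.futures import ThreadPoolExecutor


class MeanModel:
    """Toy learner: always predicts the training mean."""
    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class NearestNeighbourModel:
    """Toy learner: predicts the target of the closest training point."""
    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        return [self.y_[min(range(len(self.X_)), key=lambda j: abs(self.X_[j] - x))]
                for x in X]


class LinearModel:
    """Toy learner: one-feature least-squares line."""
    def fit(self, X, y):
        n = len(X)
        mx, my = sum(X) / n, sum(y) / n
        sxx = sum((x - mx) ** 2 for x in X)
        sxy = sum((x - mx) * (t - my) for x, t in zip(X, y))
        self.slope_ = sxy / sxx if sxx else 0.0
        self.intercept_ = my - self.slope_ * mx
        return self

    def predict(self, X):
        return [self.intercept_ + self.slope_ * x for x in X]


def mae(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def fit_weighted_ensemble(models, X_tr, y_tr, X_val, y_val):
    # Fit the heterogeneous base learners concurrently.
    with ThreadPoolExecutor() as pool:
        fitted = list(pool.map(lambda m: m.fit(X_tr, y_tr), models))
    # Assumed weighting rule: inverse validation MAE, normalised to sum to 1.
    inv = [1.0 / (mae(y_val, m.predict(X_val)) + 1e-9) for m in fitted]
    total = sum(inv)
    return fitted, [w / total for w in inv]


def predict_weighted(fitted, weights, X):
    all_preds = [m.predict(X) for m in fitted]
    return [sum(w * p[i] for w, p in zip(weights, all_preds)) for i in range(len(X))]


# Hypothetical single-feature data (e.g. one pollutant level against AQI).
X_tr, y_tr = [0, 1, 2, 3, 4, 5], [1.2, 2.8, 5.1, 7.3, 8.9, 11.0]
X_val, y_val = [1.5, 3.5], [4.0, 8.1]
fitted, weights = fit_weighted_ensemble(
    [MeanModel(), NearestNeighbourModel(), LinearModel()], X_tr, y_tr, X_val, y_val)
ensemble_pred = predict_weighted(fitted, weights, [6.0])
```

Learners with lower validation error receive larger weights, so the combined forecast leans on whichever model fits the held-out data best. The thread pool only shows the structure of parallel fitting: with pure-Python learners like these it gives no real speed-up, and process-based or native parallelism would be needed for that.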

Finally, Bayesian optimization is applied to fine-tune the hyperparameters of the heterogeneous base models, resulting in the optimized ensemble model (BOPH-WAVE). Through systematic exploration of the hyperparameter space, Bayesian optimization enhances the predictive performance of the ensemble. The optimized model is validated using real-world air quality datasets from China, India, and the United States. The results show higher forecast accuracy, lower prediction error, and shorter computation time than traditional forecasting techniques and existing ensemble methods. Moreover, the model remains robust under environmental fluctuations, making it adaptable to diverse forecasting scenarios and geographic regions.
