Speaker
Description
Missing values are common in real-world time series datasets and can significantly reduce the precision and reliability of data analysis and machine learning models. This research project aims to describe the types of missing data and to test and compare different ways of imputing them. The methods considered start with the simplest statistical ones and proceed through regression models and neural networks to LLMs.
The effectiveness of these imputation techniques will be measured on atmospheric pollution data, focusing primarily on PM10, PM2.5, SO2, and NO2 levels. Each method will be evaluated on accuracy, consistency, and its impact on subsequent predictive models.
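
As a rough illustration of how imputation accuracy could be measured, the sketch below hides a block of known values in a series, fills the gap with two of the simplest statistical methods (mean fill and linear interpolation), and scores each by mean absolute error on the hidden points. Everything here is an assumption made for illustration: the series is synthetic rather than real PM10 data, the function names are hypothetical, and the masking scheme and error metric are not necessarily the ones used in the project.

```python
import math


def mean_impute(series):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in series]


def linear_impute(series):
    """Fill each missing value by linear interpolation between its nearest
    observed neighbours; gaps at the edges copy the nearest observed value."""
    out = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            out[i] = series[left] + weight * (series[right] - series[left])
    return out


def mae(truth, imputed, masked_idx):
    """Mean absolute error, computed only on the artificially hidden points."""
    return sum(abs(truth[i] - imputed[i]) for i in masked_idx) / len(masked_idx)


# Synthetic hourly series with a daily cycle (made-up values, not measurements).
truth = [30 + 10 * math.sin(2 * math.pi * h / 24) for h in range(72)]

# Hide a contiguous 6-hour block to imitate a sensor outage.
masked_idx = list(range(20, 26))
observed = [None if i in masked_idx else v for i, v in enumerate(truth)]

scores = {
    "mean": mae(truth, mean_impute(observed), masked_idx),
    "linear": mae(truth, linear_impute(observed), masked_idx),
}
print(scores)
```

The same mask-and-score loop could in principle wrap any imputer, including regression or neural models, so that all methods are compared on identical hidden points.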