Aug 17 – 21, 2026
National Institute for Space Research, São José dos Campos, SP, Brazil
America/Sao_Paulo timezone

Machine Learning-Based Forecasting of Geomagnetic Activity: A Benchmark of Models Applied to the Kp and Dst Indices

Not scheduled
20m
Fernando de Mendonça - LIT (National Institute for Space Research, São José dos Campos, SP, Brazil)


Av. dos Astronautas, 1758 - Jardim da Granja, São José dos Campos - SP, 12227-010
Oral Machine Learning in Space, Earth & Atmospheric Sciences Oral Contributions

Speaker

Marjori Klinczak (Unifatec)

Description

This study presents a benchmark of machine learning models for forecasting the Kp and Dst geomagnetic indices, using NASA’s OMNI dataset at hourly resolution for the period from 2015 to 2024. The input variables include solar wind and interplanetary magnetic field (IMF) parameters, such as the total magnetic field intensity, the Bx, By, and Bz components, solar wind speed and density, and plasma temperature. After preprocessing, which removed missing values and inconsistent records, the final dataset comprised 86,737 samples.

To capture the system’s temporal dynamics, lagged versions of each physical variable were constructed over a window of up to 24 previous hours, at lags of 0, 1, 2, 3, 6, 12, and 24 h, resulting in 49 additional attributes used as model inputs. The problem was formulated as a supervised regression task with a three-hour forecasting horizon. Models were trained separately for each target (Kp and Dst), enabling a direct comparison of performance across different geomagnetic variability regimes.
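The lag construction and three-hour-ahead target described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `build_samples`, the column names, and the toy values are hypothetical, and the sketch assumes gap-free hourly rows sorted in time.

```python
from typing import Dict, List, Sequence

LAGS_H = (0, 1, 2, 3, 6, 12, 24)  # lags in hours, as listed in the abstract
HORIZON_H = 3                      # forecasting horizon in hours


def build_samples(
    rows: Sequence[Dict[str, float]],
    feature_names: Sequence[str],
    target_name: str,
) -> List[Dict[str, float]]:
    """Build lagged inputs and a 3-hour-ahead target.

    Assumption: `rows` holds exactly one record per hour, sorted in time,
    so that index offsets correspond to hour offsets.
    """
    max_lag = max(LAGS_H)
    samples = []
    # Start once 24 h of history exists; stop while the target at t + 3 h exists.
    for t in range(max_lag, len(rows) - HORIZON_H):
        sample = {
            f"{name}_lag{lag}h": rows[t - lag][name]
            for name in feature_names
            for lag in LAGS_H
        }
        sample["target"] = rows[t + HORIZON_H][target_name]
        samples.append(sample)
    return samples


# Toy usage with synthetic values (not OMNI data).
toy_rows = [{"bz": float(i), "speed": 400.0 + i, "dst": -float(i)} for i in range(40)]
toy_samples = build_samples(toy_rows, ["bz", "speed"], "dst")
```

Because preprocessing removed records, a real hourly series would contain gaps, so index-based offsets as used here would no longer equal hour offsets; a timestamp-based lookup would be needed in that case.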

The experimental evaluation considered three main models: linear regression, random forest, and XGBoost. The data were split chronologically, with training on 2015 to 2022 (69,361 samples, 80.3%), validation on 2023 (8,544 samples, 9.9%), and testing on 2024 (8,562 samples, 9.9%). This split avoids temporal leakage and simulates a realistic operational forecasting scenario.
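A chronological split of this kind can be sketched as below. The `(timestamp, value)` sample layout and the function name `split_by_year` are assumptions made for illustration; only the year boundaries come from the abstract.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Sample = Tuple[datetime, float]  # (timestamp, value); hypothetical layout


def split_by_year(samples: List[Sample]) -> Dict[str, List[Sample]]:
    """Assign samples to train (2015-2022), validation (2023), or test (2024)."""
    splits: Dict[str, List[Sample]] = {"train": [], "val": [], "test": []}
    for sample in samples:
        year = sample[0].year
        if 2015 <= year <= 2022:
            splits["train"].append(sample)
        elif year == 2023:
            splits["val"].append(sample)
        elif year == 2024:
            splits["test"].append(sample)
        # Samples outside 2015-2024 are ignored.
    return splits


# Toy usage with synthetic timestamps and values.
toy = [
    (datetime(2015, 1, 1, 0), 1.0),
    (datetime(2022, 12, 31, 23), 2.0),
    (datetime(2023, 6, 1, 12), 3.0),
    (datetime(2024, 3, 10, 6), 4.0),
    (datetime(2025, 1, 1, 0), 5.0),
]
toy_splits = split_by_year(toy)
```

Splitting on calendar year rather than shuffling keeps every validation and test sample strictly later than all training samples.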

The results show consistent performance across models, with progressive improvement as model complexity increases. For the Kp index, R² ranged from 0.522 for linear regression to 0.556 for XGBoost, while for the Dst index it ranged from 0.572 to 0.639. Dst therefore benefited more from nonlinear models (its R² range spans 0.067, against 0.034 for Kp), suggesting a more complex relationship between the input variables and its dynamics.
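For reference, the R² values quoted above follow the usual coefficient of determination, one minus the ratio of residual to total sum of squares. A minimal sketch, assuming the standard definition (the abstract does not state which implementation was used):

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot
```

Under this definition a model that always predicts the mean of the observations scores 0 and a perfect model scores 1, so an R² of 0.639 means that about 64% of the variance of the target in the evaluation set is explained.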

Finally, a feature importance analysis showed a predominance of solar wind speed and the Bz component of the interplanetary magnetic field, along with their temporal lags, indicating that both the intensity and the persistence of these conditions are key determinants for geomagnetic activity forecasting. The results also suggest that, for the set of variables used, predictability is moderately limited, with maximum performance close to an R² of 0.64 for Dst and 0.56 for Kp.
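The abstract does not say which importance measure was used. One model-agnostic option is permutation importance, sketched below as an assumption rather than the authors' method: each feature column is shuffled in turn and the resulting increase in mean squared error is recorded. The function names and toy data are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[Sequence[float]], float],
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    seed: int = 0,
) -> List[float]:
    """Increase in MSE after shuffling each feature column once."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between feature j and the target
        X_perm = [
            list(row[:j]) + [column[i]] + list(row[j + 1:])
            for i, row in enumerate(X)
        ]
        permuted = mse(y, [predict(row) for row in X_perm])
        importances.append(permuted - baseline)
    return importances


# Toy usage: the target depends only on feature 0; feature 1 is irrelevant.
toy_X = [[float(i), float((i * 7) % 5)] for i in range(20)]
toy_y = [2.0 * row[0] for row in toy_X]
toy_importances = permutation_importance(lambda row: 2.0 * row[0], toy_X, toy_y)
```

In the toy case the shuffled irrelevant feature leaves the error unchanged, while shuffling the informative feature increases it. With lagged inputs that are strongly correlated, as in this study, permutation scores can be spread across the lags of one variable, so they are best read per variable group.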

