Aug 17 – 21, 2026
National Institute for Space Research, São José dos Campos, SP, Brazil
America/Sao_Paulo timezone

Solarfall: A Multi-Stage Machine Learning Architecture for Solar Flare Forecasting

Aug 19, 2026, 3:00 PM
20m
Fernando de Mendonça - LIT (National Institute for Space Research, São José dos Campos, SP, Brazil)

Av. dos Astronautas, 1758 - Jardim da Granja, São José dos Campos - SP, 12227-010
Oral Machine Learning in Space, Earth & Atmospheric Sciences Oral Contributions

Speaker

Eduardo Ferraz de Campos (Federal Institute of São Paulo)

Description

Forecasting solar flares is essential for mitigating space weather risks that threaten technological infrastructure. This research investigated the most relevant attributes for predicting solar flares by comparing the predictive performance of models built on two different forecasting philosophies: an "Effect-to-Effect" approach based on X-ray inertia, and a "Cause-to-Effect" approach focused on the magnetic topology of active regions. Additionally, an arbiter Meta Model was proposed to integrate both datasets. To address the high dimensionality and class imbalance characteristic of astrophysical data, a hierarchical cascade architecture named Solarfall was developed. This architecture employs four sequential XGBoost-based specialist classifiers to separate solar flare signals from systemic noise.
Specifically, Solarfall divides the detection problem into four specialized stages. The initial stages prioritize maximizing recall by separating potential alerts from solar calm and filtering out low-impact noise. The subsequent stages progressively isolate medium flares from severe threats and ultimately distinguish Class X events from Class M storms. To prevent the propagation of cascading errors and overcome exposure bias, the intermediate and final classifiers underwent conditioned training. They were trained on the residual distribution that survived the previous cuts (containing both true signals and false positives), allowing each classifier to learn the mathematical signatures of its predecessors' errors and protect the rigor of the final decision.
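The conditioned training described above can be illustrated with a minimal sketch. It is not the authors' implementation: `ThresholdStage` is a hypothetical single-feature stand-in for an XGBoost specialist, the integer severity levels (0 for calm up to 4 for the most severe class) and the toy data are assumptions made for illustration, and the recall-first rule is only one possible way to bias a stage toward recall. The point it shows is that each stage is fitted only on the samples that survived the previous stage, false positives included.

```python
class ThresholdStage:
    """Hypothetical stand-in for one XGBoost specialist: a single-feature cut."""

    def __init__(self):
        self.threshold = float("inf")

    def fit(self, xs, ys):
        positives = [x for x, y in zip(xs, ys) if y]
        # Recall-first rule: pass everything at or above the weakest positive.
        self.threshold = min(positives) if positives else float("inf")
        return self

    def predict(self, xs):
        return [x >= self.threshold for x in xs]


def train_cascade(xs, severity, n_stages=4):
    """Fit stage k only on samples that survived stages 1..k-1."""
    stages = []
    surviving = list(range(len(xs)))  # indices still in play
    survivors_per_stage = []
    for k in range(1, n_stages + 1):
        sub_x = [xs[i] for i in surviving]
        # Stage k asks: is this sample at least severity level k?
        sub_y = [severity[i] >= k for i in surviving]
        stage = ThresholdStage().fit(sub_x, sub_y)
        passed = stage.predict(sub_x)
        # Survivors include true signals and the stage's false positives,
        # so the next stage is trained on its predecessor's mistakes too.
        surviving = [i for i, ok in zip(surviving, passed) if ok]
        stages.append(stage)
        survivors_per_stage.append(len(surviving))
    return stages, survivors_per_stage


# Toy data (assumed): one feature value and one severity level per sample.
toy_x = [0.1, 0.2, 0.9, 0.3, 1.5, 0.4, 2.5, 1.2, 3.0, 2.0, 4.0, 3.5]
toy_severity = [0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 4]

stages, survivors = train_cascade(toy_x, toy_severity)
print(survivors)
```

In this toy run each stage lets one lower-severity sample through, and that sample becomes a negative training example for the next stage, which is the mechanism the conditioned training relies on.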
The study used data spanning 2010 to 2024, employing a chronological split to test the models on Solar Cycle 25. Results indicated that the multi-stage cascade has the potential to reduce noise, filtering the initial 158,906 samples down to 26,428 (17%) at the third stage and to 4,440 (3%) at the fourth and final stage, a 97% reduction. However, the X-ray approach proved inadequate for forecasting extreme Class X events, yielding a negligible recall of 0.03 at the final stage (capturing 3% of extreme flares). Furthermore, evaluating the integration of X-ray flux and magnetic topology on the test set refuted the hypothesis of a superior combined model. In the unified model, X-ray flux acted as a confounding variable that degraded the system's accuracy and corrupted predictions.
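As a rough check on the figures above, the following sketch recomputes the retained percentages from the reported sample counts and shows what a chronological split looks like in code. The function names, the toy records, and the cutoff date are assumptions; the abstract does not state the exact date used to separate training data from the Solar Cycle 25 test set.

```python
from datetime import date


def chronological_split(records, cutoff):
    """Split (day, label) pairs by time; nothing from the test period leaks into train."""
    train = [r for r in records if r[0] < cutoff]
    test = [r for r in records if r[0] >= cutoff]
    return train, test


def retained_percent(surviving, initial):
    """Share of the initial samples still present after a stage, in percent."""
    return 100.0 * surviving / initial


# Sample counts as reported in the abstract.
INITIAL_SAMPLES = 158_906
STAGE_SURVIVORS = {3: 26_428, 4: 4_440}

for stage_number, count in STAGE_SURVIVORS.items():
    pct = retained_percent(count, INITIAL_SAMPLES)
    print(f"stage {stage_number}: {count} samples retained ({pct:.1f}%)")

reduction = 100.0 - retained_percent(STAGE_SURVIVORS[4], INITIAL_SAMPLES)
print(f"overall reduction: {reduction:.1f}%")

# Toy records and a hypothetical cutoff, for illustration only.
toy_records = [
    (date(2012, 5, 1), "quiet"),
    (date(2017, 9, 1), "flare"),
    (date(2021, 10, 1), "flare"),
    (date(2024, 5, 1), "quiet"),
]
train, test = chronological_split(toy_records, cutoff=date(2020, 1, 1))
```

Splitting by date rather than at random keeps the evaluation honest for a forecasting task, since the model is scored only on observations that come after everything it was trained on.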
Conversely, the architecture relying on magnetograms achieved a Precision-Recall Area Under the Curve of 0.462 and a precision of 0.48 for Class M and Class X events. The research concludes that the physical preconditions of an active region, such as current density and magnetic free energy, constitute the most relevant indicators for solar flare forecasting, outperforming generalized macroscopic approaches.
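For readers less familiar with the metrics quoted above, the sketch below computes precision, recall, and one common estimator of the area under the precision-recall curve (average precision) on toy data. The labels, scores, and the 0.5 decision threshold are invented for illustration and are unrelated to the study's results; the abstract does not say which PR-AUC estimator was used.

```python
def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(y_true, scores):
    """Mean of the precision values measured at the rank of each true positive."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Toy labels (1 = flare) and model scores, assumed for illustration.
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.4, 0.8, 0.7, 0.2, 0.1, 0.3, 0.35]
toy_pred = [1 if s >= 0.5 else 0 for s in toy_scores]

precision, recall = precision_recall(toy_true, toy_pred)
ap = average_precision(toy_true, toy_scores)
print(f"precision={precision:.3f} recall={recall:.3f} average_precision={ap:.4f}")
```

Under heavy class imbalance, as with rare Class X events, precision-recall metrics are usually more informative than accuracy because they ignore the large pool of correctly rejected quiet samples.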

Author

Eduardo Ferraz de Campos (Federal Institute of São Paulo)

Co-author

Sergio Luisir Discola Junior (IFSP - Campus São Carlos)
