AstroAI Workshop 2026
Presenter: Anshuman Acharya (University of California Berkeley)
Title: SCHARF: ML-based Super-Resolution for Bridging the galaxy-IGM connection at the Epoch of Reionization
Date/Time: Monday, June 15, 4:00 PM - 5:30 PM
VIRTUAL
Abstract: JWST and the Square Kilometre Array Observatory (SKAO) study the Epoch of Reionization (EoR) from complementary angles: JWST resolves the galaxy populations driving reionization, while SKAO maps the large-scale topology of the neutral hydrogen field through the 21-cm signal. Realising the full scientific synergy between these facilities demands simulations that simultaneously reproduce galactic observables, such as the UV luminosity function, and 21-cm statistics, such as the power spectrum. This places competing and severe demands on cosmological simulations: large volumes (>100 Mpc) are required to capture the large-scale modes that shape the 21-cm signal, yet the resolution needed to model ultra-faint dwarf galaxies (which can contribute up to 50% of the ionizing photon budget) remains computationally out of reach in a single brute-force simulation.
Crucially, the observational incompleteness of JWST does not justify equivalent resolution limits in simulations: JWST samples a flux-limited subset of the true galaxy population, whereas the prior space of a simulation encompasses all halos above its mass resolution limit. Simulations that resolve halos only down to the masses of JWST-detectable galaxies therefore introduce a systematic prior mismatch that biases any downstream inference. This affects not only the galactic observables but also observations of the IGM topology itself, since undetected faint galaxies still emit ionizing photons and directly shape the neutral hydrogen field probed by SKAO.
I thus present SCHARF (Statistical Cosmic Halo Augmentation using Random Forests), a novel hybrid framework designed to resolve this tension. SCHARF employs a Random Forest model, dynamically calibrated against theoretical halo mass functions, to augment the dark matter halo catalogues of large-volume, low-resolution N-body simulations with physically consistent synthetic mini-halos. To preserve the spatial and kinematic coherence required for accurate 2-point clustering statistics, synthetic halos are anchored using local density gradients and bulk velocity flows from the parent simulation, ensuring the super-resolved catalogues remain self-consistent inputs for downstream galaxy formation and reionization modelling.
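As a rough illustration of what calibrating against a halo mass function can mean in practice, the sketch below estimates how many halos a low-resolution catalogue is missing below its mass limit and draws synthetic masses for them. It is a toy under loudly stated assumptions, not SCHARF itself: the mass function is taken to be a pure power law, dn/dM proportional to M^(-slope), with an assumed slope of 1.9; the function names and all numbers are hypothetical; and no Random Forest, spatial anchoring, or velocity assignment is attempted.

```python
import random


def _power_integral(m_lo, m_hi, slope):
    """Integral of M**(-slope) dM from m_lo to m_hi (requires slope != 1)."""
    k = 1.0 - slope
    return (m_hi**k - m_lo**k) / k


def expected_unresolved_count(n_resolved, m_min, m_res, m_max, slope):
    """Number of halos expected in [m_min, m_res], given n_resolved halos
    in [m_res, m_max], if both ranges follow the same power-law mass function."""
    if slope == 1.0:
        raise ValueError("slope == 1 needs a logarithmic integral")
    ratio = _power_integral(m_min, m_res, slope) / _power_integral(m_res, m_max, slope)
    return round(n_resolved * ratio)


def sample_power_law_masses(n, m_lo, m_hi, slope, rng):
    """Draw n masses from dn/dM proportional to M**(-slope) on [m_lo, m_hi]
    using inverse-CDF sampling."""
    if slope == 1.0:
        raise ValueError("slope == 1 needs a logarithmic CDF")
    k = 1.0 - slope
    lo, hi = m_lo**k, m_hi**k
    # The CDF is linear in M**k, so interpolate in that variable and invert.
    return [(lo + rng.random() * (hi - lo)) ** (1.0 / k) for _ in range(n)]


# Hypothetical numbers: 1000 resolved halos above 1e10 solar masses,
# augmented down to 1e8 solar masses with an assumed slope of 1.9.
rng = random.Random(0)
n_missing = expected_unresolved_count(1000, 1e8, 1e10, 1e13, 1.9)
synthetic_masses = sample_power_law_masses(n_missing, 1e8, 1e10, 1.9, rng)
```

With a slope this steep, the unresolved halos far outnumber the resolved ones, which is the basic reason the choice of resolution limit matters for the faint-end population. A realistic mass function is not a pure power law and evolves with redshift, so the counts here are only indicative.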
I present first results from coupling SCHARF with a 150 Mpc/h cosmological simulation, showing how mass super-resolution propagates through the galaxy formation pipeline: from the UV luminosity function and star formation rate density, compared against JWST observations, to the 21-cm power spectrum accessible to SKAO. Together, these results demonstrate that resolving the faint-end galaxy population is not merely a numerical detail but has measurable consequences for both galactic and large-scale IGM observables. I close by discussing ongoing challenges in the temporal consistency of ML-generated merger trees and the prospects for emulator-based rescaling of the 21-cm power spectrum from resolution-limited simulations, thus offering a computationally tractable path toward self-consistent, multi-observable predictions for the upcoming era of EoR observations.