AstroAI Workshop 2025
Sebastian Ratzenboeck
Learning with Gaps: A Domain-Adaptive SBI Framework for Mapping Young Stars from Incomplete, Multi-Survey Data
Presenter: Sebastian Ratzenboeck
Title: Learning with Gaps: A Domain-Adaptive SBI Framework for Mapping Young Stars from Incomplete, Multi-Survey Data
Date/Time: Monday, July 7th, 2:50 - 3:10 PM
Abstract: Identifying young stellar objects (YSOs) in the solar neighborhood is key to understanding the Galactic baryon cycle. By tracing where and when stars form, disperse, and inject energy into the interstellar medium, we gain insight into the processes that regulate star formation and feedback. Recent advances in 3D dust mapping and the availability of multi-wavelength data from surveys like Spitzer, 2MASS, WISE, Gaia, LAMOST, and APOGEE open the door to building a high-resolution 3D census of young stars. However, effectively using this data is challenging: surveys differ in wavelength coverage, sensitivity, and resolution, most stars are only partially observed, and simulations differ systematically from real observations. Traditional simulation-based inference (SBI) methods are poorly suited to this setting, as they assume complete, noise-homogeneous, and simulation-faithful data.
We present a domain-adaptive, multi-survey SBI framework designed to address these challenges. The model learns a shared latent space between synthetic and real data using survey-specific adapters and modality-specific encoders. It aligns simulations and observations through optimal transport and contrastive learning, and additionally uses cross-survey spectral pairs to improve consistency. Crucially, the architecture supports arbitrary combinations of photometric and spectroscopic inputs, handles missing modalities naturally, and enables inference across highly heterogeneous data regimes.
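As a rough illustration of how optimal transport can align simulated and observed embeddings, the sketch below computes an entropy-regularized transport plan with Sinkhorn iterations between two small sets of latent vectors. The abstract does not say which optimal transport formulation the framework uses, so the Sinkhorn solver, the uniform sample weights, the squared Euclidean cost, and the names `sinkhorn_plan`, `transport_cost`, `epsilon`, and `n_iters` are all assumptions made for illustration.

```python
import math


def sinkhorn_plan(sim, obs, epsilon=0.5, n_iters=200):
    """Entropic OT plan between two point sets with uniform weights.

    Hypothetical helper: `sim` and `obs` stand in for latent embeddings of
    simulated and observed stars (lists of equal-length float lists).
    """
    n, m = len(sim), len(obs)
    # Squared Euclidean cost between every simulated and observed embedding.
    cost = [[sum((a - b) ** 2 for a, b in zip(s, o)) for o in obs] for s in sim]
    # Gibbs kernel; a small epsilon with large costs can underflow to zero.
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    row_mass = [1.0 / n] * n
    col_mass = [1.0 / m] * m
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the target marginals.
        u = [row_mass[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [col_mass[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    return plan, cost


def transport_cost(plan, cost):
    """Total cost of moving mass according to the plan (a possible alignment loss)."""
    return sum(p * c for p_row, c_row in zip(plan, cost) for p, c in zip(p_row, c_row))


# Toy embeddings: each simulated point has one nearby observed counterpart.
sim_latents = [[0.0, 0.0], [1.0, 1.0]]
obs_latents = [[0.1, 0.0], [1.0, 0.9]]
plan, cost = sinkhorn_plan(sim_latents, obs_latents)
alignment_loss = transport_cost(plan, cost)
```

In a training loop, a quantity like `alignment_loss` could be minimized with respect to the encoder weights so that simulated and real stars occupy the same region of the latent space. In this sketch the embeddings are fixed numbers, not learned.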
At its core, our method is a transformer-based flow matching model trained to learn the full joint distribution over stellar parameters (e.g., age, distance, extinction, Teff, log g) and observations. This architecture allows for learning complex feature dependencies, fast sampling at inference time, and flexible conditioning and marginalization over any subset of inputs. By unifying simulations and observations in a single probabilistic framework, our approach enables accurate YSO characterization and lays the groundwork for a bias-aware and self-consistent 3D map of recent star formation in the solar neighborhood.
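To make the flow matching objective concrete, the sketch below builds one training example for a joint vector of stellar parameters and observations. It assumes the common linear interpolation path between Gaussian noise and data, and it assumes that conditioning is handled by a per-feature mask that holds conditioned entries at their data values and excludes them from the loss. The abstract confirms neither choice, and the names `make_flow_matching_example`, `masked_mse`, and `condition_mask` are hypothetical.

```python
import random


def make_flow_matching_example(x1, condition_mask, rng):
    """Build one (t, x_t, target velocity, loss weight) training tuple.

    x1: joint vector of parameters and observations for one star.
    condition_mask: True where a feature is conditioned on (held fixed).
    Assumed path: x_t = (1 - t) * x0 + t * x1, with x0 drawn from N(0, 1).
    """
    t = rng.random()
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]
    x_t, target, weight = [], [], []
    for data, noise, is_conditioned in zip(x1, x0, condition_mask):
        if is_conditioned:
            # Conditioned features keep their data value and do not flow.
            x_t.append(data)
            target.append(0.0)
            weight.append(0.0)
        else:
            # Free features move along a straight line from noise to data.
            x_t.append((1.0 - t) * noise + t * data)
            target.append(data - noise)
            weight.append(1.0)
    return t, x_t, target, weight


def masked_mse(predicted, target, weight):
    """Mean squared velocity error over the unconditioned features only."""
    total = sum(weight)
    if total == 0.0:
        return 0.0
    return sum(w * (p - q) ** 2 for p, q, w in zip(predicted, target, weight)) / total


# Toy joint vector: [age, extinction, one photometric magnitude], in arbitrary units.
rng = random.Random(0)
star = [1.0, 2.0, 3.0]
mask = [False, True, False]  # condition on extinction, infer the other two
t, x_t, velocity, weight = make_flow_matching_example(star, mask, rng)
```

A network trained this way would receive `x_t`, `t`, and the mask and predict `velocity`. Drawing a different random mask for each example is one way a single model could learn to condition on, or marginalize over, arbitrary subsets of inputs.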