AstroAI Workshop 2026
Presenter: Konstantin Malanchev (CMU / LINCC Frameworks)
Title: Scalable multi-modal catalog analysis with LSDB framework and HATS catalog format
Date/Time: Thursday, June 18, 11:30 AM - 12:30 PM
Abstract: Large sky surveys such as ZTF, Rubin, Roman, and Euclid are producing unprecedented volumes of astronomical data across imaging, catalogs, time-domain observations, and spectroscopy. In this talk, we present the HATS catalog format and the LSDB framework as a scalable solution for analyzing such multi-modal data.
HATS is a hierarchical, spatially partitioned catalog standard designed for efficient cloud and distributed storage. It enables fast regional access, parallel reads, and scalable handling of very large catalogs. Built on HATS, LSDB provides a dataframe-based interface for distributed querying, filtering, joins, and cross-matching across billions of sources. LSDB also supports scalable analysis of time-domain and spectral datasets, enabling workflows across multiple data modalities. In addition, its streaming capabilities allow efficient sequential access to distributed datasets for large-scale cross-matching and integration with machine-learning pipelines for cross-catalog training.
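To make the idea of hierarchical spatial partitioning concrete, here is a minimal toy sketch in plain Python. It is not the HATS or LSDB API: HATS tiles the sky with HEALPix pixels, whereas this sketch assumes a simple rectangular RA/Dec quadtree as a stand-in. The `Tile` class, the `partition` and `box_search` functions, and the 100-row threshold are all hypothetical names and values chosen for illustration. The sketch shows two things: dense sky regions end up in smaller, deeper tiles so that partitions stay similar in size, and a regional query only has to read the tiles that overlap the requested region.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Tile:
    """Rectangular sky tile (a stand-in for a HEALPix pixel at some order)."""
    ra_min: float
    ra_max: float
    dec_min: float
    dec_max: float
    order: int = 0
    rows: list = field(default_factory=list)

    def contains(self, ra, dec):
        # Half-open bounds, so each point belongs to exactly one child tile.
        return self.ra_min <= ra < self.ra_max and self.dec_min <= dec < self.dec_max

    def overlaps(self, ra_lo, ra_hi, dec_lo, dec_hi):
        return (self.ra_min < ra_hi and self.ra_max > ra_lo
                and self.dec_min < dec_hi and self.dec_max > dec_lo)


def partition(tile, max_rows=100, max_order=20):
    """Split a tile into four children until every leaf holds <= max_rows rows."""
    if len(tile.rows) <= max_rows or tile.order >= max_order:
        return [tile] if tile.rows else []  # empty tiles are not stored
    ra_mid = (tile.ra_min + tile.ra_max) / 2
    dec_mid = (tile.dec_min + tile.dec_max) / 2
    children = [
        Tile(tile.ra_min, ra_mid, tile.dec_min, dec_mid, tile.order + 1),
        Tile(ra_mid, tile.ra_max, tile.dec_min, dec_mid, tile.order + 1),
        Tile(tile.ra_min, ra_mid, dec_mid, tile.dec_max, tile.order + 1),
        Tile(ra_mid, tile.ra_max, dec_mid, tile.dec_max, tile.order + 1),
    ]
    for ra, dec in tile.rows:
        child = next(c for c in children if c.contains(ra, dec))
        child.rows.append((ra, dec))
    return [leaf for c in children for leaf in partition(c, max_rows, max_order)]


def box_search(leaves, ra_lo, ra_hi, dec_lo, dec_hi):
    """Return sources inside the box, scanning only tiles that overlap it."""
    touched = [t for t in leaves if t.overlaps(ra_lo, ra_hi, dec_lo, dec_hi)]
    hits = [(ra, dec) for t in touched for ra, dec in t.rows
            if ra_lo <= ra < ra_hi and dec_lo <= dec < dec_hi]
    return hits, len(touched)


# Synthetic catalog: a sparse background plus one dense cluster.
# (Uniform in RA/Dec is not uniform on the sphere; good enough for a toy.)
random.seed(0)
background = [(360 * random.random(), -90 + 180 * random.random()) for _ in range(2000)]
cluster = [(random.gauss(150, 1), random.gauss(2, 1)) for _ in range(2000)]
sources = background + cluster

root = Tile(0.0, 360.0, -90.0, 90.0, rows=list(sources))
leaves = partition(root)

hits, touched = box_search(leaves, 149.0, 151.0, 1.0, 3.0)
orders = [t.order for t in leaves]
print(f"{len(leaves)} leaf tiles, orders {min(orders)} to {max(orders)}")
print(f"box search read {touched} tiles and found {len(hits)} sources")
```

Because each leaf tile is independent, a real system can store every tile as its own file and process tiles in parallel. A production cross-match additionally has to handle sources near tile boundaries, which this sketch does not attempt.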
We will discuss our collaboration with the Multimodal Universe (MMU) project to convert MMU data into HATS, as well as the growing ecosystem of HATS catalogs available from the Space Telescope Science Institute, IRSA IPAC, and other providers. Together, HATS and LSDB enable interoperable, scalable science for the next generation of astronomical surveys and AI-driven discoveries.