AstroAI Lunch Talks - November 3, 2025 - Ana Sofía Uzsoy & Mike Smith
03 Nov 2025 - Joshua Wing
The video can be found here: https://www.youtube.com/watch?v=tUH4BJdZcX8
Speaker 1: Ana Sofía Uzsoy (Harvard Astronomy Department)
Title 1: Manifold learning for cosmic structures
Abstract 1: We present a scalable manifold learning approach to represent galaxies in a low-dimensional embedding space based on the geometry of their surrounding structure. We validate this method on a toy dataset consisting of points arranged in balls and lines in space, and demonstrate its utility for astrophysics research on the realistic TNG100 galaxy simulation box. For both datasets, our method effectively captures the local structure around each galaxy. For the TNG100 simulation, we show that our first embedding dimension correlates with halo mass and star-formation rate, which aligns with known physical relationships.
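The abstract does not specify the algorithm used, but the general idea of manifold learning (embedding points by the geometry of a neighborhood graph) can be illustrated with a minimal spectral-embedding sketch on a ball-plus-line toy set, loosely echoing the validation dataset described above. This is a generic NumPy sketch, not the speaker's actual method; all sizes and the Gaussian-affinity bandwidth are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset (hypothetical sizes): a compact ball of 60 points
# plus a line of 40 points, embedded together in 3-D space.
ball = rng.normal(loc=0.0, scale=0.5, size=(60, 3))
t = np.linspace(0.0, 1.0, 40)[:, None]
line = np.array([3.0, 0.0, 0.0]) + t * np.array([0.0, 5.0, 0.0])
line = line + rng.normal(scale=0.05, size=(40, 3))
X = np.vstack([ball, line])

# Gaussian affinity between all pairs of points (bandwidth sigma is arbitrary).
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
sigma = 1.0
W = np.exp(-d2 / (2.0 * sigma**2))
np.fill_diagonal(W, 0.0)

# Symmetrically normalized graph Laplacian: L = I - D^{-1/2} W D^{-1/2}.
deg = W.sum(axis=1)
d_inv_sqrt = 1.0 / np.sqrt(deg)
L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

# Embedding coordinates = eigenvectors of L for the smallest
# nontrivial eigenvalues (eigh returns eigenvalues in ascending order).
vals, vecs = np.linalg.eigh(L)
embedding = vecs[:, 1:3]  # first two nontrivial embedding dimensions
print(embedding.shape)    # (100, 2)
```

The embedding dimensions reflect the connectivity of the affinity graph rather than raw coordinates, which is how such methods can capture the local structure around each point.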
Speaker 2: Mike Smith (AstroAI)
Title 2: Text Isn’t All You Need
Abstract 2: There are two main ways to build multimodal models for autoregressive architectures like GPT: late fusion (like LLaVA), where you bolt together separately trained encoders, and early fusion (like MetaAI's Chameleon), where everything mixes from the start. We will discuss examples of both: AstroLLaVA, a LLaMA model that has been fine-tuned to process astronomical imagery, and AstroPT, a GPT-style model trained on 8.6M galaxies that follows similar scaling laws to language models. Time permitting, we will also discuss whether the architecture even matters. The Platonic Representation Hypothesis suggests different models trained on different data are all converging to the same underlying representation of reality. Astronomy is perfect for testing this since we can observe the same objects through completely different instruments. Using the Multimodal Universe dataset (100TB+ of crossmatched astronomical data), we're finding that larger models do seem to converge toward shared embedding spaces, regardless of their training data or regime.
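The late-fusion pattern mentioned above (as in LLaVA-style models) can be sketched in a few lines: a frozen vision encoder's features are mapped by a small learned projector into the language model's embedding space, and the resulting "visual tokens" are prepended to the text-token embeddings. This is a schematic NumPy illustration with made-up dimensions and random stand-in weights, not AstroLLaVA's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

d_img, d_txt = 512, 768  # hypothetical encoder / LLM embedding widths

# Stand-ins for a frozen vision encoder's patch features and an
# LLM's already-embedded text tokens.
image_features = rng.normal(size=(16, d_img))    # 16 image-patch features
token_embeddings = rng.normal(size=(10, d_txt))  # 10 text-token embeddings

# Late fusion: a projector (here a single random matrix standing in for
# the only newly trained weights) maps image features into the LLM's
# embedding space.
W_proj = rng.normal(size=(d_img, d_txt)) * 0.02
visual_tokens = image_features @ W_proj

# The projected visual tokens are simply prepended to the text sequence
# and fed through the otherwise unchanged language model.
fused_sequence = np.concatenate([visual_tokens, token_embeddings], axis=0)
print(fused_sequence.shape)  # (26, 768)
```

Early fusion, by contrast, trains a single model on interleaved image and text tokens from the start, so there is no separately trained encoder to bolt on.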