AstroAI Workshop 2026

Kaley Brauer

How Do We Know What an AI System Is Thinking?

Presenter: Kaley Brauer (Harvard University)

Title: How Do We Know What an AI System Is Thinking?

Date/Time: Tuesday, June 16, 1:30 PM - 2:15 PM

Abstract: As AI systems become more capable, their outputs can look fluent, plausible, and correct while still leaving a basic question unanswered: what internal process produced the answer? In this talk, I will introduce interpretability as a scientific approach to studying AI systems from the inside, asking what information models represent, how it is transformed, and which internal states causally affect their behavior. I will then present a case study from my recent work on hidden computation in large language models. We show that frontier-scale models can perform hidden multi-step reasoning over content-free filler tokens such as dots or counting sequences, with no visible chain-of-thought, but that much of this hidden computation can still be decoded from internal activations. I will use this example to discuss why interpretability matters for AI safety and for scientists who want to understand, rather than merely use, increasingly powerful AI systems.

Biography: Kaley Brauer is an NSF Astronomy and Astrophysics Postdoctoral Fellow at Harvard and an AI Fellow working with the Cambridge Boston Alignment Initiative and Anthropic’s Alignment Science team. She received her PhD in Physics from MIT in 2023. Her astrophysics research uses simulations to study early galaxy formation and chemical enrichment, while her AI alignment research investigates hidden computation in large language models.
