AstroAI Workshop 2025
Presenter: Rocco Di Tella
Title: Building Honest Agents Through Introspection: Probe-driven Generation of Confidence Scores
Date/Time: Monday, July 7th, 3:30 - 5:00 PM
Abstract: Large language models (LLMs) are the engine behind agentic AI, providing language processing, planning, and reasoning capabilities. Unfortunately, current LLMs do not directly provide a measure of confidence for the responses they produce. This poses a serious problem for high-risk applications, where only responses that are very likely to be correct should be accepted. We propose to explore supervised approaches for computing confidence measures for answers provided by LLMs. To this end, we will develop models that probe the LLM’s internal representations to predict whether an answer is correct, focusing on structured architectures with strong inductive biases to facilitate generalization to unseen tasks. We will train and evaluate our models on a variety of NLP datasets, using proper scoring rules to assess the quality of the resulting confidence scores.
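To make the probing idea in the abstract concrete, the sketch below trains the simplest possible probe, a logistic regression, on feature vectors and scores its confidences with two proper scoring rules (Brier score and log loss). Everything in it is an illustrative assumption rather than the method presented in the talk: the feature vectors are synthetic stand-ins for LLM internal representations, the linear probe replaces the structured architectures the abstract mentions, and all function names (train_probe, predict_confidence, make_synthetic_data, and so on) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(features, labels, lr=0.1, epochs=200):
    """Fit a logistic-regression probe by batch gradient descent on log loss."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_confidence(w, b, x):
    """Probe output: estimated probability that the answer is correct."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def brier_score(confidences, labels):
    """Mean squared error between confidence and correctness (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(confidences, labels)) / len(labels)


def log_loss(confidences, labels, eps=1e-12):
    """Mean negative log-likelihood of the correctness labels (lower is better)."""
    total = 0.0
    for p, y in zip(confidences, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def make_synthetic_data(n, dim, rng):
    """ASSUMPTION: synthetic stand-in for (hidden state, answer-correct?) pairs.

    Correct answers (label 1) are shifted along a fixed direction, incorrect
    answers (label 0) in the opposite direction, with unit Gaussian noise.
    """
    direction = [1.0 if j % 2 == 0 else -1.0 for j in range(dim)]
    features, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        shift = 0.5 if y == 1 else -0.5
        features.append([rng.gauss(shift * d, 1.0) for d in direction])
        labels.append(y)
    return features, labels


rng = random.Random(0)
train_x, train_y = make_synthetic_data(300, 8, rng)
test_x, test_y = make_synthetic_data(200, 8, rng)

w, b = train_probe(train_x, train_y)
test_conf = [predict_confidence(w, b, x) for x in test_x]

# Baseline for comparison: always reporting 0.5 gives a Brier score of 0.25.
print("probe Brier score:", brier_score(test_conf, test_y))
print("probe log loss:   ", log_loss(test_conf, test_y))
```

In a real setting, the synthetic vectors would be replaced by activations extracted from the LLM for each question and answer pair, and the labels by whether each answer was judged correct. Proper scoring rules are used for evaluation because they are minimized in expectation only by the true probability of correctness, so a probe cannot improve its score by being systematically over- or under-confident.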