CCC Seminar by Alessandro Corona Mendozza, predoc researcher at the Center for Language Technology (University of Copenhagen) and visiting researcher at the University of Turin.
Abstract:
We investigate belief-like representations in decoder-only autoregressive LLMs using linear controlled probes on residual-stream activations and on single attention heads. Following Herrmann and Levinstein's (2025) criteria (Accuracy, Use, Coherence, and Uniformity), we find that large models exhibit strong truth sensitivity (Accuracy), and that steering activations along probe directions reliably changes downstream behavior (Use). Coherence, measured via calibrated probes and cross-dataset probing, is moderate across models, while training on diverse data yields domain-consistent truth directions (Uniformity). The results are particularly encouraging at the head level and align with some standard philosophical accounts of belief, e.g., minimal functionalism, supporting the view that LLMs can maintain propositional attitudes under such theoretical frameworks.
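The probe-and-steer methodology mentioned in the abstract can be sketched on toy data. The following is a minimal illustration, not the speaker's actual setup: a logistic-regression "truth" probe is fit on synthetic activation vectors, and its weight vector is then used as a steering direction whose addition shifts the probe's truth score.

```python
import numpy as np

# Illustrative sketch only: synthetic "residual-stream" activations,
# a linear truth probe, and steering along the probe direction.
rng = np.random.default_rng(0)
d = 32                       # toy activation width (assumption)
w_true = rng.normal(size=d)  # hidden direction generating the labels

n = 500
X = rng.normal(size=(n, d))
y = (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Train the linear probe with plain gradient descent on logistic loss.
w = np.zeros(d)
for _ in range(2000):
    p = sigmoid(X @ w)
    w -= 0.1 * X.T @ (p - y) / n

acc = ((sigmoid(X @ w) > 0.5) == y).mean()  # probe accuracy

# "Steering": push one activation along the unit-norm probe direction
# and observe that the probe's truth score increases.
direction = w / np.linalg.norm(w)
x = X[0]
score_before = sigmoid(x @ w)
score_after = sigmoid((x + 4.0 * direction) @ w)
```

In the actual research the activations would come from a forward pass of an LLM and steering would alter the model's generations, but the linear-algebra core (fit a direction, then add it back in) is the same.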
Short bio:
Alessandro Corona Mendozza is a predoc researcher working at the intersection of LLM interpretability, AI epistemology, and philosophy of mind and language. He is currently assisting with research on an eye-tracking project at the Center for Language Technology (University of Copenhagen) and, as a visiting researcher, at the University of Turin.
When: 18/02/2026, 14:00
Where: Sala Conferenze, 3rd Floor