The Duality of Social Media Discourse: Characterizing Polluted and Supportive Online Behaviors

CCC Seminar by Virginia Morini, postdoc at the Computer Science Department of the University of Pisa (Italy).

Abstract:
Social media platforms have drastically changed how people interact, share information, and form relationships online, generating massive amounts of behavioral data. In this talk, I will present research examining how homophilic mechanisms (the tendency to interact with similar others) can produce radically different outcomes in online spaces. Through data-driven case studies on Reddit and X/Twitter, I employ a multidisciplinary approach combining network science and natural language processing with psychosociological insights to investigate both potentially harmful environments, where cognitive biases are exacerbated, and beneficial environments, where users provide mutual support. My research characterizes the emergent phenomena in these contrasting spaces, examines the underlying user behaviors and group dynamics, and measures their effects across different platforms. The results demonstrate how online spaces can foster problematic phenomena such as echo chambers in sociopolitical discussions while simultaneously enabling supportive communities around mental health issues. I will highlight how community norms and interaction patterns, rather than platform architecture alone, play a crucial role in determining these divergent outcomes. The presentation will also introduce practical, open-source tools for studying online social phenomena while ensuring reproducibility and privacy protection.
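
As a purely illustrative companion to the idea of homophily, here is a minimal sketch of how one might quantify it on an interaction network; the toy edge list and the "stance" attribute are assumptions for illustration, not data from the talk:

```python
# Minimal sketch: measuring homophily in a user-interaction network.
# The edge list and the "stance" attribute are hypothetical
# illustrations, not data from the talk.
import networkx as nx

G = nx.Graph()
G.add_nodes_from([(u, {"stance": s}) for u, s in
                  [("a", "pro"), ("b", "pro"), ("c", "anti"),
                   ("d", "anti"), ("e", "pro")]])
G.add_edges_from([("a", "b"), ("a", "e"), ("c", "d"), ("b", "c")])

# Fraction of edges connecting same-stance users: 1.0 would indicate a
# perfectly segregated (echo-chamber-like) network, ~0.5 a mixed one.
same = sum(G.nodes[u]["stance"] == G.nodes[v]["stance"] for u, v in G.edges)
print(f"edge homophily: {same / G.number_of_edges():.2f}")  # 0.75 here

# networkx also provides an assortativity coefficient for the same idea.
print(nx.attribute_assortativity_coefficient(G, "stance"))
```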

When: 10/03/2026, h 11:00
Where: Sala Conferenze, 3rd Floor

Do LLMs Capture Embodied Cognition and Cultural Variation? Cross-Linguistic Evidence from Demonstratives

CCC Seminar by Janet Yu Wang, PhD, visiting for three months from the Hong Kong Polytechnic University.

Abstract:
Do large language models (LLMs) truly acquire embodied cognition and cultural conventions from text? We introduce demonstratives, fundamental spatial expressions like “this/that” in English and “这/那” in Chinese, as a novel probe for grounded knowledge. Using 6,400 responses from 320 native speakers, we establish a human baseline: English speakers reliably distinguish proximal–distal referents but struggle with perspective-taking, while Chinese speakers switch perspectives fluently but tolerate distal ambiguity. In contrast, five state-of-the-art LLMs fail to capture the proximal–distal contrast and show no cultural differences, defaulting to English-centric reasoning. Our study contributes (i) demonstratives as a new lens for evaluating embodied cognition and cultural conventions, (ii) empirical evidence of cross-cultural asymmetries in human interpretation, (iii) a new perspective on the egocentric–sociocentric debate, showing that both orientations coexist but vary across languages, and (iv) a call to address individual variation in future model design.
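
A minimal sketch of what such a probe could look like in code; the scenario texts, expected answers, and the `llm_complete` hook are illustrative assumptions, not the study's actual materials:

```python
# Sketch of a demonstrative-choice probe. SCENARIOS pairs a spatial
# description with the demonstrative a native English speaker would
# tend to pick; both are illustrative, not the study's stimuli.
SCENARIOS = [
    ("The cup is within the speaker's reach. "
     "Fill in the blank: 'Please hand me ___ cup.'", "this"),  # proximal
    ("The cup is across the room from the speaker. "
     "Fill in the blank: 'Please hand me ___ cup.'", "that"),  # distal
]

def probe_accuracy(llm_complete, scenarios=SCENARIOS):
    """llm_complete(prompt) -> str is a hypothetical hook into any LLM.
    Returns the fraction of scenarios whose completion matches the
    expected proximal/distal demonstrative."""
    hits = sum(expected in llm_complete(prompt).strip().lower()
               for prompt, expected in scenarios)
    return hits / len(scenarios)
```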

When: 03/03/2026, h 14:00
Where: Sala Conferenze, 3rd Floor

LLM Beliefs Are in Their Heads

CCC Seminar by Alessandro Corona Mendozza, predoc researcher at the Center for Language Technology (University of Copenhagen), currently visiting the University of Turin.

Abstract:
We investigate belief-like representations in decoder-only autoregressive LLMs using linear controlled probes on residual stream activations and single attention heads. Following Herrmann and Levinstein’s (2025) criteria (Accuracy, Use, Coherence, and Uniformity), we find that large models exhibit strong truth sensitivity (Accuracy), and that steering activations along probe directions reliably changes downstream behavior (Use). Coherence, measured via calibrated probes and cross-dataset probing, is moderate across models, while training on diverse data yields domain-consistent truth directions (Uniformity). The results are particularly encouraging at the head level and align with some standard philosophical accounts of belief, e.g., minimal functionalism, supporting the view that LLMs can maintain propositional attitudes under such theoretical frameworks.
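
To make the setup tangible, here is a minimal sketch of a linear probe and probe-direction steering; the synthetic activations, labels, and scale factor are illustrative assumptions, since collecting real residual-stream activations is model-specific and omitted here:

```python
# Sketch: a linear "truth" probe on residual-stream activations, plus
# steering along the probe direction. X and y are synthetic stand-ins;
# in the real setup they would come from a hooked decoder-only LLM.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 64                                        # hidden size (illustrative)
X = rng.normal(size=(200, d))                 # stand-in activations
y = (X @ rng.normal(size=d) > 0).astype(int)  # stand-in truth labels

probe = LogisticRegression(max_iter=1000).fit(X, y)
print("probe accuracy:", probe.score(X, y))   # Accuracy criterion

# Steering (Use criterion): shift an activation along the probe's
# truth direction; alpha = 4.0 is an arbitrary illustrative scale.
direction = probe.coef_[0] / np.linalg.norm(probe.coef_[0])
steered = X[0] + 4.0 * direction
# In practice this vector would be written back into the residual
# stream mid-forward-pass to test whether downstream behavior changes.
```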

Short bio:
Alessandro Corona Mendozza is a predoc researcher working at the intersection of LLM interpretability, AI epistemology, and philosophy of mind/language. He is currently assisting with research on an eye-tracking project at the Center for Language Technology (University of Copenhagen) and is a visiting researcher at the University of Turin.

When: 18/02/2026, h 14:00
Where: Sala Conferenze, 3rd Floor

Evaluation Under Variation: References, Annotators, and Languages

CCC Seminar by Silvia Casola, postdoctoral researcher in the MaiNLP group at the Ludwig Maximilian University of Munich and the Munich Center for Machine Learning.

Abstract:
Automatic evaluation in NLP often assumes a single ground truth, such as a reference or a gold label. However, language is inherently variable: multiple outputs can be valid, annotators frequently disagree, and metric behaviours can differ across languages. In this talk, I will present three case studies showing how evaluation can fail and how it can be improved under such variation. Focusing on NLG, I will show that metrics can be highly sensitive to the choice of reference, leading to large changes in system rankings. I will then examine classification evaluation under annotator disagreement and present an approach for accounting for systematic disagreement. Finally, I will discuss recent work on steering multilingual neural metrics to improve their correlation with humans.
Starting from these failure modes, the talk shows how studying and modeling variation in references, annotations, and languages can improve the stability and reliability of automatic evaluation.
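
To illustrate the reference-sensitivity failure mode, here is a toy sketch in which two systems swap ranks depending on which reference the metric sees; the sentences and the choice of BLEU via sacrebleu are illustrative, not the talk's experiments:

```python
# Toy sketch: the same two systems can swap ranks depending on which
# reference a metric sees. Sentences and metric choice are illustrative.
import sacrebleu

sys_a = "the cat sat on the mat"
sys_b = "a cat is sitting on the mat"
refs = ["the cat sat on the mat",        # reference 1 favors system A
        "a cat is sitting on the mat"]   # reference 2 favors system B

for i, ref in enumerate(refs, 1):
    a = sacrebleu.sentence_bleu(sys_a, [ref]).score
    b = sacrebleu.sentence_bleu(sys_b, [ref]).score
    print(f"ref {i}: BLEU(A)={a:.1f}, BLEU(B)={b:.1f} "
          f"-> system {'A' if a > b else 'B'} wins")

# Multi-reference scoring mitigates, but does not remove, the effect.
print(sacrebleu.sentence_bleu(sys_a, refs).score)
```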

Short bio:
I am a postdoctoral researcher in the MaiNLP group at Ludwig Maximilian University of Munich and the Munich Center for Machine Learning, supervised by Barbara Plank. I was recently awarded a Marie Skłodowska-Curie Postdoctoral Fellowship for my project GenEval, to be hosted at Universitat Pompeu Fabra (UPF), which investigates the relationship between generation and evaluation in Large Language Models. Previously, I was a postdoctoral researcher at the University of Turin, where I worked on perspective-aware NLP. I completed my PhD at the University of Padua and Fondazione Bruno Kessler, focusing on natural language generation. During my PhD, I was a visiting researcher at UPF and interned with Spotify and Huawei Research. My research interests lie in NLP, with a focus on natural language generation and evaluation.

When: 31/03/2026, h 15:00-16:00
Where: Sala Conferenze, 3rd Floor

More with Less – Sustainable AI Approaches for Natural Language Processing and Introduction to the Brazilian National Institute for Responsible AI (INCT TILDIAR)

CCC Seminar by Marcos Gonçalves, Full Professor of Computer Science at the Federal University of Minas Gerais (UFMG), who will present his work INCT TILDIAR: Data-Centric and Sustainable Paths Beyond the ‘Law of More’ in NLP.

Abstract:
This talk introduces the INCT TILDIAR, a national Brazilian research network dedicated to responsible and sustainable Artificial Intelligence, and presents the main research directions developed within the institute. We illustrate how our work challenges the prevailing “Law of More” in NLP by emphasizing data-centric and efficiency-driven approaches to AI.

As a concrete example, the talk briefly summarizes research on instance selection and data engineering, showing that substantial reductions in training data and energy consumption can be achieved while maintaining model effectiveness. The overarching message is that sustainable NLP is possible by rethinking how data is selected and used, rather than relying solely on ever-larger models.
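
As a rough, self-contained illustration of instance selection (a simple confidence-based baseline under assumed data, not the institute's actual methods):

```python
# Sketch: confidence-based instance selection. Train a cheap model,
# keep only the hardest fraction of the training set, and retrain.
# Dataset choice and the 30% ratio are illustrative assumptions.
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

data = fetch_20newsgroups(subset="train",
                          categories=["sci.space", "rec.autos"])
X = TfidfVectorizer(max_features=5000).fit_transform(data.data)
y = data.target

model = LogisticRegression(max_iter=1000).fit(X, y)
margin = np.abs(model.decision_function(X))     # small = uncertain

keep = np.argsort(margin)[: int(0.3 * len(y))]  # hardest 30% only
small = LogisticRegression(max_iter=1000).fit(X[keep], y[keep])
print(f"retrained on {len(keep)}/{len(y)} instances")
```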

Short bio:
Marcos André Gonçalves is a Full Professor of Computer Science at the Federal University of Minas Gerais (UFMG). He holds a PhD in Computer Science from Virginia Tech, with prior degrees from UFC and UNICAMP, and has completed postdoctoral research at UFMG and Politecnico di Torino. His research focuses on Information Retrieval, Machine Learning, and Natural Language Processing, with over 400 peer-reviewed publications, an h-index of 61, and more than 15,000 citations.

He has received multiple national awards, including CAPES Thesis Awards (2024 – Advisor; 2020 – Honorable Mention), and several Best Paper awards. Prof. Gonçalves has served as General Chair of ACM/IEEE JCDL 2018, is a Senior Program Committee member of leading conferences (SIGIR, ACL, CIKM, WSDM, RecSys, ECIR), and serves on the editorial boards of TACL and the Journal of the Brazilian Computer Society. He is also the Coordinator of the Brazilian National Institute for Responsible AI in NLP (INCT TILDIAR).


When: 02/02/2026, h 15:00
Where: Sala Riunioni, 1st Floor

Don’t Classify, Rank: Retrieval, Fusion, and Label Semantics for XMTC and MCTC

CCC Seminar by Celso França, Ph.D. student at UFMG, who will present his work xCoRetriev: A Retrieval-Centric Paradigm for Extreme and Multi-Class Text Classification.

Abstract:
We address Extreme Multi-Label Text Classification (XMTC) and Multi-Class Text Classification (MCTC) under a unified paradigm that reframes classification as a ranking and retrieval problem over large, noisy, and skewed label spaces. In this talk, we synthesize our recent SIGIR 2025 paper and our award-winning SBBD 2025 paper to demonstrate how retrieval-based formulations can jointly improve scalability, effectiveness, and label semantics across both XMTC and MCTC settings. Our core proposal is xCoRetriev, a dynamic two-stage retrieval and fusion pipeline designed to tackle the main challenges of label space volume, extreme skewness, and label quality by effectively combining dense and sparse representations. We further discuss recent attempts to enhance xCoRetriev’s effectiveness through Dimension Importance Estimation (DIMES) strategies and learned sparse representations trained via masked language modeling (MLM). While these approaches show promise in emphasizing discriminative signals and improving tail-label sensitivity, our analysis highlights their current limitations. Across multiple large-scale datasets, our results demonstrate consistent gains in propensity-scored metrics, improved robustness to noisy and weakly supervised label spaces through RAG-enhanced labels, and strong scalability at both training and inference time. Overall, this work advocates for a retrieval-centric view of large-scale text classification, bridging XMTC and MCTC through ranking, fusion, and importance-aware representations.
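
The dense-plus-sparse combination at the heart of such pipelines can be sketched generically; reciprocal rank fusion (RRF) below is a standard combiner standing in for, not reproducing, xCoRetriev's learned two-stage fusion, and the label ids are illustrative:

```python
# Sketch: reciprocal rank fusion (RRF) of a dense and a sparse ranking
# over candidate labels. The ranked lists are illustrative; the real
# pipeline's fusion is two-stage and learned.
def rrf(rankings, k=60):
    """Fuse ranked lists of label ids; higher fused score is better."""
    scores = {}
    for ranking in rankings:
        for rank, label in enumerate(ranking):
            scores[label] = scores.get(label, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["lbl_42", "lbl_7", "lbl_99"]   # e.g. from an embedding index
sparse = ["lbl_42", "lbl_13", "lbl_7"]  # e.g. from BM25 over label text
print(rrf([dense, sparse]))  # ['lbl_42', 'lbl_7', 'lbl_13', 'lbl_99']
```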

When: 02/02/2026, h 14:00
Where: Sala Riunioni, 1st Floor