Categories
Meetings

More with Less – Sustainable AI Approaches for Natural Language Processing and Introduction to the Brazilian National Institute for Responsible AI (INCT TILDIAR)

CCC Seminar by Marcos Gonçalves, Full Professor of Computer Science at the Federal University of Minas Gerais (UFMG), who will present his work INCT TILDIAR: Data-Centric and Sustainable Paths Beyond the ‘Law of More’ in NLP.

Abstract:
This talk introduces the INCT TILDIAR, a national Brazilian research network dedicated to responsible and sustainable Artificial Intelligence, and presents the main research directions developed within the institute. We illustrate how our work challenges the prevailing “Law of More” in NLP by emphasizing data-centric and efficiency-driven approaches to AI.

As a concrete example, the talk briefly summarizes research on instance selection and data engineering, showing that substantial reductions in training data and energy consumption can be achieved while maintaining model effectiveness. The overarching message is that sustainable NLP is possible by rethinking how data is selected and used, rather than relying solely on ever-larger models.

Short bio:
Marcos André Gonçalves is a Full Professor of Computer Science at the Federal University of Minas Gerais (UFMG). He holds a PhD in Computer Science from Virginia Tech, with prior degrees from UFC and UNICAMP, and has completed postdoctoral research at UFMG and Politecnico di Torino. His research focuses on Information Retrieval, Machine Learning, and Natural Language Processing, with over 400 peer-reviewed publications, an h-index of 61, and more than 15,000 citations.

He has received multiple national awards, including CAPES Thesis Awards (2024 – Advisor; 2020 – Honorable Mention), and several Best Paper awards. Prof. Gonçalves has served as General Chair of ACM/IEEE JCDL 2018, is a Senior Program Committee member of leading conferences (SIGIR, ACL, CIKM, WSDM, RecSys, ECIR), and serves on the editorial boards of TACL and the Journal of the Brazilian Computer Society. He is also the Coordinator of the Brazilian National Institute for Responsible AI in NLP (INCT TILDIAR).


When: 02/02/2026, 15:00
Where: Sala Riunioni, 1st Floor


Don’t Classify, Rank: Retrieval, Fusion, and Label Semantics for XMTC and MCTC

CCC Seminar by Celso França, Ph.D. student at UFMG, who will present his work, xCoRetriev: A Retrieval-Centric Paradigm for Extreme and Multi-Class Text Classification.

Abstract:
We address Extreme Multi-Label Text Classification (XMTC) and Multi-Class Text Classification (MCTC) under a unified paradigm that reframes classification as a ranking and retrieval problem over large, noisy, and skewed label spaces. In this talk, we synthesize our recent SIGIR 2025 paper and our SBBD 2025 best paper to demonstrate how retrieval-based formulations can jointly improve scalability, effectiveness, and label semantics across both XMTC and MCTC settings. Our core proposal is xCoRetriev, a dynamic two-stage retrieval and fusion pipeline that combines dense and sparse representations to tackle three main challenges: label space volume, extreme skewness, and label quality.

We further discuss recent attempts to enhance xCoRetriev’s effectiveness through Dimension Importance Estimation (DIMES) strategies and learned sparse representations trained via masked language modeling (MLM). While these approaches show promise in emphasizing discriminative signals and improving tail-label sensitivity, our analysis highlights their current limitations. Across multiple large-scale datasets, our results demonstrate consistent gains in propensity-scored metrics, improved robustness to noisy and weakly supervised label spaces through RAG-enhanced labels, and strong scalability at both training and inference time. Overall, this work advocates for a retrieval-centric view of large-scale text classification, bridging XMTC and MCTC through ranking, fusion, and importance-aware representations.

When: 02/02/2026, 14:00
Where: Sala Riunioni, 1st Floor