Don’t Classify, Rank: Retrieval, Fusion, and Label Semantics for XMTC and MCTC

CCC Seminar: Celso França, a Ph.D. student at UFMG, will present his work, xCoRetriev: A Retrieval-Centric Paradigm for Extreme and Multi-Class Text Classification.

Abstract:
We address Extreme Multi-Label Text Classification (XMTC) and Multi-Class Text Classification (MCTC) under a unified paradigm that reframes classification as a ranking and retrieval problem over large, noisy, and skewed label spaces. In this talk, we synthesize our recent SIGIR 2025 paper and our best paper at SBBD 2025 to demonstrate how retrieval-based formulations can jointly improve scalability, effectiveness, and label semantics in both XMTC and MCTC settings. Our core proposal is xCoRetriev, a dynamic two-stage retrieval and fusion pipeline that tackles the main challenges of label space volume, extreme skewness, and label quality by combining dense and sparse representations. We further discuss recent attempts to enhance xCoRetriev’s effectiveness through Dimension Importance Estimation (DIMES) strategies and learned sparse representations trained via masked language modeling (MLM). While these approaches show promise in emphasizing discriminative signals and improving tail-label sensitivity, our analysis highlights their current limitations. Across multiple large-scale datasets, our results demonstrate consistent gains in propensity-scored metrics, improved robustness to noisy and weakly supervised label spaces through RAG-enhanced labels, and strong scalability at both training and inference time. Overall, this work advocates a retrieval-centric view of large-scale text classification, bridging XMTC and MCTC through ranking, fusion, and importance-aware representations.

When: 02/02/2026, 14:00
Where: Sala Riunioni, 1st Floor