Categories
Meetings

HurtNet: a Multilingual Dictionary of Hurtful Words

Adel M. Wizani will present one of his recent collaborative works: HurtNet, a multilingual dictionary of hurtful terms, their senses, and examples of hurtful usage. The resource spans five languages: Arabic, French, Bulgarian, Greek, and Italian. The talk will outline the creation process for each language and the methods used for validation, focusing on the challenges of modeling hurtful language in Arabic (the presenter’s native language), particularly given its diglossic nature and semantic richness. It will also present results from a knowledge-injection zero-shot classification experiment, which demonstrate improved recall across languages, along with a qualitative analysis of model-generated explanations that reveal cross-linguistic patterns in the expression and detection of hurtful language.

Where: Sala Riunioni (1st floor)
When: 21/11/25, h 11.30

Less than Meets the Eye: Representing Compounds in Large Language Models 

CCC Seminar by Prof. Aline Villavicencio, Director of the Institute of Data Science and Artificial Intelligence, University of Exeter, UK

Abstract

Large language models have been successfully used for capturing distinct (and very specific) word usages, and therefore could provide an attractive alternative for accurately determining meaning in language. However, these models still face a serious challenge when dealing with non-literal language, like that involved in Multiword Expressions (MWEs) such as idioms (make ends meet), light verb constructions (give a sigh), verb particle constructions (shake up) and noun compounds (loan shark). MWEs are an integral part of the mental lexicon of native speakers, often used to express complex ideas in a simple and conventionalised way accepted by a given linguistic community. Although they may display a wealth of idiosyncrasies, from lexical, syntactic and semantic to statistical, that represent a real challenge for current LLMs, their accurate integration has the potential to improve the precision, naturalness and fluency of many tasks. In this talk, I will present an overview of how advances in LLMs have made an impact on the identification and modelling of idiomaticity and MWEs. I will concentrate on what models seem to incorporate of idiomaticity, as idiomatic interpretation may require knowledge that goes beyond what can be gathered from the individual words of an expression (e.g. “dark horse” as an unknown candidate who unexpectedly succeeds). I will also present an initiative to construct a multilingual idiomatic dataset.

Short Bio

Aline Villavicencio is the Director of the Institute of Data Science and Artificial Intelligence, University of Exeter, and is affiliated to the Department of Computer Science, University of Sheffield (UK). She is a member of ELLIS and has a Fellowship at the Alan Turing Institute. Before these, she held academic positions in the Institute of Informatics, Federal University of Rio Grande do Sul, Brazil (between 2005 and 2021) and in the School of Computer Science and Electronic Engineering, University of Essex, UK. She received her PhD from the University of Cambridge (UK) in 2001, and held postdoc positions at the University of Cambridge and University of Essex (UK). She was a Visiting Scholar at the Massachusetts Institute of Technology (USA, 2011-2012 and 2014-2015), at the École Normale Supérieure (France, 2014), an Erasmus-Mundus Visiting Scholar at Saarland University (Germany in 2012/2013) and at the University of Bath (UK, 2006-2009). She held a Research Fellowship from the Brazilian National Council for Scientific and Technological Development (Brazil, 2009-2017). She is a member of the editorial board of Computational Linguistics, TACL and of JNLE. She is the General Chair of EACL 2026 and was the PC Co-Chair of ACL 2022 and CoNLL 2019, Senior Area Chair for EMNLP 2025, ACL 2020 and ACL 2019, among others, and General Co-Chair for the 2018 International Conference on Computational Processing of Portuguese. She was also a member of the NAACL board, SIGLEX board and of the program committees of various *ACL and AI conferences, and has co-chaired several *ACL workshops on Cognitive Aspects of Computational Language Acquisition and on Multiword Expressions. Her research interests include lexical semantics, multilinguality, multiword expressions and cognitively motivated NLP, and she has co-edited special issues and books dedicated to these topics.

When: 13/11/25, h 11.00

Where:  Sala Conferenze – 3rd floor

Bootstrapping UMRs from UD for Scalable Multilingual Annotation

CCC Seminar by Federica Gamba – PhD student in Computational Linguistics, Charles University, Czech Republic

Abstract

Uniform Meaning Representation (UMR) offers a cross-linguistically applicable framework for capturing sentence- and document-level semantics, but producing UMR annotations from scratch is a time-intensive process. In this talk, I will present an approach for bootstrapping UMR graphs by leveraging Universal Dependencies (UD), a richly annotated multilingual syntactic resource covering a wide range of language families. I will describe how structural correspondences between UD and UMR can be exploited to automatically derive partial UMR graphs from UD trees, providing annotators with an initial representation to refine rather than create from scratch. While UD is not inherently semantic, it encodes syntactic information that maps well onto UMR structures, allowing us to extract meaningful correspondences that simplify annotation. This method not only reduces annotation effort but also facilitates scalable UMR creation across typologically diverse languages, aligning with UMR’s cross-linguistic design goals.

When: 12/11/25, h 9.00

Where:  Sala Conferenze – 3rd floor

How Irrelevant Contextual Information Can Systematically Influence the Outputs of LLMs


Samuele D’Avenia will talk about one of his recent works, the abstract of which is reported below.

Several recent works have examined the generations produced by large language models (LLMs) on subjective topics such as political opinions and attitudinal questionnaires, with a particular interest in controlling these outputs to align with specific users or perspectives.

This work investigates how irrelevant contextual information can systematically influence the outputs of Large Language Models (LLMs) when generating opinions on subjective topics. Using the Political Compass Test as a case study, we analyze LLM-generated responses in an open-generation setting and conduct a detailed statistical analysis to quantify shifts in opinions caused by unrelated contextual cues. Our findings reveal that some of these elements can predictably bias model responses, further highlighting challenges in ensuring the robustness and reliability of LLMs when generating opinions on subjective topics.

When:  4/07/2025, h 11.30

Where:  Sala Conferenze – 3rd floor

ELIta, an Italian emotion lexicon

Eliana Di Palma will explore the connection between language and emotions, focusing on how words can convey emotions. A central topic will be ELIta, an Italian emotion lexicon containing over 6,000 words and emojis, manually annotated by native speakers. This resource is designed to support the analysis of emotional content in various types of texts, including both formal writing and everyday communication, such as messages and social media. Practical examples will illustrate how words are linked to emotions, and how these associations may vary depending on factors such as age, gender, and cultural background. The talk will conclude with a study on oxymorons, examining the relationship between their components and their emotional perception from both psycholinguistic and computational perspectives.

When:  6/06/2025, h 11.30

Where:  Sala Seminari – 1st floor

Emotion Recognition in NLP: Recent Resources, Practices, and Challenges

CCC Seminar by Anna Koufakou, visiting professor from Florida Gulf Coast University, United States

Abstract

The evolution of Natural Language Processing (NLP) technologies, with the advent of Large Language Models (LLMs) and Generative AI, has significantly expanded the scope and effectiveness of NLP-driven tools across applications. Among these, Automated Emotion Recognition (AER) represents a particularly challenging yet promising area of research. Unlike Sentiment Analysis, which usually categorizes sentiment as positive or negative, AER aims to identify specific emotional states, such as anger, joy, or sadness, that can vary widely in expression and meaning. This talk explores the current landscape of relevant text corpora and resources, including our recent efforts towards a unifying benchmark. We will also discuss recent practices and key challenges in this evolving field.

When:  29/05/2025, h 10.00

Where:  Sala Conferenze – 3rd floor

WebNLG-IT: Construction of an aligned RDF-Italian corpus through Machine Translation techniques

Michael Oliverio will talk about one of his works, entitled ‘WebNLG-IT: Construction of an aligned RDF-Italian corpus through Machine Translation techniques’.

Abstract

The main goal of this work is the creation of the Italian version of the WebNLG corpus through the application of Neural Machine Translation (NMT) and post-editing with hand-written rules. To achieve this goal, in a first step, several existing NMT models were analysed and compared in order to identify the system with the highest performance on the original corpus. In a second step, after using the best NMT system, we semi-automatically designed and applied a number of rules to refine and improve the quality of the produced resource, creating a new corpus named WebNLG-IT. We used this resource for fine-tuning several LLMs for RDF-to-text tasks. In this way, comparing the performance of LLM-based generators on both Italian and English, we have (1) evaluated the quality of WebNLG-IT with respect to the original English version, (2) released the first fine-tuned LLM-based system for generating Italian from semantic web triples and (3) introduced an Italian version of a modular generation pipeline for RDF-to-text.

When:  11/04/2025, h 11.30

Where:  Sala Conferenze – 3rd floor

Computational Linguistics in Action: From Text Corpora to Real-World Challenges

Manuela Sanguinetti will hold a seminar titled “Computational Linguistics in Action: From Text Corpora to Real-World Challenges.”

Bio

Manuela Sanguinetti received her Ph.D. in Computer Science from the University of Turin in 2016. She is currently a non-tenured assistant professor at the Department of Mathematics and Computer Science, University of Cagliari, where she has been working on a project funded by the National Recovery and Resilience Plan (PNRR).
Her work primarily focuses on the development of linguistic resources to enhance language understanding and processing. She has been involved in a wide range of research collaborations regarding the study of task-oriented conversational agents, hate speech and stereotype detection, and multilingualism.

When:  03/04/2025, h 16.00

Where: Sala Riunioni – 1st floor

Online link: https://meet.google.com/ztd-mjgc-yjv

NLP meets non-standard languages: Opportunities and ethical responsibilities

Alan Ramponi will talk about one of his projects entitled “NLP meets non-standard languages: Opportunities and ethical responsibilities”.

Abstract

After many years of research focused primarily on standardized languages, the natural language processing (NLP) community has recently begun to include “non-standard” language varieties in its repertoire. This opens new opportunities for research, but it also presents unprecedented challenges and calls for greater ethical responsibilities. In this seminar, I will present recent work in NLP for non-standard languages with a focus on language varieties of Italy, highlighting i) the importance of accounting for linguistic variation and how to explore it, ii) the problematic assumption of considering all language varieties as the same in terms of language functions and technological needs, and iii) the need to actively engage with speech communities when dealing with endangered languages to co-design locally-meaningful artifacts that meet their needs and represent their language varieties.

Bio

Alan Ramponi is a senior researcher in natural language processing (NLP) at Fondazione Bruno Kessler, Italy, where he is part of the Digital Humanities research group. His research focuses on language variation across many dimensions (e.g., non-standard varieties and dialects, domains, registers, social factors). He is interested in how NLP can contribute to the study of language variation, and how accounting for language variation can contribute to more robust, fair, and inclusive NLP. Web page: https://alanramponi.github.io/

When:  28/03/2025, h 11.00

Where: Aula 3.06 Thin Client (3rd floor) – Via Sant’Ottavio 54

Online link: https://meet.google.com/gvw-dfuo-bvt

Hi Guys or Hi folks? Navigating Gender Bias and Inclusive Language in Translation Technologies

Beatrice Savoldi will present one of her recent works, entitled “Hi Guys or Hi folks? Navigating Gender Bias and Inclusive Language in Translation Technologies”.

Abstract

Societal gender asymmetries and inequalities can be embedded in our communication practices and perpetuated in language technologies, including Machine Translation (MT) systems used at scale. In this presentation, we will delve into the current landscape of MT and gender bias, as well as current proposals towards more inclusive language.

By focusing on English-Italian as an exemplar language pair, we will discuss the theoretical, technical, and linguistic challenges and opportunities in fostering more equitable automatic translation.

When: 14/03/2025, h 11.00

Where: Aula 3.06 Thin Client (3rd floor) – Via Sant’Ottavio 54