Author: Pier Felice Balestrucci

HurtNet: a Multilingual Dictionary of Hurtful Words

Post author By Pier Felice Balestrucci
Post date 11/16/2025

Adel M. Wizani will present one of his recent collaborative works: HurtNet, a multilingual dictionary of hurtful terms, their senses, and examples of hurtful usage. The resource spans five languages: Arabic, French, Bulgarian, Greek, and Italian. The talk will outline the creation process for each language and the methods used for validation, focusing on the challenges of modeling hurtful language in Arabic (the presenter’s native language), particularly given its diglossic nature and semantic richness. It will also present results from a knowledge-injection zero-shot classification experiment, which demonstrate improved recall across languages, along with a qualitative analysis of model-generated explanations that reveal cross-linguistic patterns in the expression and detection of hurtful language.

Where: Sala Riunioni (1st floor)
When: 21/11/25 11:30

Meetings

How Irrelevant Contextual Information Can Systematically Influence the Outputs of LLMs

Post author By Pier Felice Balestrucci
Post date 07/01/2025

Samuele D’Avenia will talk about one of his recent works, the abstract of which is reported below.

Several recent works have examined the generations produced by large language models (LLMs) on subjective topics such as political opinions and attitudinal questionnaires, with a particular interest in controlling these outputs to align with specific users or perspectives.

This work investigates how irrelevant contextual information can systematically influence the outputs of Large Language Models (LLMs) when generating opinions on subjective topics. Using the Political Compass Test as a case study, we analyze LLM-generated responses in an open-generation setting and conduct a detailed statistical analysis to quantify shifts in opinions caused by unrelated contextual cues. Our findings reveal that some of these elements can predictably bias model responses, further highlighting challenges in ensuring the robustness and reliability of LLMs when generating opinions on subjective topics.

When: 4/07/2025, h 11.30

Where: Sala conferenze – 3rd floor

Meetings

ELIta, an Italian emotion lexicon

Post author By Pier Felice Balestrucci
Post date 06/03/2025

Eliana Di Palma will explore the connection between language and emotions, focusing on how words can convey emotions. A central topic will be ELIta, an Italian emotion lexicon containing over 6,000 words and emojis, manually annotated by native speakers. This resource is designed to support the analysis of emotional content in various types of texts, including both formal writing and everyday communication, such as messages and social media. Practical examples will illustrate how words are linked to emotions, and how these associations may vary depending on factors such as age, gender, and cultural background. The talk will conclude with a study on oxymorons, examining the relationship between their components and their emotional perception from both psycholinguistic and computational perspectives.

When: 6/06/2025, h 11.30

Where: Sala Seminari – 1st floor

Meetings

WebNLG-IT: Construction of an aligned RDF-Italian corpus through Machine Translation techniques

Post author By Pier Felice Balestrucci
Post date 04/07/2025

Michael Oliverio will talk about one of his works, entitled ‘WebNLG-IT: Construction of an aligned RDF-Italian corpus through Machine Translation techniques’.

Abstract

The main goal of this work is the creation of the Italian version of the WebNLG corpus through the application of Neural Machine Translation (NMT) and post-editing with hand-written rules. To achieve this goal, in a first step, several existing NMT models were analysed and compared in order to identify the system with the highest performance on the original corpus. In a second step, after using the best NMT system, we semi-automatically designed and applied a number of rules to refine and improve the quality of the produced resource, creating a new corpus named WebNLG-IT. We used this resource for fine-tuning several LLMs for RDF-to-text tasks. In this way, comparing the performance of LLM-based generators on both Italian and English, we have (1) evaluated the quality of WebNLG-IT with respect to the original English version, (2) released the first fine-tuned LLM-based system for generating Italian from semantic web triples and (3) introduced an Italian version of a modular generation pipeline for RDF-to-text.

When: 11/04/2025, h 11.30

Where: Sala Conferenze – 3rd floor

Meetings

I’m sure you’re a real scholar yourself: Exploring Ironic Content Generation by Large Language Models

Post author By Pier Felice Balestrucci
Post date 10/25/2024

Soda Marem Lo is a PhD student at the beginning of her 3rd year, and she will talk about one of her latest works called “I’m sure you’re a real scholar yourself: Exploring Ironic Content Generation by Large Language Models.”

Join us if you’re interested in learning more about the ability of LLMs to generate ironic content!

Abstract:

Generating ironic content is challenging: it requires a nuanced understanding of context and implicit references and balancing seriousness and playfulness. Moreover, irony is highly subjective and can depend on various factors, such as social, cultural, or generational aspects. This paper explores whether Large Language Models (LLMs) can learn to generate ironic responses to social media posts. To do so, we fine-tune two models to generate ironic and non-ironic content and deeply analyze their outputs’ linguistic characteristics, their connection to the original post, and their similarity to the human-written replies. We also conduct a large-scale human evaluation of the outputs. Additionally, we investigate whether LLMs can learn a form of irony tied to a generational perspective, with mixed results.

When: 8/11/2024 11.30

Where: Sala Riunioni (1st floor)

Meetings

Perspective matters: event framing in language & society

Post author By Pier Felice Balestrucci
Post date 03/20/2024

[trigger warning: mentions of gender-based violence]

When talking about societally impactful events, our choices of words and grammatical constructions often reflect our socio-political perspective on these events and affect how the people that we talk to perceive the events. In particular, in events that involve an unequal power relationship between different groups of people, this relationship affects how the agency of the participants in the events is portrayed. Gender-based violence is a particularly relevant example of this: “woman tragically dies in family incident” and “man suspected of killing his wife” could both be factually accurate descriptions of a femicide, but, when used as a newspaper headline, convey very different views of the event. In the lecture, we will discuss ways in which recently developed NLP techniques can help make visible such different ‘framings’ and contribute to increasing societal awareness. The lecture will be followed by a hands-on session in which we will do small-scale experiments together, looking at how to apply and extend these techniques.

Bio:
Gosse Minnema is a computational linguist based in Groningen, The Netherlands. He is currently preparing to defend his PhD thesis on frame semantics applied to media framing. His main area of interest is computational semantics and, in a broad sense, ways of applying it in societally meaningful ways. He is currently also an active member of the project “PeARS: The People’s Search Engine” (https://pearsproject.org/) which aims to promote community-owned, privacy-friendly and sustainable NLP solutions for web search and knowledge management.

When: March 26, h 14
Where: Via Sant’Ottavio 54, Room 3.06

Meetings

Hi Guys or Hi folks? Navigating Gender Bias and Inclusive Language in Translation Technologies.

Post author By Pier Felice Balestrucci
Post date 03/06/2024

Beatrice Savoldi will come to Turin for a seminar/lesson as part of the Ethics in NLP course (Master’s degree in Language Technologies and Digital Humanities).
Feel free to join if you are interested in the topic!

Where: Aula 3.06 Thin Client (3rd floor) – Via Sant’Ottavio 54, Torino

Title: Hi Guys or Hi folks? Navigating Gender Bias and Inclusive Language in Translation Technologies.

When: 11:00-12:30 March 15, 2024

Abstract: Societal gender asymmetries and inequalities can be embedded in our communication practices and perpetuated in language technologies, including Machine Translation (MT) systems used at scale. In this presentation, we will delve into the current landscape of MT and gender bias, as well as current proposals towards more inclusive language.
By focusing on English-Italian as an exemplar language pair, we will discuss the challenges and opportunities (theoretical, technical, and linguistic) in fostering more equitable automatic translation.

Short bio: Beatrice Savoldi is a postdoctoral researcher at Fondazione Bruno Kessler (FBK) within the MT research unit, where she mainly works on trustworthy and gender-inclusive translation technologies. Beatrice carried out a joint international PhD at the Universities of Trento and Augsburg with a dissertation on gender bias in Machine Translation, which was awarded the 2023 best thesis Research Prize from the Augsburg University Foundation. Her research interests broadly encompass ethical and social considerations of language technologies.

Meetings

DelBERTo: A Deep Lightweight Transformer for Sentiment Analysis

Post author By Pier Felice Balestrucci
Post date 01/18/2024

Where: Sala conferenze (3rd floor)
When: January 26, 2024, 11:30

Luca Molinaro is a PhD student from our 39th cycle, and he will present his work titled ‘DelBERTo: A Deep Lightweight Transformer for Sentiment Analysis,’ which has been accepted at AIxIA 2022. Please find the abstract below.

Abstract:
This article introduces DelBERTo, a resource-efficient Transformer architecture for Natural Language Processing (NLP). Transformers replace convolutions and recurrence with the self-attention mechanism and represent the state-of-the-art in NLP. However, self-attention’s complexity grows quadratically with the size of the input, which limits their applications. DelBERTo relies on adaptive input and on a deep yet lightweight Transformer architecture to reduce the number of learnable parameters, and relies on adaptive softmax to improve pre-training speed and memory footprint. We evaluate the proposed architecture in a sentiment analysis task and compare it against AlBERTo, a BERT model representing the state-of-the-art in sentiment analysis over Italian tweets. DelBERTo has only one-seventh of AlBERTo’s learnable parameters, is faster, and requires less memory. Despite this, our experiments show that DelBERTo is competitive with AlBERTo over the three SENTIPOLC sub-tasks proposed at EVALITA 2016: subjectivity classification, polarity classification, and irony detection.

Meetings

Large Acoustic Models: another challenge for the ecological AI world

Post author By Pier Felice Balestrucci
Post date 01/17/2024

Where: Sala conferenze (3rd floor)
When: January 31, 2024, 3pm

Francesco Cutugno is an associate professor of Natural Language Processing and Interaction Design (within the frame of the Software Engineering courses) at the University of Naples Federico II. From 2013 to 2018 he was president of the Italian Association of Speech Sciences. He is presently a member of the board of the Italian Association of Computational Linguistics. Francesco Cutugno directs Urban/Eco, an interdepartmental research center devoted to the study of applications of Artificial Intelligence to conversational agents, applications in architecture, and cultural heritage and language.

Abstract:
These days, many nations, many researchers belonging to public institutions, and many research organizations are attempting to build public, freely accessible Large (Generative) Language Models (LLMs). This enterprise also requires collecting the huge amount of training data that such models normally need. Similarly, another fundamental goal of public research should be to provide the scientific community with AI-based Automatic Speech Recognition systems that, like LLMs, require massive computational resources, infrastructure, and thousands of hours of audio data, both labeled and unlabeled. By analogy with LLMs, we call these systems Large Acoustic Models (LAMs). My talk will deal with the leading technologies in this field, will describe the needed data profile, and will propose alternatives to the current approaches, aiming at partly simplifying the task of transcribing speech. Some concluding remarks will be devoted to aspects of explainability hidden in the DNN approach to the problem.

Meetings

Annotators Aren’t Asocial Atoms: Modelling Individual Perspectives and Social Groups

Post author By Pier Felice Balestrucci
Post date 11/23/2023

Where: Sala conferenze (3rd floor)
When: 27/11/2023 11:00

Matthias is a PhD student in the Semantic Computing Group at Bielefeld University (Germany), supervised by Philipp Cimiano. He works on systems to improve online discussions, with a particular focus on perspectives and human label variation. From April to June 2022 he visited the MilaNLP Lab at Bocconi University (Italy) to work on related problems in modeling sociodemographics, and he continues to collaborate with the group.

Abstract:

Annotators, like all of us, are shaped to some extent by their membership in social groups of various types. Some groups are formed based on socially relevant categories, like age or gender; others can be more local and temporary. For example, the group of all annotators in the annotation process is just that, a group. If groups have an impact on us, can we include them in our models to better capture individual perspectives?

I will present results from two recent works to provide some tentative answers to this question. In one case we find that groups based on sociodemographics might be too coarse to be informative [1]. In the other we see that it is beneficial to model the annotators of a dataset as a group and in relation to one another [2].

[1] https://aclanthology.org/2023.acl-short.88/

[2] https://arxiv.org/abs/2311.03153