Categories
Meetings

WebNLG-IT: Construction of an aligned RDF-Italian corpus through Machine Translation techniques

Michael Oliverio will talk about his work entitled ‘WebNLG-IT: Construction of an aligned RDF-Italian corpus through Machine Translation techniques’.

Abstract

The main goal of this work is the creation of the Italian version of the WebNLG corpus through the application of Neural Machine Translation (NMT) and post-editing with hand-written rules. To achieve this goal, in a first step, several existing NMT models were analysed and compared in order to identify the system with the highest performance on the original corpus. In a second step, after using the best NMT system, we semi-automatically designed and applied a number of rules to refine and improve the quality of the produced resource, creating a new corpus named WebNLG-IT. We used this resource for fine-tuning several LLMs for RDF-to-text tasks. In this way, comparing the performance of LLM-based generators on both Italian and English, we have (1) evaluated the quality of WebNLG-IT with respect to the original English version, (2) released the first fine-tuned LLM-based system for generating Italian from semantic web triples and (3) introduced an Italian version of a modular generation pipeline for RDF-to-text.
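For readers unfamiliar with RDF-to-text, the sketch below shows one common way to turn WebNLG-style triples into a flat input string for a fine-tuned generator. It is an illustration only: the `<S>`, `<P>`, `<O>` markers and the helper names are assumptions, not the format used in WebNLG-IT.

```python
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def split_camel_case(predicate: str) -> str:
    """Turn a camelCase predicate such as 'birthPlace' into 'birth place'."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", predicate).lower()


def clean_entity(entity: str) -> str:
    """Replace the underscores used in entity names with spaces."""
    return entity.replace("_", " ")


def linearize(triples: List[Triple]) -> str:
    """Flatten triples into one string; the <S>/<P>/<O> markers are hypothetical."""
    parts = []
    for subj, pred, obj in triples:
        parts.append(
            f"<S> {clean_entity(subj)} <P> {split_camel_case(pred)} <O> {clean_entity(obj)}"
        )
    return " ".join(parts)


example = [("Alan_Bean", "birthPlace", "Wheeler,_Texas")]
print(linearize(example))
```

A generator fine-tuned for the task would take such a string as input and produce an Italian (or English) sentence verbalising the triples.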

When:  11/04/2025, h 11.30

Where: Sala Conferenze – 3rd floor

Computational Linguistics in Action: From Text Corpora to Real-World Challenges

Manuela Sanguinetti will hold a seminar titled “Computational Linguistics in Action: From Text Corpora to Real-World Challenges.”

Bio

Manuela Sanguinetti received her Ph.D. in Computer Science from the University of Turin in 2016. She is currently a non-tenured assistant professor at the Department of Mathematics and Computer Science, University of Cagliari, where she has been working on a project funded by the National Recovery and Resilience Plan (PNRR).
Her work primarily focuses on the development of linguistic resources to enhance language understanding and processing. She has been involved in a wide range of research collaborations regarding the study of task-oriented conversational agents, hate speech and stereotype detection, and multilingualism.

When:  03/04/2025, h 16.00

Where: Sala Riunioni – 1st floor

Online link: https://meet.google.com/ztd-mjgc-yjv

NLP meets non-standard languages: Opportunities and ethical responsibilities

Alan Ramponi will talk about one of his projects entitled “NLP meets non-standard languages: Opportunities and ethical responsibilities”.

Abstract

After many years of research focused primarily on standardized languages, the natural language processing (NLP) community has recently begun to include “non-standard” language varieties in its repertoire. This opens new opportunities for research, but it also presents unprecedented challenges and calls for greater ethical responsibilities. In this seminar, I will present recent work in NLP for non-standard languages with a focus on language varieties of Italy, highlighting i) the importance of accounting for linguistic variation and how to explore it, ii) the problematic assumption of considering all language varieties as the same in terms of language functions and technological needs, and iii) the need to actively engage with speech communities when dealing with endangered languages to co-design locally-meaningful artifacts that meet their needs and represent their language varieties.

Bio

Alan Ramponi is a senior researcher in natural language processing (NLP) at Fondazione Bruno Kessler, Italy, where he is part of the Digital Humanities research group. His research focuses on language variation across many dimensions (e.g., non-standard varieties and dialects, domains, registers, social factors). He is interested in how NLP can contribute to the study of language variation, and how accounting for language variation can contribute to more robust, fair, and inclusive NLP. Web page: https://alanramponi.github.io/

When:  28/03/2025, h 11.00

Where: Aula 3.06 Thin Client (3rd floor) – Via Sant’Ottavio 54

Online link: https://meet.google.com/gvw-dfuo-bvt

Hi Guys or Hi folks? Navigating Gender Bias and Inclusive Language in Translation Technologies

Beatrice Savoldi will present one of her recent works, entitled “Hi Guys or Hi folks? Navigating Gender Bias and Inclusive Language in Translation Technologies”.

Abstract

Societal gender asymmetries and inequalities can be embedded in our communication practices and perpetuated in language technologies, including Machine Translation (MT) systems used at scale. In this presentation, we will delve into the current landscape of MT and gender bias, as well as current proposals towards more inclusive language.

By focusing on English-Italian as an exemplar language pair, we will discuss the theoretical, technical, and linguistic challenges and opportunities in fostering more equitable automatic translation.

When: 14/03/2025 11.00

Where: Aula 3.06 Thin Client (3rd floor) – Via Sant’Ottavio 54

GAttention: Gated Attention for the Detection of Abusive Language

Horacio Jarquin will present one of his projects, entitled “GAttention: Gated Attention for the Detection of Abusive Language”.

Abstract

Abusive language online creates toxic environments and exacerbates social tensions, underscoring the need for robust NLP models to interpret nuanced linguistic cues. This research introduces GAttention, a novel Gated Attention mechanism that combines the strengths of Contextual attention and Self-attention mechanisms to address the limitations of existing attention models within the text classification task. GAttention capitalizes on local and global query vectors by integrating the internal relationships within a sequence (Self-attention) and the global relationships among distinct sequences (Contextual attention). This combination allows for a more nuanced understanding and processing of sequence elements, which is particularly beneficial in context-sensitive text classification tasks such as the case of abusive language detection. By applying this mechanism to transformer-based encoder models, we showcase how it enhances the model’s ability to discern subtle nuances and contextual clues essential for identifying abusive language, a challenging and increasingly relevant task within NLP.
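The abstract does not give GAttention's formulation, so the following is only a generic sketch of how a learned gate could blend a self-attention output with a contextual-attention output. The gate form, the parameter names (`global_query`, `gate_weights`, `gate_bias`) and the use of raw token vectors as queries, keys and values are all assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], vectors: List[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Local queries: every token attends over its own sequence."""
    scale = math.sqrt(len(tokens[0]))
    return [
        weighted_sum(softmax([dot(q, k) / scale for k in tokens]), tokens)
        for q in tokens
    ]


def contextual_attention(tokens: List[Vector], global_query: Vector) -> Vector:
    """Global query: one vector shared across sequences scores every token."""
    scale = math.sqrt(len(tokens[0]))
    weights = softmax([dot(global_query, k) / scale for k in tokens])
    return weighted_sum(weights, tokens)


def gated_combination(tokens: List[Vector], global_query: Vector,
                      gate_weights: Vector, gate_bias: float = 0.0) -> List[Vector]:
    """Blend both views per token with a sigmoid gate (hypothetical gate form)."""
    local = self_attention(tokens)
    context = contextual_attention(tokens, global_query)
    mixed = []
    for local_vec in local:
        gate_input = local_vec + context  # list concatenation, length 2 * dim
        gate = 1.0 / (1.0 + math.exp(-(dot(gate_weights, gate_input) + gate_bias)))
        mixed.append([gate * l + (1.0 - gate) * c for l, c in zip(local_vec, context)])
    return mixed


tokens = [[1.0, 0.0], [0.0, 1.0]]
print(gated_combination(tokens, [0.0, 0.0], [0.0] * 4))
```

With all gate parameters at zero the gate is 0.5, so each output is the plain average of the two attention views; in a trained model the gate parameters would be learned.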

When: 21/03/2025 11.30

Where: Sala Riunioni (first floor)

Bridging views on the concepts of ‘multilingual’, ‘cross-lingual’ and ‘translingual’ in Language Technology

Adriana Pagano will present one of her international projects, entitled “Bridging views on the concepts of ‘multilingual’, ‘cross-lingual’ and ‘translingual’ in Language Technology”.

Abstract

The presentation will introduce the interdisciplinary research network UniDive (Universality, Diversity, and Idiosyncrasy in Language Technology), a COST Action (European Cooperation in Science and Technology). Adriana will focus on one of the tasks she is currently leading within UniDive’s Working Group 3 – Multilingual and Cross-Lingual Language Technology.

When: 17/01/2025 11.30

Where: Sala Conferenze (3rd floor)

Exploring YouTube Comments Reacting to Femicide News in Italian

Marco Madeddu is a PhD student at the beginning of his 1st year, and he will talk about one of his latest works, called “Exploring YouTube Comments Reacting to Femicide News in Italian”.

Abstract

In recent years, Gender-Based Violence (GBV) has become an important issue in modern society and a central topic in different research areas due to its alarming spread.
Several Natural Language Processing (NLP) studies concerning Hate Speech directed against women have focused on misogynistic behaviours, slurs or incel communities.
The main contribution of our work is the creation of the first dataset of social media comments reacting to GBV, in particular to a femicide event.
Our dataset, named GBV-Maltesi, contains 2,934 YouTube comments annotated following a new schema that we developed in order to study GBV and misogyny with an intersectional approach.
During the experimental phase, we trained models on different corpora for binary misogyny detection and found that datasets consisting mostly of explicit expressions of misogyny pose an easier challenge than the more implicit forms of misogyny contained in GBV-Maltesi.

When: 13/12/2024 11.30

Where: Sala Conferenze (3rd floor)

I’m sure you’re a real scholar yourself: Exploring Ironic Content Generation by Large Language Models

Soda Marem Lo is a PhD student at the beginning of her 3rd year, and she will talk about one of her latest works called “I’m sure you’re a real scholar yourself: Exploring Ironic Content Generation by Large Language Models.”

Join us if you’re interested in learning more about the ability of LLMs to generate ironic content!

Abstract:

Generating ironic content is challenging: it requires a nuanced understanding of context and implicit references, and a balance between seriousness and playfulness. Moreover, irony is highly subjective and can depend on various factors, such as social, cultural, or generational aspects. This paper explores whether Large Language Models (LLMs) can learn to generate ironic responses to social media posts. To do so, we fine-tune two models to generate ironic and non-ironic content and analyze in depth their outputs’ linguistic characteristics, their connection to the original post, and their similarity to the human-written replies. We also conduct a large-scale human evaluation of the outputs. Additionally, we investigate whether LLMs can learn a form of irony tied to a generational perspective, with mixed results.

When: 8/11/2024 11.30

Where: Sala Riunioni (1st floor)

Data Augmentation through Back-Translation for Stereotypes and Irony Detection

Tom Bourgeade will present his research on “Data Augmentation through Back-Translation for Stereotypes and Irony Detection”.

Abstract

In NLP, the detection of nuanced phenomena such as stereotypes or irony presents unique challenges, chiefly the scarcity of labeled datasets. One strategy to mitigate this is to employ Data Augmentation methods, each of which has its pros and cons with regard to these phenomena. This presentation will focus on Back-Translation, which exploits modern Machine Translation models to introduce variety in training instances, in a process similar to paraphrasing, by translating a text into a pivot language, then back into the original language. We compare this approach on multilingual datasets for stereotypes and irony detection against simpler strategies such as oversampling, as well as against Cross-Translation, in which instances from other language subsets are translated and injected into the target training language subset.
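As a rough illustration of the back-translation loop and the oversampling baseline described above, the sketch below uses toy word-level lookup tables as a stand-in for real Machine Translation models. The tables, function names and example labels are all hypothetical; they are not taken from the presented work.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (text, label)

# Toy lookup tables standing in for real MT models (assumption for illustration).
EN_TO_PIVOT: Dict[str, str] = {"awesome": "super", "great": "super", "film": "film", "movie": "film"}
PIVOT_TO_EN: Dict[str, str] = {"super": "great", "film": "movie"}


def toy_translate(text: str, table: Dict[str, str]) -> str:
    """Word-by-word lookup; unknown words pass through unchanged."""
    return " ".join(table.get(token, token) for token in text.split())


def back_translate(text: str,
                   to_pivot: Callable[[str], str],
                   from_pivot: Callable[[str], str]) -> str:
    """Translate into the pivot language, then back into the original language."""
    return from_pivot(to_pivot(text))


def augment(dataset: List[Example],
            to_pivot: Callable[[str], str],
            from_pivot: Callable[[str], str]) -> List[Example]:
    """Add back-translated paraphrases that differ from texts already present."""
    augmented = list(dataset)
    seen = {text for text, _ in dataset}
    for text, label in dataset:
        paraphrase = back_translate(text, to_pivot, from_pivot)
        if paraphrase not in seen:
            augmented.append((paraphrase, label))
            seen.add(paraphrase)
    return augmented


def oversample(dataset: List[Example], label: str, factor: int) -> List[Example]:
    """Simpler baseline: repeat instances of one label until it appears `factor` times as often."""
    minority = [ex for ex in dataset if ex[1] == label]
    return list(dataset) + minority * (factor - 1)


data = [("what an awesome film", "ironic"), ("the movie was long", "not_ironic")]
to_pivot = lambda t: toy_translate(t, EN_TO_PIVOT)
from_pivot = lambda t: toy_translate(t, PIVOT_TO_EN)
print(augment(data, to_pivot, from_pivot))
```

Unlike oversampling, which only repeats existing instances, back-translation can yield new surface forms with the same label; whether the label (for example, irony) actually survives translation is exactly the kind of question the comparison addresses.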

When: 19/04/2024 11.30

Where: Sala Conferenze (3rd floor)

Perspective matters: event framing in language & society

[trigger warning: mentions of gender-based violence]

When talking about societally impactful events, our choices of words and grammatical constructions often reflect our socio-political perspective on these events and affect how the people that we talk to perceive the events. In particular, in events that involve an unequal power relationship between different groups of people, this relationship affects how the agency of the participants in the events is portrayed. Gender-based violence is a particularly relevant example of this: “woman tragically dies in family incident” and “man suspected of killing his wife” could both be factually accurate descriptions of a femicide, but, when used as a newspaper headline, convey very different views of the event. In the lecture, we will discuss ways in which recently developed NLP techniques can help make visible such different ‘framings’ and contribute to increasing societal awareness. The lecture will be followed by a hands-on session in which we will do small-scale experiments together, looking at how to apply and extend these techniques.

Bio:
Gosse Minnema is a computational linguist based in Groningen, The Netherlands. He is currently preparing to defend his PhD thesis on frame semantics applied to media framing. His main area of interest is computational semantics and, in a broad sense, ways of applying it in societally meaningful ways. He is currently also an active member of the project “PeARS: The People’s Search Engine” (https://pearsproject.org/) which aims to promote community-owned, privacy-friendly and sustainable NLP solutions for web search and knowledge management.

When: March 26, h 14
Where: Via Sant’Ottavio 54, Room 3.06