Categories
Meetings

Data Augmentation through Back-Translation for Stereotypes and Irony Detection

Tom Bourgeade will present his research on “Data Augmentation through Back-Translation for Stereotypes and Irony Detection”.

Abstract

In NLP, the detection of nuanced phenomena such as stereotypes or irony presents unique challenges, chiefly linked to the scarcity of labeled datasets. One strategy to mitigate this is to employ Data Augmentation methods, each of which has its pros and cons with regard to these phenomena. This presentation will focus on Back-Translation, which exploits modern Machine Translation models to introduce variety into training instances, in a process similar to paraphrasing, by translating a text into a pivot language and then back into the original language. We compare this approach, on multilingual datasets for stereotypes and irony detection, against simpler strategies such as oversampling, as well as against Cross-Translation, in which instances from other language subsets are translated and injected into the target training language subset.
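The round trip described in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the speaker's implementation: the `translate(text, source, target)` callable, the toy lookup table, and the function names are all hypothetical, and a real setup would plug in an actual Machine Translation model instead.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical interface: translate(text, source_lang, target_lang) -> str.
Translator = Callable[[str, str, str], str]
Dataset = List[Tuple[str, str]]  # (text, label) pairs


def back_translate(text: str, translate: Translator, source: str, pivot: str) -> str:
    """Translate into the pivot language, then back into the source language."""
    pivoted = translate(text, source, pivot)
    return translate(pivoted, pivot, source)


def augment(dataset: Dataset, translate: Translator,
            source: str = "en", pivot: str = "it") -> Dataset:
    """Keep every original instance and add each round trip that changed the text."""
    augmented = list(dataset)
    for text, label in dataset:
        paraphrase = back_translate(text, translate, source, pivot)
        if paraphrase != text:  # identical round trips add no variety
            augmented.append((paraphrase, label))
    return augmented


def oversample(dataset: Dataset, seed: int = 0) -> Dataset:
    """Simpler baseline: duplicate random instances until all classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in dataset)
    target = max(counts.values())
    balanced = list(dataset)
    for label, count in counts.items():
        pool = [item for item in dataset if item[1] == label]
        balanced.extend(rng.choice(pool) for _ in range(target - count))
    return balanced


# Toy stand-in for an MT model (assumption): a lookup table, identity otherwise.
_TOY_TABLE = {
    ("en", "it", "what a great day"): "che bella giornata",
    ("it", "en", "che bella giornata"): "what a beautiful day",
}


def toy_translate(text: str, source: str, target: str) -> str:
    return _TOY_TABLE.get((source, target, text), text)


data = [
    ("what a great day", "ironic"),
    ("it is raining", "not_ironic"),
    ("the bus is late", "not_ironic"),
]
augmented_data = augment(data, toy_translate)
balanced_data = oversample(data)
```

Back-translation adds a new surface form while keeping the label, whereas oversampling only repeats existing instances; this is the contrast the comparison in the talk relies on.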

When: 19/04/2024 11:30

Where: Sala Conferenze (3rd floor)


Perspective matters: event framing in language & society

[trigger warning: mentions of gender-based violence]

When talking about societally impactful events, our choices of words and grammatical constructions often reflect our socio-political perspective on these events and affect how the people that we talk to perceive the events. In particular, in events that involve an unequal power relationship between different groups of people, this relationship affects how the agency of the participants in the events is portrayed. Gender-based violence is a particularly relevant example of this: “woman tragically dies in family incident” and “man suspected of killing his wife” could both be factually accurate descriptions of a femicide, but, when used as a newspaper headline, convey very different views of the event. In the lecture, we will discuss ways in which recently developed NLP techniques can help make visible such different ‘framings’ and contribute to increasing societal awareness. The lecture will be followed by a hands-on session in which we will do small-scale experiments together, looking at how to apply and extend these techniques.

Bio:
Gosse Minnema is a computational linguist based in Groningen, The Netherlands. He is currently preparing to defend his PhD thesis on frame semantics applied to media framing. His main area of interest is computational semantics and, in a broad sense, ways of applying it in societally meaningful ways. He is currently also an active member of the project “PeARS: The People’s Search Engine” (https://pearsproject.org/) which aims to promote community-owned, privacy-friendly and sustainable NLP solutions for web search and knowledge management.

When: March 26, 14:00
Where: Via Sant’Ottavio 54, Room 3.06


Hi Guys or Hi folks? Navigating Gender Bias and Inclusive Language in Translation Technologies.

Beatrice Savoldi will come to Turin for a seminar / lesson as part of the Ethics in NLP course (Master's degree in Language Technologies and Digital Humanities).
Feel free to join if you are interested in the topic!

Where: Aula 3.06 Thin Client (3rd floor), Via Sant’Ottavio 54, Torino

Title: Hi Guys or Hi folks? Navigating Gender Bias and Inclusive Language in Translation Technologies.

When: 11:00-12:30 March 15, 2024

Abstract: Societal gender asymmetries and inequalities can be embedded in our communication practices and perpetuated in language technologies, including Machine Translation (MT) systems used at scale. In this presentation, we will delve into the current landscape of MT and gender bias, as well as current proposals towards more inclusive language.
By focusing on English-Italian as an exemplar language pair, we will discuss the challenges and opportunities (theoretical, technical, and linguistic) in fostering more equitable automatic translation.

Short bio: Beatrice Savoldi is a postdoctoral researcher at Fondazione Bruno Kessler (FBK) within the MT research unit, where she mainly works on trustworthy and gender-inclusive translation technologies. Beatrice carried out a joint international PhD at the Universities of Trento and Augsburg with a dissertation on gender bias in Machine Translation, which was awarded the 2023 best thesis Research Prize from the Augsburg University Foundation. Her research interests broadly encompass ethical and social considerations of language technologies.


Multimodal Strategies for Robot-to-Human Communication

Massimo Donini will present his work “Multimodal Strategies for Robot-to-Human Communication”.

Abstract

Multimodality offers new possibilities in the field of robot-to-human communication. In the proposed approach, the coordinated and integrated use of multimedia elements with the robot’s speech plays a very important role in the overall effectiveness of the communicative act. During the research, different multimodal communication strategies have been formalised and implemented.

When: 09/02/2024 11:30

Where: Sala Riunioni (1st floor)


DelBERTo: A Deep Lightweight Transformer for Sentiment Analysis

Where: Sala conferenze (3rd floor)
When: January 26, 2024, 11:30

Luca Molinaro is a PhD student from our 39th cycle, and he will present his work titled ‘DelBERTo: A Deep Lightweight Transformer for Sentiment Analysis,’ which has been accepted at AIxIA 2022. Please find the abstract below.

Abstract:  
This article introduces DelBERTo, a resource-efficient Transformer architecture for Natural Language Processing (NLP). Transformers replace convolutions and recurrence with the self-attention mechanism and represent the state of the art in NLP. However, self-attention’s complexity grows quadratically with the size of the input, which limits the applicability of Transformers. DelBERTo relies on adaptive input and on a deep yet lightweight Transformer architecture to reduce the number of learnable parameters, and relies on adaptive softmax to improve pre-training speed and memory footprint. We evaluate the proposed architecture in a sentiment analysis task and compare it against AlBERTo, a BERT model representing the state of the art in sentiment analysis over Italian tweets. DelBERTo has only one-seventh of AlBERTo’s learnable parameters, is faster, and requires less memory. Despite this, our experiments show that DelBERTo is competitive with AlBERTo over the three SENTIPOLC sub-tasks proposed at EVALITA 2016: subjectivity classification, polarity classification, and irony detection.
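To make the quadratic cost mentioned in the abstract concrete, here is a minimal pure-Python sketch of single-head scaled dot-product self-attention. It is a simplification under stated assumptions, not DelBERTo's architecture: queries, keys and values are all taken to be the raw input (no learned projections), and the function names are illustrative.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [value / total for value in exps]


def self_attention(x: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product self-attention with Q = K = V = x (assumption).

    Returns the output vectors and the n x n attention weights.
    """
    dim = len(x[0])
    # One score per (query, key) pair: this n x n matrix is the quadratic part.
    scores = [
        [sum(q_i * k_i for q_i, k_i in zip(query, key)) / math.sqrt(dim) for key in x]
        for query in x
    ]
    weights = [softmax(row) for row in scores]
    # Each output vector is a weighted average of all input vectors.
    output = [
        [sum(w * value[j] for w, value in zip(weight_row, x)) for j in range(dim)]
        for weight_row in weights
    ]
    return output, weights


def score_entries(sequence_length: int) -> int:
    """Number of attention scores computed for a sequence of the given length."""
    return sequence_length * sequence_length


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, attention = self_attention(tokens)
```

Since every token attends to every other token, doubling the sequence length quadruples the number of scores: `score_entries(1024)` is four times `score_entries(512)`.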


Large Acoustic Models: another challenge for the ecological AI world

Where: Sala conferenze (3rd floor)
When: January 31, 2024, 3pm

Francesco Cutugno is an associate professor of Natural Language Processing and Interaction Design (within the frame of the Software Engineering courses) at the University of Naples Federico II. From 2013 to 2018 he was president of the Italian Association of Speech Sciences. He is presently a member of the board of the Italian Association of Computational Linguistics. Francesco Cutugno directs Urban/Eco, an interdepartmental research center devoted to the study of applications of Artificial Intelligence to conversational agents, applications in architecture, and cultural heritage and language.

Abstract:  
These days, many nations, researchers at public institutions, and research organizations are attempting to build public, freely accessible Large (Generative) Language Models (LLMs). This enterprise also requires collecting the huge amount of training data that such models need. Similarly, another fundamental goal of public research should be to provide the scientific community with AI-based Automatic Speech Recognition systems that, like LLMs, require a massive computational load, infrastructure, and thousands of hours of audio data, both labeled and unlabeled. By analogy with LLMs, we call these systems Large Acoustic Models (LAMs). My talk will deal with the leading technologies in this field, describe the data profile needed, and propose alternatives to current approaches, aiming to partly reduce the complexity of transcribing speech. Some concluding remarks will be devoted to aspects of explainability hidden in the DNN approach to the problem.


Annotators Aren’t Asocial Atoms: Modelling Individual Perspectives and Social Groups

Where: Sala conferenze (3rd floor)
When: 27/11/2023 11:00

Matthias is a PhD student in the Semantic Computing Group at Bielefeld University (Germany) supervised by Philipp Cimiano. He works on systems to improve online discussions with a particular focus on perspectives and human label variation. From April to June 2022 he visited the MilaNLP Lab at Bocconi University (Italy) to work on related problems in modeling sociodemographics and continues to collaborate with the group.

Abstract:

Annotators, like all of us, are shaped to some extent by their membership in social groups of various types. Some groups are formed based on socially relevant categories, like age or gender; others can be more local and temporary. For example, the group of all annotators in the annotation process is just that, a group. If groups have an impact on us, can we include them in our models to better capture individual perspectives?

I will present results from two recent works to provide some tentative answers to this question. In one case we find that groups based on sociodemographics might be too coarse to be informative [1]. In the other we see that it is beneficial to model the annotators of a dataset as a group and in relation to one another [2].

1) https://aclanthology.org/2023.acl-short.88/

2) https://arxiv.org/abs/2311.03153


Ontological Engineering Group (OEG) at Universidad Politécnica de Madrid (UPM) meets CCC at UniTO

Where: Sala Conferenze (3rd floor), Department of Computer Science
When: Starting at h 15.00

h 15.00
Title: NLP and Knowledge Graphs – Carlos Badenes-Olmedo (UPM)
Abstract: Showcase of the results of research and innovation carried out in our research group, where we have combined natural language processing techniques with information based on knowledge graphs. Recent advances will be highlighted, emphasizing how this union has facilitated the exploration of new perspectives and creative solutions for complex data analysis. The contribution of this research will be discussed in the broader context of data science, illustrating how research in this area can move forward.

Bio: Carlos Badenes-Olmedo is an Assistant Professor at the Universidad Politécnica de Madrid (UPM), Spain, and a research member of the Ontological Engineering Group (OEG). His research on advanced techniques for knowledge extraction from unstructured data combines machine learning, natural language processing and Knowledge Graphs. Carlos is also co-founder of the company librairy.eu, a technology-based spin-off that facilitates the exploration of large document corpora.

h 15.40
Title: Formalising political speech with ontologies, an approach to exploit the discourse of different parties in different media over time. – Ibai Guillén Pacho (UPM)
Bio: Ibai Guillén Pacho is a PhD candidate in Artificial Intelligence and a member of the Ontology Engineering Group (OEG) at the Universidad Politécnica de Madrid, the same university where he completed his Master’s Degree in Artificial Intelligence (2022). He holds a degree in Computer Engineering and Digital Business Transformation from the University of Deusto (2021). His thesis currently focuses on natural language processing tasks, more specifically in the field of diachronic content analysis. His main areas of interest are content analysis, language models and knowledge representation. ORCID: https://orcid.org/0000-0001-7801-8815

h 16.20
Title: O-Dang! The Ontology of Dangerous Speech Messages – Marco Stranisci (Dipinfo, UniTO)
https://aclanthology.org/2022.salld-1.2/

Inside the NLP community, a considerable number of language resources are created, annotated and released every day with the aim of studying specific linguistic phenomena. Although a variety of attempts to organize such resources have been carried out, systematic methods and interoperability between resources are still lacking. Furthermore, when storing linguistic information, the most common practice still relies on the concept of a “gold standard”, which is in contrast with recent trends in NLP that stress the importance of different subjectivities and points of view when training machine learning and deep learning methods. In this paper we present O-Dang!: The Ontology of Dangerous Speech Messages, a systematic and interoperable Knowledge Graph (KG) for the collection of linguistic annotated data. O-Dang! is designed to gather and organize Italian datasets into a structured KG, according to the principles shared within the Linguistic Linked Open Data community. The ontology has also been designed to account for a perspectivist approach, since it provides a model for encoding both gold standard and single-annotator labels in the KG.


IDA – a multimodal comparable corpus for exploring extremist dynamics in online interaction

Selenia Anastasi (she/her) is a PhD candidate in Digital Humanities at the University of Genoa and a Fellow at the Language Technology Group, Hamburg University. She will be delivering a presentation on an already published work: IDA – a multimodal comparable corpus for exploring extremist dynamics in online interaction.

Link to the publication: Proceedings of the 10th International Conference on CMC and Social Media Corpora for the Humanities 2023 (CMC-2023)

Abstract

Extremist online communities are rapidly growing locally, posing potential threats to European and non-European countries. To gain insight into the dynamics of interaction within these web-based extremist groups, we present IDA, the Incel Data Archive. IDA is a multilingual and multimodal corpus compiled from Incel forums in both Italian and English. With its collection of forums, blogs, and websites, the Incelosphere serves as an ideal case study for examining interaction dynamics within extremist online communities from a cross-cultural perspective. Therefore, this work makes a twofold contribution: firstly, it provides an original cross-cultural perspective on the Incel phenomenon, and secondly, it extensively discusses the challenges and opportunities encountered when constructing a multimodal and multilingual corpus from discussion forums. The results of the thematic exploration of the corpus demonstrate not only variations in the discussion topics favoured by each community but also differences in the targets of their hateful content.

When: 24/11/2023 11:30

Where: Sala conferenze (3rd floor)


Current Challenges in Information Extraction

Elisa Bassignana (https://elisabassignana.github.io/), formerly a master’s student at the Department of Computer Science and now a PhD student at the IT University of Copenhagen, will be delivering a seminar on the current challenges in information extraction. Please find the abstract below.

Abstract
With the increase of digitized data and extensive access to it, the task of extracting relevant information with respect to a given query has become crucial. The variety of applications of the tasks related to Information Extraction, together with the impossibility of annotating data for every individual setup, requires models to be robust to data shifts. In this talk I will present the findings of my PhD project with respect to one of the most important challenges of Information Extraction: the ability of models to perform in unseen scenarios (i.e., unknown text domains and unknown queries). Specifically, I will dive deep into the challenges of cross-domain Relation Extraction.

When: 26/10/2023 15:00

Where: Sala conferenze (3rd floor)