Category: Meetings

Periodic meetings of CCC group

Annotators Aren’t Asocial Atoms: Modelling Individual Perspectives and Social Groups

Post author By Pier Felice Balestrucci
Post date 11/23/2023

Where: Sala conferenze (3rd floor)
When: 27/11/2023 11:00

Matthias is a PhD student in the Semantic Computing Group at Bielefeld University (Germany) supervised by Philipp Cimiano. He works on systems to improve online discussions with a particular focus on perspectives and human label variation. From April to June 2022 he visited the MilaNLP Lab at Bocconi University (Italy) to work on related problems in modeling sociodemographics and continues to collaborate with the group.

Abstract:

Annotators, like all of us, are shaped to some extent by their membership in social groups of various types. Some groups are formed based on socially relevant categories, like age or gender; others can be more local and temporary. For example, the group of all annotators in an annotation process is just that, a group. If groups have an impact on us, can we include them in our models to better capture individual perspectives?

I will present results from two recent works to provide some tentative answers to this question. In one case we find that groups based on sociodemographics might be too coarse to be informative [1]. In the other we see that it is beneficial to model the annotators of a dataset as a group and in relation to one another [2].

[1] https://aclanthology.org/2023.acl-short.88/

[2] https://arxiv.org/abs/2311.03153

Ontology Engineering Group (OEG) at Universidad Politécnica de Madrid (UPM) meets CCC at UniTO

Post author By Pier Felice Balestrucci
Post date 11/17/2023

Where: Sala Conferenze Terzo Piano, dipartimento di Informatica
When: Starting at 15:00

15:00
Title: NLP and Knowledge Graphs – Carlos Badenes-Olmedo (UPM)
Abstract: Showcase of the results of research and innovation carried out in our research group, where we have combined natural language processing techniques with information based on knowledge graphs. Recent advances will be highlighted, emphasizing how this union has facilitated the exploration of new perspectives and creative solutions for complex data analysis. The contribution of this research will be discussed in the broader context of data science, illustrating how research in this area can move forward.

Bio: Carlos Badenes-Olmedo is an Assistant Professor at the Universidad Politécnica de Madrid (UPM), Spain, and a research member of the Ontology Engineering Group (OEG). His research on advanced techniques for knowledge extraction from unstructured data combines machine learning, natural language processing and Knowledge Graphs. Carlos is also co-founder of the company librairy.eu, a technology-based spin-off that facilitates the exploration of large document corpora.

15:40
Title: Formalising political speech with ontologies, an approach to exploit the discourse of different parties in different media over time. – Ibai Guillén Pacho (UPM)
Bio: Ibai Guillén Pacho is a PhD candidate in Artificial Intelligence and a member of the Ontology Engineering Group (OEG) at the Universidad Politécnica de Madrid, the same university where he completed his Master’s Degree in Artificial Intelligence (2022). He holds a degree in Computer Engineering and Digital Business Transformation from the University of Deusto (2021). His thesis currently focuses on natural language processing tasks, more specifically in the field of diachronic content analysis. His main areas of interest are content analysis, language models and knowledge representation. ORCID https://orcid.org/0000-0001-7801-8815

16:20
Title: O-Dang! The Ontology of Dangerous Speech Messages – Marco Stranisci (Dipinfo, UniTO)
https://aclanthology.org/2022.salld-1.2/

Within the NLP community, a considerable number of language resources are created, annotated and released every day with the aim of studying specific linguistic phenomena. Although a variety of attempts to organize such resources have been carried out, systematic methods and interoperability between resources are still lacking. Furthermore, the most common practice for storing linguistic information is still the “gold standard”, which contrasts with recent trends in NLP that stress the importance of different subjectivities and points of view when training machine learning and deep learning methods. In this paper we present O-Dang!: The Ontology of Dangerous Speech Messages, a systematic and interoperable Knowledge Graph (KG) for the collection of linguistic annotated data. O-Dang! is designed to gather and organize Italian datasets into a structured KG, according to the principles shared within the Linguistic Linked Open Data community. The ontology has also been designed to account for a perspectivist approach, since it provides a model for encoding both gold standard and single-annotator labels in the KG.

IDA – a multimodal comparable corpus for exploring extremist dynamics in online interaction

Post author By Andrea Marra
Post date 11/08/2023

Selenia Anastasi (she/her) is a PhD candidate in Digital Humanities at the University of Genoa and a Fellow at the Language Technology Group, Hamburg University. She will present an already published work: IDA – a multimodal comparable corpus for exploring extremist dynamics in online interaction.

Link to the publication: Proceedings of the 10th International Conference on CMC and Social Media Corpora for the Humanities 2023 (CMC-2023)

Abstract

Extremist online communities are rapidly growing locally, posing potential threats to European and non-European countries. To gain insight into the dynamics of interaction within these web-based extremist groups, we present IDA, the Incel Data Archive. IDA is a multilingual and multimodal corpus compiled from Incel forums in both Italian and English. With its collection of forums, blogs, and websites, the Incelosphere serves as an ideal case study for examining interaction dynamics within extremist online communities from a cross-cultural perspective. Therefore, this work makes a twofold contribution: firstly, it provides an original cross-cultural perspective on the Incel phenomenon, and secondly, it extensively discusses the challenges and opportunities encountered when constructing a multimodal and multilingual corpus from discussion forums. The results of the thematic exploration of the corpus demonstrate not only variations in the discussion topics favoured by each community but also differences in the targets of their hateful content.

When: 24/11/2023 11:30

Where: Sala conferenze (3rd floor)

Current Challenges in Information Extraction

Post author By Pier Felice Balestrucci
Post date 10/13/2023

Elisa Bassignana (https://elisabassignana.github.io/), formerly a master’s student at the Department of Computer Science and now a PhD student at the IT University of Copenhagen, will be delivering a seminar on the current challenges in information extraction. Please find the abstract below.

Abstract
With the increase of digitized data and extensive access to it, the task of extracting relevant information with respect to a given query has become crucial. The variety of applications of the tasks related to Information Extraction, together with the impossibility of annotating data for every individual setup, require models to be robust to data shifts. In this talk I will present the findings of my PhD project with respect to one of the most important challenges of Information Extraction: The ability of models to perform in unseen scenarios (i.e., unknown text domains and unknown queries). Specifically, I will dive deep into the challenges of cross-domain Relation Extraction.

When: 26/10/2023 15:00

Where: Sala conferenze (3rd floor)

Avoiding the behavioristic traps with the Minimal Cognitive Grid

Post author By Pier Felice Balestrucci
Post date 06/29/2023

Antonio Lieto will explain how to avoid the behavioristic traps with the Minimal Cognitive Grid. Below is the abstract, related to his work: Lieto, A. (2021). Cognitive design for artificial minds. Routledge.

“The enormous success of modern AI systems (e.g. in computer vision, natural language processing etc.) has led to the formulation of the hypothesis that such systems – since they are able to obtain human or superhuman level performances in a number of tasks – actually have acquired the underlying competence that we humans possess in order to exhibit the same kind of behavior. This hypothesis, I argue, is however based exclusively on a behavioristic analysis of (some of) the output produced by such systems. And, as such, it is methodologically problematic. In this talk I will show how by using a tool known as Minimal Cognitive Grid (MCG, introduced in Lieto 2021) it is possible to avoid this behavioristic trap and, in addition, to compare and rank, in a non-subjective way, different types of artificial systems based on their biological or cognitive plausibility”.

When: 6/07/2023 11:30

Where: Sala conferenze (3rd floor)

Language and Dialogue: A theoretical introduction

Post author By Pier Felice Balestrucci
Post date 06/06/2023

Elisa Di Nuovo, after introducing the theoretical framework of language and dialogue, will review the approaches used to develop and evaluate dialogue systems, conversational agents, and chatbots, with a focus on task-oriented dialogue systems.

When: 9/06/2023 11:30

Where: Sala conferenze (3rd floor)

KitchenScrap: Fastening SLR Process Following Kitchenham Framework through Data Mining

Post author By Pier Felice Balestrucci
Post date 05/08/2023

Okky Ibrohim will introduce us to KitchenScrap.

As researchers, we should conduct research that has an impact on the community, which means that our work should fill a research gap and address problems that previous works have not yet solved. To find that research gap, we should explore what previous works have done through a systematic review, for instance by following the Kitchenham framework. In this tutorial, we will discuss how to do a systematic review using the Kitchenham framework, from defining the research question and the boolean query to the final dimension analysis step. More importantly, we will practice using KitchenScrap (https://github.com/okkyibrohim/kitchenscrap), a Python library that can help us speed up the systematic review process by semi-automatically collecting and filtering paper metadata following the Kitchenham framework.
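To give an idea of what filtering paper metadata with a boolean query can look like, here is a minimal sketch in plain Python. It does not use or reproduce the KitchenScrap API: the function names, the query and the paper records are all hypothetical, and the query is assumed to be an AND of OR-groups of keywords matched against title and abstract.

```python
# Illustrative only: this is NOT the KitchenScrap API.
# Assumption: a boolean query is an AND of OR-groups of keywords.

def matches_query(text, or_groups):
    """Return True if every OR-group has at least one keyword in text."""
    lowered = text.lower()
    return all(any(kw.lower() in lowered for kw in group) for group in or_groups)


def filter_papers(papers, or_groups):
    """Keep papers whose title or abstract satisfies the boolean query."""
    kept = []
    for paper in papers:
        searchable = paper.get("title", "") + " " + paper.get("abstract", "")
        if matches_query(searchable, or_groups):
            kept.append(paper)
    return kept


# Hypothetical query:
# ("hate speech" OR "abusive language") AND ("detection" OR "classification")
QUERY = [["hate speech", "abusive language"], ["detection", "classification"]]

# Made-up metadata records, for illustration only.
PAPERS = [
    {"title": "Hate speech detection in tweets", "abstract": ""},
    {"title": "A survey of parsing", "abstract": "Syntactic parsing methods."},
    {"title": "Abusive language online", "abstract": "We study classification models."},
]

selected = filter_papers(PAPERS, QUERY)
print([p["title"] for p in selected])
```

Plain substring matching keeps the sketch short; a real review would also need deduplication and manual screening of the retained papers.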

When: 12/05/2023 11:30am

Where: Sala riunioni (1st floor)

The DEEP Sensorium: a multidimensional approach to sensory domain labelling

Post author By Pier Felice Balestrucci
Post date 04/11/2023

Simona Corciulo will introduce the DEEP Sensorium (Deep Engaging Experiences and Practices – Sensorium), a multidimensional dataset that combines cognitive and affective features to inform systematic methodologies for augmenting contents and experiences with multi-sensory stimuli.

When: 21/04/2023 11:30am

Where: Sala seminari (1st floor)

Is ChatGPT better than Human Annotators?

Post author By Pier Felice Balestrucci
Post date 03/09/2023

We will discuss a work called “Is ChatGPT better than Human Annotators? Potential and Limitations of ChatGPT in Explaining Implicit Hate Speech”, published by the Association for Computing Machinery.

There will be no formal speakers for this meeting and it is open to everybody’s opinion!

Abstract

Recent studies have alarmed that many online hate speeches are implicit. With its subtle nature, the explainability of the detection of such hateful speech has been a challenging problem. In this work, we examine whether ChatGPT can be used for providing natural language explanations (NLEs) for implicit hateful speech detection. We design our prompt to elicit concise ChatGPT-generated NLEs and conduct user studies to evaluate their qualities by comparison with human-generated NLEs. We discuss the potential and limitations of ChatGPT in the context of implicit hateful speech research.

When: 10/03/2023 11:30am

Where: Sala riunioni (1st floor)

Semantic Coherence Markers for the Early Diagnosis of the Alzheimer Disease

Post author By Matteo Delsanto
Post date 01/10/2023

Matteo Delsanto will present a work published in Artificial Intelligence in Medicine, under the title Semantic coherence markers: The contribution of perplexity metrics, together with Davide Colla and Daniele Radicioni, from the Computer Science Department (University of Turin), and Marco Agosto and Benedetto Vitiello, from the Department of Sciences of Public Health and Pediatrics (University of Turin).

Abstract

Devising automatic tools to assist specialists in the early detection of mental disturbances and psychotic disorders is to date a challenging scientific problem and a practically relevant activity. In this work we explore how language models (that are probability distributions over text sequences) can be employed to analyze language and discriminate between mentally impaired and healthy subjects. We have preliminarily explored whether perplexity can be considered a reliable metric to characterize an individual’s language. Perplexity was originally conceived as an information-theoretic measure to assess how much a given language model is suited to predict a text sequence or, equivalently, how much a word sequence fits into a specific language model. We carried out an extensive experimentation with healthy subjects, and employed language models as diverse as N-grams – from 2-grams to 5-grams – and GPT-2, a transformer-based language model. Our experiments show that irrespective of the complexity of the employed language model, perplexity scores are stable and sufficiently consistent for analyzing the language of individual subjects, and at the same time sensitive enough to capture differences due to linguistic registers adopted by the same speaker, e.g., in interviews and political rallies. A second array of experiments was designed to investigate whether perplexity scores may be used to discriminate between the transcripts of healthy subjects and subjects suffering from Alzheimer Disease (AD). Our best performing models achieved full accuracy and F-score (1.00 in both precision/specificity and recall/sensitivity) in categorizing subjects from both the AD class, and control subjects. These results suggest that perplexity can be a valuable analytical metric with potential application to supporting early diagnosis of symptoms of mental disorders.
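As a rough illustration of the metric itself, the sketch below computes perplexity as exp(-(1/N) * sum of log P(w_i | w_(i-1))) with a toy bigram model. It is not the authors' pipeline: the training text, the add-one smoothing and all function names are assumptions made for this example only.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count bigrams and the contexts (first words) they start from."""
    contexts = Counter(tokens[:-1])
    bigrams = Counter(zip(tokens, tokens[1:]))
    return contexts, bigrams


def bigram_prob(prev, word, contexts, bigrams, vocab_size):
    """Add-one smoothed estimate of P(word | prev) (a simplifying assumption)."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)


def perplexity(tokens, contexts, bigrams, vocab_size):
    """exp of the average negative log-probability over bigram transitions."""
    log_prob = 0.0
    n = 0
    for prev, word in zip(tokens, tokens[1:]):
        log_prob += math.log(bigram_prob(prev, word, contexts, bigrams, vocab_size))
        n += 1
    return math.exp(-log_prob / n)


# Toy training text, made up for illustration.
train = "the cat sat on the mat the cat sat on the rug".split()
contexts, bigrams = train_bigram(train)
vocab_size = len(set(train))

fluent = "the cat sat on the mat".split()
scrambled = "mat the on sat cat the".split()

ppl_fluent = perplexity(fluent, contexts, bigrams, vocab_size)
ppl_scrambled = perplexity(scrambled, contexts, bigrams, vocab_size)
print(ppl_fluent, ppl_scrambled)
```

The word sequence that fits the model gets the lower perplexity, which is the property the abstract relies on when comparing transcripts.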

Other links
Data in brief dataset publication
Semantic Coherence Dataset: Speech transcripts

When: 13/01/2023

Where: Sala conferenze (3rd floor)