Categories
Meetings

Current Challenges in Information Extraction

Elisa Bassignana (https://elisabassignana.github.io/), formerly a master’s student at the Department of Computer Science and now a PhD student at the IT University of Copenhagen, will be delivering a seminar on the current challenges in information extraction. Please find the abstract below.

Abstract
With the increase of digitized data and extensive access to it, the task of extracting relevant information with respect to a given query has become crucial. The variety of applications of the tasks related to Information Extraction, together with the impossibility of annotating data for every individual setup, requires models to be robust to data shifts. In this talk I will present the findings of my PhD project with respect to one of the most important challenges of Information Extraction: the ability of models to perform in unseen scenarios (i.e., unknown text domains and unknown queries). Specifically, I will dive deep into the challenges of cross-domain Relation Extraction.

When: 26/10/2023 15:00

Where: Sala conferenze (3rd floor)

Categories
Meetings

Avoiding the behavioristic traps with the Minimal Cognitive Grid

Antonio Lieto will explain how to avoid the behavioristic traps with the Minimal Cognitive Grid. Below is the abstract of his work: Lieto, A. (2021). Cognitive Design for Artificial Minds. Routledge.

“The enormous success of modern AI systems (e.g. in computer vision, natural language processing etc.) has led to the formulation of the hypothesis that such systems – since they are able to obtain human or superhuman level performances in a number of tasks – actually have acquired the underlying competence that we humans possess in order to exhibit the same kind of behavior. This hypothesis, I argue, is however based exclusively on a behavioristic analysis of (some of) the output produced by such systems. And, as such, it is methodologically problematic. In this talk I will show how by using a tool known as the Minimal Cognitive Grid (MCG, introduced in Lieto 2021) it is possible to avoid this behavioristic trap and, in addition, to compare and rank, in a non-subjective way, different types of artificial systems based on their biological or cognitive plausibility.”

When: 6/07/2023 11:30

Where: Sala conferenze (3rd floor)

Categories
Meetings

Language and Dialogue: A theoretical introduction

Elisa Di Nuovo, after introducing the theoretical framework of language and dialogue, will review the approaches used to develop and evaluate dialogue systems, conversational agents, and chatbots, with a focus on task-oriented dialogue systems.

When: 9/06/2023 11:30

Where: Sala conferenze (3rd floor)

Categories
Meetings

KitchenScrap: Fastening SLR Process Following Kitchenham Framework through Data Mining

Okky Ibrohim will introduce us to KitchenScrap.

As researchers, we should conduct research that has an impact on the community, which means that what we do should fill a research gap and solve research problems that previous works have not yet solved. To find that research gap, we should explore what previous works have done through a systematic review, for example by following the Kitchenham framework. In this tutorial, we will discuss how to do a systematic review using the Kitchenham framework, from defining the research question and the boolean query to the final dimension analysis step. More importantly, in this tutorial we will practice using KitchenScrap (https://github.com/okkyibrohim/kitchenscrap), a Python library that can help us speed up the systematic review process by semi-automatically collecting and filtering paper metadata following the Kitchenham framework.
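To make the idea of filtering paper metadata with a boolean query more concrete, here is a minimal sketch in plain Python. It does not use KitchenScrap and does not reflect its API: the function names, the query representation (a list of OR-groups joined by AND), and the sample records are all assumptions made for illustration.

```python
def matches(text, query):
    """Return True if `text` satisfies the boolean query.

    `query` is a list of groups combined with AND; each group is a list
    of terms combined with OR. Matching is a case-insensitive substring test.
    (Hypothetical representation, not KitchenScrap's.)
    """
    lowered = text.lower()
    return all(
        any(term.lower() in lowered for term in group)
        for group in query
    )


def filter_papers(papers, query):
    """Keep the records whose title or abstract satisfies the query."""
    kept = []
    for paper in papers:
        text = paper["title"] + " " + paper.get("abstract", "")
        if matches(text, query):
            kept.append(paper)
    return kept


# Invented sample metadata, for illustration only.
papers = [
    {"title": "Hate speech detection with transformers"},
    {"title": "A survey of image segmentation"},
    {"title": "Abusive language classification", "abstract": "We detect abuse."},
]

# ("hate speech" OR "abusive language") AND ("detection" OR "classification")
query = [["hate speech", "abusive language"], ["detection", "classification"]]

selected = filter_papers(papers, query)
```

In a real review the retained records would still be screened manually against the inclusion and exclusion criteria; the filter only reduces the number of candidates.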

When: 12/05/2023 11:30am

Where: Sala riunioni (1st floor)

Categories
Meetings

The DEEP Sensorium: a multidimensional approach to sensory domain labelling

Simona Corciulo will introduce the DEEP Sensorium (Deep Engaging Experiences and Practices – Sensorium), a multidimensional dataset that combines cognitive and affective features to inform systematic methodologies for augmenting contents and experiences with multi-sensory stimuli.

When: 21/04/2023 11:30am

Where: Sala seminari (1st floor)

Categories
Meetings

Is ChatGPT better than Human Annotators?

We will discuss a paper titled “Is ChatGPT better than Human Annotators? Potential and Limitations of ChatGPT in Explaining Implicit Hate Speech”, published by the Association for Computing Machinery.

There will be no formal speakers for this meeting and it is open to everybody’s opinion!

Abstract

Recent studies have alarmed that many online hate speeches are implicit. With its subtle nature, the explainability of the detection of such hateful speech has been a challenging problem. In this work, we examine whether ChatGPT can be used for providing natural language explanations (NLEs) for implicit hateful speech detection. We design our prompt to elicit concise ChatGPT-generated NLEs and conduct user studies to evaluate their qualities by comparison with human-generated NLEs. We discuss the potential and limitations of ChatGPT in the context of implicit hateful speech research.

When: 10/03/2023 11:30am

Where: Sala riunioni (1st floor)

Categories
Meetings

Semantic Coherence Markers for the Early Diagnosis of the Alzheimer Disease

Matteo Delsanto will present a work published in Artificial Intelligence in Medicine under the title “Semantic coherence markers: The contribution of perplexity metrics”, together with Davide Colla and Daniele Radicioni, from the Computer Science Department (University of Turin), and Marco Agosto and Benedetto Vitiello, from the Department of Sciences of Public Health and Pediatrics (University of Turin).

Abstract

Devising automatic tools to assist specialists in the early detection of mental disturbances and psychotic disorders is to date a challenging scientific problem and a practically relevant activity. In this work we explore how language models (that are probability distributions over text sequences) can be employed to analyze language and discriminate between mentally impaired and healthy subjects. We have preliminarily explored whether perplexity can be considered a reliable metrics to characterize an individual’s language. Perplexity was originally conceived as an information-theoretic measure to assess how much a given language model is suited to predict a text sequence or, equivalently, how much a word sequence fits into a specific language model. We carried out an extensive experimentation with healthy subjects, and employed language models as diverse as N-grams – from 2-grams to 5-grams – and GPT-2, a transformer-based language model. Our experiments show that irrespective of the complexity of the employed language model, perplexity scores are stable and sufficiently consistent for analyzing the language of individual subjects, and at the same time sensitive enough to capture differences due to linguistic registers adopted by the same speaker, e.g., in interviews and political rallies. A second array of experiments was designed to investigate whether perplexity scores may be used to discriminate between the transcripts of healthy subjects and subjects suffering from Alzheimer Disease (AD). Our best performing models achieved full accuracy and F-score (1.00 in both precision/specificity and recall/sensitivity) in categorizing subjects from both the AD class, and control subjects. These results suggest that perplexity can be a valuable analytical metrics with potential application to supporting early diagnosis of symptoms of mental disorders.
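As a rough illustration of how perplexity measures the fit between a word sequence and a language model, the sketch below trains a bigram model with add-one smoothing on a toy corpus and scores two short sequences. This is not the authors' pipeline: the smoothing choice, the function names, and the toy sentences are assumptions made for the example, and all unseen words are treated as a single unknown type.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Collect the counts needed for an add-one smoothed bigram model."""
    contexts = Counter(tokens[:-1])             # how often each word is a context
    bigrams = Counter(zip(tokens, tokens[1:]))  # adjacent word pairs
    vocab_size = len(set(tokens)) + 1           # +1 for a single unknown type
    return contexts, bigrams, vocab_size


def perplexity(tokens, model):
    """Perplexity of `tokens` under the model: exp of the mean negative log-probability."""
    contexts, bigrams, vocab_size = model
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        raise ValueError("need at least two tokens to score a sequence")
    log_prob = 0.0
    for prev, cur in pairs:
        # Add-one (Laplace) smoothing keeps unseen pairs from having zero probability.
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(pairs))


# Toy training text and test sequences (invented for illustration).
train = "the cat sat on the mat the dog sat on the rug".split()
model = train_bigram(train)

fluent = "the cat sat on the rug".split()
scrambled = "rug the on sat cat the".split()

ppl_fluent = perplexity(fluent, model)
ppl_scrambled = perplexity(scrambled, model)
```

The sequence that follows the word order seen in training receives the lower perplexity; the scrambled one, built from the same words, scores higher because its word pairs were never observed.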

Other links
Data in Brief dataset publication
Semantic Coherence Dataset: Speech transcripts

When: 13/01/2023

Where: Sala conferenze (3rd floor)

Categories
Meetings

Voice interaction for supporting blind people to access mathematical expressions

Pier Felice Balestrucci will present his work “Dialogare con la matematica: verso un’interazione dialogica vocale automatica per la navigazione di espressioni matematiche” (Talking with mathematics: towards automatic spoken dialogue interaction for navigating mathematical expressions).

Assistive technologies are technologies that make IT products, both hardware and software, accessible and usable for people with disabilities.

The aim of this work is to make listening to mathematical formulas easier and more understandable for visually impaired and blind people. Mathematical formulas are full of symbols that are hard to read for screen readers, that is, software applications that identify and interpret the text shown on the computer screen.

People with a visual disability generally read formulas through a LaTeX representation, which is not only very verbose and slow, but also a barrier for those who do not know this language.

The main goal is to build a tool that offers several advantages and simplifications in support of these user groups. The tool provides both the translation of mathematical formulas into mathematical sentences, that is, natural language sentences produced with Natural Language Generation techniques, and a dialogue system for navigating and exploring the formula.
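A minimal sketch of turning a formula into a natural-language sentence, assuming the formula has already been parsed into a small expression tree. It is not the system presented in the talk: the tuple-based tree format, the operator names, and the English templates are all hypothetical choices made for illustration.

```python
# Hypothetical English templates, one per operator in the toy tree format.
TEMPLATES = {
    "add": "{0} plus {1}",
    "sub": "{0} minus {1}",
    "mul": "{0} times {1}",
    "pow": "{0} to the power of {1}",
    "sqrt": "the square root of {0}",
    "frac": "the fraction with numerator {0} and denominator {1}",
}


def verbalize(node):
    """Recursively render an expression tree as an English phrase.

    A node is either a leaf (number or variable name) or a tuple
    (operator, child, ...) whose operator is a key of TEMPLATES.
    """
    if isinstance(node, (int, float, str)):
        return str(node)
    op, *children = node
    if op not in TEMPLATES:
        raise ValueError(f"unsupported operator: {op}")
    parts = [verbalize(child) for child in children]
    return TEMPLATES[op].format(*parts)


# Tree for the LaTeX formula \frac{x + 1}{y^2}
formula = ("frac", ("add", "x", 1), ("pow", "y", 2))
sentence = verbalize(formula)
# sentence == "the fraction with numerator x plus 1 and denominator y to the power of 2"
```

Keeping the formula as a tree rather than a flat string is also what would let a dialogue system move between sub-expressions, for instance by reading only the numerator on request.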

When: 15/12/22

Categories
Meetings

Prompt-based learning for text classification in Italian

Valerio Basile will present some experiments carried out using the new zero-shot technique of prompt-based learning for various text classification tasks in Italian.

Abstract: Prompt-based learning is a recent paradigm in NLP that leverages large pre-trained language models to perform a variety of tasks. With this technique, it is possible to build classifiers that do not need training data (zero-shot). In this work, Valerio assessed the status of prompt-based learning applied to several text classification tasks in the Italian language. The results indicate that the performance gap towards current supervised methods is still relevant. However, the difference in performance between pre-trained models and the characteristic of the prompt-based classifier of operating in a zero-shot fashion open a discussion regarding the next generation of evaluation campaigns for NLP.

Categories
Meetings

Assessing the impact of contextual information in hate speech detection

Juan Manuel Pérez will give a talk during his week-long visit to the Content-centered Computing group.

In recent years, hate speech has gained great relevance in social networks and other virtual media because of its intensity and its relationship with violent acts against members of protected groups. Due to the incommensurable amount of content generated by users, great effort has been made in the research and development of automatic tools to aid the analysis and moderation of this speech, at least in its most threatening forms.

One of the limitations of current approaches to automatic hate speech detection is the lack of context. Most studies and resources are performed on data without context; that is, isolated messages without any type of conversational context or the topic being discussed. This restricts the available information to define if a post on a social network is hateful or not.

In this talk, I will comment on some experiments we have performed to assess the impact of context in hate speech detection. With this in mind, we built a contextualized dataset for hate speech detection based on user responses to news posts from media outlets on Twitter. This corpus was collected in the Rioplatense dialectal variety of Spanish and focuses on hate speech associated with the COVID-19 pandemic.

For the two proposed tasks using this novel corpus (binary detection; and granular detection, where the system has to predict the attacked characteristics), the classification experiments using state-of-the-art techniques show evidence that adding contextual information improves hate speech detection performance.

When: September 29, 2022, at 11:00

Where: Conference room 3rd floor (Sala Seminari)