Categories
Meetings

Semantic Coherence Markers for the Early Diagnosis of Alzheimer's Disease

Matteo Delsanto will present a work published in Artificial Intelligence in Medicine under the title Semantic coherence markers: The contribution of perplexity metrics, together with Davide Colla and Daniele Radicioni, from the Computer Science Department (University of Turin), and Marco Agosto and Benedetto Vitiello, from the Department of Sciences of Public Health and Pediatrics (University of Turin).

Abstract

Devising automatic tools to assist specialists in the early detection of mental disturbances and psychotic disorders is to date a challenging scientific problem and a practically relevant activity. In this work we explore how language models (that is, probability distributions over text sequences) can be employed to analyze language and discriminate between mentally impaired and healthy subjects. We have preliminarily explored whether perplexity can be considered a reliable metric to characterize an individual’s language. Perplexity was originally conceived as an information-theoretic measure to assess how well a given language model is suited to predict a text sequence or, equivalently, how well a word sequence fits into a specific language model. We carried out extensive experiments with healthy subjects, and employed language models as diverse as N-grams (from 2-grams to 5-grams) and GPT-2, a transformer-based language model. Our experiments show that, irrespective of the complexity of the employed language model, perplexity scores are stable and sufficiently consistent for analyzing the language of individual subjects, and at the same time sensitive enough to capture differences due to the linguistic registers adopted by the same speaker, e.g., in interviews and political rallies. A second array of experiments was designed to investigate whether perplexity scores may be used to discriminate between the transcripts of healthy subjects and those of subjects suffering from Alzheimer Disease (AD). Our best performing models achieved full accuracy and F-score (1.00 in both precision/specificity and recall/sensitivity) in categorizing subjects from both the AD class and the control group. These results suggest that perplexity can be a valuable analytical metric with potential application to supporting early diagnosis of symptoms of mental disorders.
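
As an illustration of how perplexity relates a word sequence to a language model, here is a minimal sketch of a 2-gram model with add-one smoothing. The smoothing scheme, the function names and the toy corpus are assumptions made for this example and are not taken from the paper, which relies on N-gram models up to 5-grams and on GPT-2.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count contexts and bigrams over tokenized sentences (illustrative helper)."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded[:-1])                 # every token that acts as a context
        bigrams.update(zip(padded[:-1], padded[1:]))
    vocab = {w for s in sentences for w in s} | {"</s>"}
    return unigrams, bigrams, vocab

def perplexity(tokens, unigrams, bigrams, vocab):
    """Perplexity of one sequence under the add-one smoothed bigram model."""
    padded = ["<s>"] + tokens + ["</s>"]
    v = len(vocab) + 1                               # +1 keeps some mass for unseen words
    log_prob = 0.0
    for prev, word in zip(padded[:-1], padded[1:]):
        p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + v)
        log_prob += math.log(p)
    n = len(padded) - 1                              # number of predicted tokens
    return math.exp(-log_prob / n)

# Toy corpus, invented for the example.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
uni, bi, vocab = train_bigram(corpus)
print(perplexity(["the", "cat", "sat"], uni, bi, vocab))
print(perplexity(["sat", "cat", "the"], uni, bi, vocab))
```

A sequence that resembles the training text receives a lower perplexity than a scrambled one, which is the property the experiments exploit when comparing transcripts.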

Other links
Data in brief dataset publication
Semantic Coherence Dataset: Speech transcripts

When: 13/01/2023

Where: Sala conferenze on the 3rd floor

Voice interaction for supporting blind people to access mathematical expressions

Pier Felice Balestrucci will present his work Dialogare con la matematica: verso un’interazione dialogica vocale automatica per la navigazione di espressioni matematiche (Conversing with mathematics: towards automatic spoken dialogue interaction for navigating mathematical expressions).

Assistive technologies are technologies that make IT products, both hardware and software, accessible and usable for people with disabilities.

The aim of this work is to make listening to mathematical formulas easier and more understandable for blind and visually impaired people. Mathematical formulas are rich in symbols that are hard to read for screen readers, that is, software applications that identify and interpret the text shown on the computer screen.

People with a visual impairment generally read formulas through a LaTeX representation, which is not only very verbose and slow, but also a barrier for those who do not know this language.

The main goal is to build a tool that offers several advantages and simplifications in support of these user groups. The tool provides both the translation of mathematical formulas into mathematical sentences, that is, natural language sentences produced with Natural Language Generation techniques, and a dialogue system for navigating and exploring the formula.
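
To give a rough idea of the two components, the sketch below turns a formula into an English sentence and lets a listener step into its sub-expressions. It is only an assumption-laden illustration: the tuple-based expression tree, the fixed templates and the `verbalize` and `navigate` names are hypothetical, and the actual tool relies on Natural Language Generation techniques and a dialogue system rather than on this simple template lookup.

```python
# Hypothetical expression tree: a leaf is a string, an inner node is
# a tuple (operator, left, right). Templates are invented for the example.
TEMPLATES = {
    "+": "{} plus {}",
    "-": "{} minus {}",
    "*": "{} times {}",
    "/": "the fraction with numerator {} and denominator {}",
    "^": "{} to the power of {}",
}

def verbalize(node):
    """Recursively render an expression tree as a natural language sentence."""
    if isinstance(node, str):
        return node
    op, left, right = node
    return TEMPLATES[op].format(verbalize(left), verbalize(right))

def navigate(node, path):
    """Follow a list of 'left'/'right' steps and return the sub-expression reached."""
    for step in path:
        if isinstance(node, str):
            raise ValueError("cannot step into a leaf")
        _, left, right = node
        node = left if step == "left" else right
    return node

# (a + b) / 2
formula = ("/", ("+", "a", "b"), "2")
print(verbalize(formula))
print(verbalize(navigate(formula, ["left"])))
```

Reading the whole sentence first and then asking for the numerator alone mirrors the kind of exploration a spoken dialogue could offer.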

When: 15/12/2022

Prompt-based learning for text classification in Italian

Valerio Basile will present some experiments carried out using the new zero-shot technique of prompt-based learning for various text classification tasks in Italian.

Abstract: Prompt-based learning is a recent paradigm in NLP that leverages large pre-trained language models to perform a variety of tasks. With this technique, it is possible to build classifiers that do not need training data (zero-shot). In this work, Valerio assessed the status of prompt-based learning applied to several text classification tasks in the Italian language. The results indicate that the performance gap with respect to current supervised methods is still significant. However, the difference in performance between pre-trained models, together with the ability of prompt-based classifiers to operate in a zero-shot fashion, opens a discussion regarding the next generation of evaluation campaigns for NLP.
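
The general shape of a prompt-based zero-shot classifier can be sketched as follows: the input is wrapped in a prompt template, each label is mapped to a word (a verbalizer), and the label whose word the language model finds most plausible wins. Everything named here is hypothetical, including the template, the verbalizers and especially `toy_score`, which is a hand-written placeholder standing in for the score a real pre-trained language model would provide. It is not the setup used in the talk.

```python
def classify(text, template, verbalizers, score_fn):
    """Return the label whose verbalizer word best completes the prompt."""
    scores = {}
    for label, word in verbalizers.items():
        prompt = template.format(text=text, label_word=word)
        scores[label] = score_fn(prompt)
    return max(scores, key=scores.get)

def toy_score(prompt):
    """Placeholder for a language-model plausibility score; NOT a real model."""
    cues = {"positivo": {"ottimo", "bello"}, "negativo": {"pessimo", "brutto"}}
    words = set(prompt.lower().replace(".", " ").split())
    # Reward prompts whose label word co-occurs with matching cue words.
    return sum(len(words & cue) for label, cue in cues.items() if label in words)

# Hypothetical Italian sentiment prompt and verbalizers.
TEMPLATE = "{text} Sentimento: {label_word}."
VERBALIZERS = {"positive": "positivo", "negative": "negativo"}

print(classify("Un film ottimo.", TEMPLATE, VERBALIZERS, toy_score))
```

No labelled training examples are involved at any point, which is what makes the approach zero-shot. In practice `score_fn` would be replaced by the probability a pre-trained model assigns to the completed prompt.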

Assessing the impact of contextual information in hate speech detection

Juan Manuel Pérez will give a talk during his week-long visit to the Content-centered Computing group.

In recent years, hate speech has gained great relevance in social networks and other virtual media because of its intensity and its relationship with violent acts against members of protected groups. Due to the immense amount of content generated by users, great effort has been made in the research and development of automatic tools to aid the analysis and moderation of this speech, at least in its most threatening forms.

One of the limitations of current approaches to automatic hate speech detection is the lack of context. Most studies and resources rely on data without context, that is, isolated messages without any conversational context or indication of the topic being discussed. This restricts the information available to determine whether a post on a social network is hateful or not.

In this talk, I will comment on some experiments we have performed to assess the impact of context in hate speech detection. With this in mind, we built a contextualized dataset for hate speech detection based on user responses to news posts from media outlets on Twitter. This corpus was collected in the Rioplatense dialectal variety of Spanish and focuses on hate speech associated with the COVID-19 pandemic.

For the two proposed tasks using this novel corpus (binary detection; and granular detection, where the system has to predict the attacked characteristics), the classification experiments using state-of-the-art techniques show evidence that adding contextual information improves hate speech detection performance.

When: September 29, 2022, at 11:00

Where: Conference room 3rd floor (Sala Seminari)

O-Dang!

The Ontology of Dangerous Speech Messages

Simona Frenda and Marco A. Stranisci will present O-Dang! (Ontology of Dangerous speech), a systematic and interoperable Knowledge Graph for the collection of linguistically annotated data.

Within the NLP community, a considerable number of language resources are created, annotated and released every day with the aim of studying specific linguistic phenomena. Although a variety of attempts have been made to organize such resources, systematic methods and interoperability between resources are still lacking. Furthermore, the most common practice when storing linguistic information is still to rely on a “gold standard”, which is in contrast with recent trends in NLP that stress the importance of different subjectivities and points of view when training machine learning and deep learning methods. In this talk, we present O-Dang!: The Ontology of Dangerous Speech Messages, a systematic and interoperable Knowledge Graph (KG) for the collection of linguistically annotated data. O-Dang! is designed to gather and organize Italian datasets into a structured KG, according to the principles shared within the Linguistic Linked Open Data community. The ontology has also been designed to accommodate a perspectivist approach, since it provides a model for encoding both gold standard and single-annotator labels in the KG.
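
A minimal sketch of what keeping both kinds of label side by side could look like, expressed as subject-predicate-object triples. The `ex:` predicate names, the identifier scheme and the use of a majority vote to derive the gold label are all assumptions made for this example and do not reproduce the actual O-Dang! vocabulary.

```python
from collections import Counter

def annotation_triples(message_id, labels_by_annotator):
    """Build triples that keep every annotator's label next to a gold label."""
    triples = []
    for annotator, label in sorted(labels_by_annotator.items()):
        ann_id = f"{message_id}/annotation/{annotator}"   # hypothetical identifier scheme
        triples.append((message_id, "ex:hasAnnotation", ann_id))
        triples.append((ann_id, "ex:annotatedBy", annotator))
        triples.append((ann_id, "ex:hasLabel", label))
    # Assumption: the gold label is the majority vote (ties go to the first label seen).
    gold, _ = Counter(labels_by_annotator.values()).most_common(1)[0]
    triples.append((message_id, "ex:hasGoldLabel", gold))
    return triples

# Invented example message with three annotators.
for triple in annotation_triples(
    "ex:msg42", {"ann1": "hateful", "ann2": "hateful", "ann3": "not_hateful"}
):
    print(triple)
```

Because the individual labels are never collapsed into the aggregate, a model can later be trained either on the gold label or on the full distribution of annotator judgements.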

For more info: Paper, App

When: 08/07/2022

Where: In person at Sala Riunioni (3rd floor) and online

An evaluation and analysis of fine-tuned representations for code-switched low-resource speech recognition

Tolúlọpẹ́ Ògúnrẹ̀mí will present her work as a PhD student at Stanford University.

Recognising code-switched speech (alternating between two or more languages or varieties of language across sentences in conversation) is an important technical and social issue essential for modern society. The majority of current speech recognisers are trained monolingually and therefore do not perform well on such utterances. The use of Deep Neural Network (DNN) architectures to train models allows for shared representations and provides an opportunity to leverage them to better handle code-switching. In the two studies contained in this work, we show that multilingual fine-tuning of self-supervised speech representations can handle code-switching in a zero-resource scenario and, through analysis of the latent representations, that code-switching is encoded in the model. We find that monolingual data is enough for character-level decoding in the code-switched scenario and that representations are not similar to word vectors.

When: 4/7/2022

Where: Sala conferenze on the 3rd floor

#DeactivHate

The laboratory and the experiences in Italian high schools.

Raising awareness of online misbehaviour such as hate speech, especially among younger generations, could help society reduce its impact and, in turn, its consequences.

The Commissione Orientamento e Informatica nelle Scuole of the Computer Science Department of the University of Turin has designed various technologies that support educational projects and activities with this aim.

In the past year and a half, Alessandra T. Cignarella, Simona Frenda, Mirko Lai and Marco A. Stranisci developed a laboratory called #DeactivHate, specifically designed for secondary school students (aged 14-19). The cycle of 5 lessons aims at countering hateful phenomena online and also at making students aware of technologies that they use on a daily basis. Furthermore, some basic methodologies and common practices of Computational Linguistics and Artificial Intelligence are introduced.

In this talk, Alessandra will describe the teaching experience in high schools and the usefulness of some of the activities tested for bringing a small taste of NLP to Italian high schools.

When: 17/06/2022

Where: Sala riunioni, 3rd floor

Paper: https://iris.unito.it/retrieve/handle/2318/1823881/885619/paper35.pdf

Towards Automatic Screening for Fibromyalgia in Italian Social Media Users

Valerio Basile will present an interesting work on detecting users who suffer from fibromyalgia by analysing their messages on Twitter.

Fibromyalgia (FM) is a syndrome characterized by a number of symptoms including chronic pain, tiredness, and cognitive dysfunctions. Medical studies estimate a widespread incidence of FM, severely skewed towards women. However, while the European Parliament recognizes FM as a condition negatively impacting the lives of millions, and despite estimates of about 2 million people suffering from FM in Italy, the condition is treated unevenly across the country. One of the main obstacles to full healthcare for FM patients in Italy is the difficult and often excessively long diagnostic path.

The goal of this study is to leverage the vast amount of natural language data available in social media in order to model the language of FM and build an automatic system that distinguishes users suffering from FM from healthy users based on their social media post history. To this end, he collected about 250K messages from Twitter, in Italian, from 145 users who report suffering from FM, and an equivalent number of messages from random users as a control group. He built supervised classifiers with traditional machine learning techniques, namely Support Vector Machine and Random Forest, obtaining 72% accuracy in a cross-validation experiment aimed at predicting the user class as FM or not-FM. The classifiers employ explicit features such as n-grams and lemma counts from the Italian translation of Linguistic Inquiry and Word Count (LIWC), which provide an interpretable insight into the language of people with FM. He further implemented a state-of-the-art classifier based on AlBERTo, the Transformer model based on BERT pre-trained on a large collection of Italian tweets, bringing the classification accuracy up to over 78%. The high precision (0.82) on the positive class (FM) represents a promising result towards automatic, non-invasive screening of Fibromyalgia among Italian social media users.
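
For readers less familiar with the two figures reported above, the following sketch shows how accuracy and precision on the positive class are computed from user-level predictions. The function names and the toy labels are invented for the example and are unrelated to the study's data.

```python
def accuracy(gold, predicted):
    """Fraction of users whose predicted class matches the gold class."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

def precision(gold, predicted, positive="FM"):
    """Among users predicted as positive, the fraction that truly are positive."""
    tp = sum(1 for g, p in zip(gold, predicted) if p == positive and g == positive)
    fp = sum(1 for g, p in zip(gold, predicted) if p == positive and g != positive)
    return tp / (tp + fp) if tp + fp else 0.0

# Toy labels, made up for illustration only.
gold = ["FM", "FM", "not-FM", "not-FM"]
predicted = ["FM", "FM", "FM", "not-FM"]
print(accuracy(gold, predicted), precision(gold, predicted))
```

High precision on the FM class means that few healthy users are flagged by mistake, which matters for a screening tool meant to suggest who might benefit from a medical examination.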

When: 20/05/2022

Where: Sala conferenze on the 3rd floor

Applications and approaches of multi-document summarization for medical data, with the state of the art

Md Murad Hossain will talk about applications of and approaches to multi-document summarization.
Multi-document summarization is an automatic procedure to extract information from multiple texts written about the same topic. It focuses on generating a coherent summary from documents concerning an event or issue. Recently, multi-document summarization techniques have been used to summarize web pages on topics such as sports, weather, and business. In the medical sector too, they can help condense web pages into brief sentences or paragraphs. Recent uses of multi-document summarization techniques allow physicians to learn about medicines or diseases within a short time. In my presentation, I want to explain some approaches to multi-document summarization that can be applied to medical datasets. I also want to show the state of the art, based on the articles studied on this topic, together with the research gap, which may help us move forward with the application of multi-document summarization approaches.

When: 6th May

Where: in person and online

Multi-sensory museum exhibitions informed by language

Sensory vocabularies and their uses in synesthetic metaphor detection

Simona Corciulo will address the interesting challenge of using sensory vocabularies for synesthetic metaphor detection, presenting the case of multi-sensory museum exhibitions.

Abstract

While humans can intuitively associate words and sensory domains, it is very challenging for machines to process sensory information. Furthermore, the crucial use of sensory vocabularies for synesthetic metaphor detection involves many difficulties.
During the CCC meeting, she will focus on considerations and insights around sensory vocabularies for natural language processing tasks and specifically for synesthetic metaphor detection.
The aim is to provide new perspectives on their uses for multi-sensory museum exhibition design informed by language.

When: 22/04/2022

Where: in person and online