Categories
Meetings

Assessing the impact of contextual information in hate speech detection

Juan Manuel Pérez will give a talk during his week-long visit to the Content-centered Computing group.

In recent years, hate speech has gained great relevance in social networks and other virtual media because of its intensity and its relationship with violent acts against members of protected groups. Given the enormous amount of content generated by users, considerable effort has gone into the research and development of automatic tools to aid the analysis and moderation of this speech, at least in its most threatening forms.

One of the limitations of current approaches to automatic hate speech detection is the lack of context. Most studies and resources rely on data without context; that is, isolated messages with no conversational context or indication of the topic being discussed. This restricts the information available to determine whether a post on a social network is hateful or not.

In this talk, I will comment on some experiments we have performed to assess the impact of context in hate speech detection. With this in mind, we built a contextualized dataset for hate speech detection based on user responses to news posts from media outlets on Twitter. This corpus was collected in the Rioplatense dialectal variety of Spanish and focuses on hate speech associated with the COVID-19 pandemic.

For the two proposed tasks using this novel corpus (binary detection; and granular detection, where the system has to predict the attacked characteristics), the classification experiments using state-of-the-art techniques show evidence that adding contextual information improves hate speech detection performance.
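As a rough illustration of what "adding contextual information" could mean at the input level, the sketch below pairs a reply with the news post it responds to before it reaches a classifier. The function name, the `[SEP]` separator, and the sample strings are assumptions made for illustration; they are not taken from the talk or from the corpus.

```python
from typing import Optional


def build_input(reply: str, context: Optional[str] = None, sep: str = " [SEP] ") -> str:
    """Return the text a classifier would see for one reply.

    Without context, only the reply is used. With context, the news post
    is prepended so the model can read both. The separator is a placeholder.
    """
    reply = reply.strip()
    if not context or not context.strip():
        return reply
    return context.strip() + sep + reply


# Hypothetical example strings, not drawn from the corpus.
isolated = build_input("Reply text")
contextualized = build_input("Reply text", context="News post text")
print(isolated)
print(contextualized)
```

Comparing a model trained on the isolated inputs with one trained on the contextualized inputs is one simple way to measure the effect of context.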

When: September 29, 2022, at 11:00

Where: Conference room 3rd floor (Sala Seminari)


O-Dang!

The Ontology of Dangerous Speech Messages

Simona Frenda and Marco A. Stranisci will present O-Dang! (Ontology of Dangerous speech), a systematic and interoperable Knowledge Graph for the collection of linguistically annotated data.

Within the NLP community, a considerable number of language resources are created, annotated and released every day with the aim of studying specific linguistic phenomena. Although various attempts have been made to organize such resources, systematic methods and interoperability between resources are still lacking. Furthermore, the most common practice when storing linguistic information is still to rely on a “gold standard”, which is in contrast with recent trends in NLP that stress the importance of different subjectivities and points of view when training machine learning and deep learning methods. In this talk, we present O-Dang!: The Ontology of Dangerous Speech Messages, a systematic and interoperable Knowledge Graph (KG) for the collection of linguistically annotated data. O-Dang! is designed to gather and organize Italian datasets into a structured KG, according to the principles shared within the Linguistic Linked Open Data community. The ontology has also been designed to accommodate a perspectivist approach, since it provides a model for encoding both gold standard and single-annotator labels in the KG.
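To make the idea of keeping both single-annotator and gold standard labels concrete, here is a minimal sketch that stores each annotator's label as a triple and adds a gold label alongside them. The predicate names, the identifiers, and the use of a majority vote to derive the gold label are all assumptions for illustration; they do not reproduce the O-Dang! vocabulary or its actual aggregation method.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def majority_label(annotations: Dict[str, str]) -> Optional[str]:
    """Return the most frequent label, or None if there is none or a tie."""
    ranked = Counter(annotations.values()).most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def to_triples(message_id: str, annotations: Dict[str, str]) -> List[Triple]:
    """Keep every single-annotator label and add a gold label when one exists."""
    triples: List[Triple] = [
        (message_id, "annotatedBy:" + annotator, label)  # hypothetical predicate
        for annotator, label in sorted(annotations.items())
    ]
    gold = majority_label(annotations)
    if gold is not None:
        triples.append((message_id, "goldStandard", gold))  # hypothetical predicate
    return triples


# Hypothetical annotations for one message.
labels = {"annotator_1": "hateful", "annotator_2": "not_hateful", "annotator_3": "hateful"}
triples = to_triples("msg_001", labels)
for triple in triples:
    print(triple)
```

Because the individual labels are never discarded, disagreement between annotators remains available to any method that wants to learn from it.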

For more info: Paper, App

When: 08/07/2022

Where: In person at Sala Riunioni (3rd floor) and online


An evaluation and analysis of fine-tuned representations for code-switched low-resource speech recognition

Tolúlọpẹ́ Ògúnrẹ̀mí will present her work as a PhD student at Stanford University.

Recognising code-switched speech (alternating between two or more languages or language varieties across sentences in a conversation) is an important technical and social challenge for modern society. The majority of current speech recognisers are trained monolingually and therefore do not perform well on such utterances. Using Deep Neural Network (DNN) architectures to train models allows for shared representations and provides an opportunity to leverage them to better handle code-switching. In the two studies contained in this work, we show that multilingual fine-tuning of self-supervised speech representations can handle code-switching in a zero-resource scenario and, through analysis of the latent representations, that code-switching is encoded in the model. We find that monolingual data is enough for character-level decoding in the code-switched scenario and that the representations are not similar to word vectors.

When: 4/7/2022

Where: Sala conferenze on the 3rd floor


#DeactivHate

The laboratory and the experiences in Italian high schools.

Raising awareness of online misbehaviour, such as hate speech, especially among younger generations, could help society reduce its impact and, in turn, its consequences.

The Commissione Orientamento e Informatica nelle Scuole of the Computer Science Department of the University of Turin has designed various technologies that support educational projects and activities with this aim.

In the past year and a half, Alessandra T. Cignarella, Simona Frenda, Mirko Lai and Marco A. Stranisci developed a laboratory called #DeactivHate, specifically designed for secondary school students (aged 14-19). The cycle of five lessons aims to counter hateful phenomena online and to make students aware of the technologies they use on a daily basis. It also introduces some basic methodologies and common practices of Computational Linguistics and Artificial Intelligence.

In this talk, Alessandra will describe the teaching experience in high schools and the usefulness of some of the activities tested for bringing a small taste of NLP to Italian high schools.

When: 17/06/2022

Where: Sala riunioni, 3rd floor

Paper: https://iris.unito.it/retrieve/handle/2318/1823881/885619/paper35.pdf


Towards Automatic Screening for Fibromyalgia in Italian Social Media Users

Valerio Basile will present work on detecting users suffering from fibromyalgia by analysing their messages on Twitter.

Fibromyalgia (FM) is a syndrome characterized by a number of symptoms including chronic pain, tiredness, and cognitive dysfunctions. Medical studies estimate a widespread incidence of FM, severely skewed towards women. However, while the European Parliament recognizes FM as a condition negatively impacting the lives of millions, and despite estimates of about 2 million people suffering from FM in Italy, the condition is treated unevenly across this country. One of the main obstacles toward full healthcare for FM patients in Italy is the difficult and often excessively long diagnostic path.


The goal of this study is to leverage the vast amount of natural language data available on social media in order to model the language of FM and build an automatic system that distinguishes users suffering from FM from healthy users based on their social media post history. To this end, he collected about 250K messages in Italian from Twitter, written by 145 users who report suffering from FM, and an equivalent number of messages from random users as a control group. He built supervised classifiers with traditional machine learning techniques, namely Support Vector Machine and Random Forest, obtaining 72% accuracy in a cross-validation experiment aimed at predicting the user class as FM or not-FM. The classifiers employ explicit features such as n-grams and lemma counts from the Italian translation of Linguistic Inquiry and Word Count (LIWC), which provide interpretable insight into the language of people with FM. He further implemented a state-of-the-art classifier based on AlBERTo, a Transformer model based on BERT and pre-trained on a large collection of Italian tweets, bringing the classification accuracy to over 78%. The high precision (0.82) on the positive class (FM) is a promising result towards automatic, non-invasive screening for fibromyalgia among Italian social media users.
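For readers less familiar with the two metrics quoted above, the following sketch shows how accuracy and precision on the positive class are computed from true and predicted user labels. The label lists are toy data invented for illustration and have no relation to the study's results.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of users whose predicted class matches the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision(y_true: Sequence[str], y_pred: Sequence[str], positive: str = "FM") -> float:
    """Fraction of users predicted as positive who really are positive."""
    true_of_predicted = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not true_of_predicted:
        return 0.0
    return sum(t == positive for t in true_of_predicted) / len(true_of_predicted)


# Toy labels, invented for illustration only.
y_true = ["FM", "FM", "FM", "not-FM", "not-FM"]
y_pred = ["FM", "FM", "not-FM", "FM", "not-FM"]

acc = accuracy(y_true, y_pred)
prec = precision(y_true, y_pred)
print(acc, prec)
```

High precision on the FM class means that few healthy users are wrongly flagged, which matters for a screening tool.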

When: 20/05/2022

Where: Sala conferenze on the 3rd floor


Application and approaches of Multi Document summarization in Medical Data with state of the art

Md Murad Hossain will talk about applications of and approaches to multi-document summarization.

Multi-document summarization is an automatic procedure for extracting information from multiple texts written about the same topic. It focuses on generating a coherent summary from documents concerning an event or issue. Recently, multi-document summarization techniques have been used to summarize web pages on topics such as sports, weather and business. In the medical sector, too, they can help condense web pages into brief sentences or paragraphs, allowing physicians to learn about medicines or diseases in a short time. In my presentation, I will explain some approaches to multi-document summarization that can be applied to medical data sets. I will also survey the state of the art, based on the articles studied on this topic, and point out research gaps, which may help us move forward with the application of multi-document summarization approaches.

When: 6th May

Where: in person and online


Multi-sensory museum exhibitions informed by language

Sensory vocabularies and their uses in synesthetic metaphor detection

Simona Corciulo will address the interesting challenge of using sensory vocabularies for synesthetic metaphor detection, presenting the case of the multi-sensory museum exhibitions.

Abstract

While humans can intuitively associate words and sensory domains, it is very challenging for machines to process sensory information. Furthermore, the crucial use of sensory vocabularies for synesthetic metaphor detection involves many difficulties. During the CCC meeting, she will focus on considerations and insights around sensory vocabularies for natural language processing tasks, and specifically for synesthetic metaphor detection. The aim is to provide new perspectives on their uses for multi-sensory museum exhibition design informed by language.

When: 22/04/2022

Where: in person and online


Geodiversity and Geoheritage

Comparison of definitions and values

Alizia Mantovani will present a survey of the similarities and differences between geoheritage and geodiversity.

In the literature, the relation between geoheritage and geodiversity is well rooted. In fact, the term geodiversity is always present in papers that concern geoheritage. However, the methods used to assess them and the characteristics used to describe them follow different approaches and use terms that are sometimes similar and sometimes different. The literature offers different lists of values that characterize the elements of geoheritage, as well as a system for characterizing the services. These lists, although applied to concepts that are often associated, are rarely linked or used together. In this talk, she will explore different points of view on the characterization of geoheritage and geodiversity by comparing the assessment approaches and showing how they are similar and how they differ.

When: 08/04/2022

Where: in person and online


iTelos – A methodology for building reusable purpose-specific Knowledge Graphs

Simone Bocca (University of Trento) will present iTelos – A methodology for building reusable purpose-specific Knowledge Graphs.

Knowledge Graphs (KGs) have become increasingly popular in recent years, thanks to their efficiency in handling, representing and integrating information. KGs are exploited in different areas of interest and for several objectives by applications, services, and data analysis and visualization tools. This popularity has increased the need to build KGs for the many different purposes stated by users, sometimes without a clear understanding of the issues to be addressed while building a KG. We propose iTelos, a KG building methodology designed to support the user in resolving those issues. In other words, iTelos aims to reduce the effort of building KGs that are as suitable as possible for the purpose expressed by the final users. To this end, the methodology is based on two key ideas: (i) stratifying the resources involved into different semantic interoperability levels, in order to deal with multiple types of data heterogeneity; (ii) maximizing the reuse of existing data and knowledge resources during the KG building process, thus reducing the effort required for construction and in turn producing highly reusable resources. iTelos is currently taught in the Knowledge and Data Integration (KDI) master's course at the University of Trento (Italy) and Jilin University (China), and has been adopted in EU projects by the KnowDive group (University of Trento, Department of Information Engineering and Computer Science).

Related links:

Giunchiglia, F., Bocca, S., Fumagalli, M., Bagchi, M., & Zamboni, A. (2021). iTelos–Purpose Driven Knowledge Graph Generation. arXiv preprint arXiv:2105.09418. https://arxiv.org/abs/2105.09418

Giunchiglia, F., Zamboni, A., Bagchi, M., & Bocca, S. (2021). Stratified data integration. arXiv preprint arXiv:2105.09432. https://arxiv.org/abs/2105.09432

Giunchiglia, F., Batsuren, K., & Bella, G. (2017). Understanding and exploiting language diversity. In IJCAI. https://www.ijcai.org/proceedings/2017/0560.pdf

When: 25/03/2022

Where: online and in person


Models and vocabularies for ancient Near Eastern prosopographies

Rossana Damiano, Stefano De Martino (Dipartimento di Studi Storici) and Elena Devecchi (Dipartimento di Studi Storici) will present the findings of an investigation carried out in the PRIN project “Writing uses: Transmission of Knowledge, Administrative Practices and Political Control in Anatolian and Syro-Anatolian Polities in the II and I millennium BC”.


Prosopographies, understood as the large-scale study of people’s life events as they emerge from written sources, have been widely used in the last decade to study the social structure of ancient societies. The analysis of professional, kinship, administrative and political relations of the past, grounded in real data, can confirm the models put forth by historians and archaeologists through traditional research paradigms, and in some cases suggest new ones. In this sense, the availability of agreed-upon, formally expressed vocabularies for describing these data and their relations to sources is a key factor in the development of methods for analysing social networks from the past in support of the work of historians.

The PRIN project “Writing uses: Transmission of Knowledge, Administrative Practices and Political Control in Anatolian and Syro-Anatolian Polities in the II and I millennium BC” (2020-2022) has investigated the adaptation of a factoid-based model of prosopographic data to the case study of the Hittite and Kassite civilizations. Aimed at the collaborative creation of prosopographic datasets for the large-scale study of Hittite and Kassite social networks, the project has collected a corpus of person records and relations in a Linked Data format. This seminar will describe the design of the vocabularies for the construction of the datasets and the research methods being developed from these data.

Related links:

https://hfpo.di.unito.it/

https://kfpo.di.unito.it/

When: 11/03/2022

Where: online