
Mining Annotator Perspectives from Hate Speech Corpora

Valerio Basile will introduce a new automatic method to identify annotators’ perspectives on controversial issues such as Hate Speech.

Title: Mining Annotator Perspectives from Hate Speech Corpora

Disagreement in annotation, traditionally treated mostly as noise, is now increasingly considered a source of valuable information instead. Basile investigated a particular form of disagreement, occurring when the focus of an annotated dataset is a subjective and controversial phenomenon, which induces a certain degree of polarization among the annotators’ judgments. He argued that this polarization is indicative of the conflicting perspectives held by different annotator groups, and proposed a quantitative method to model the phenomenon. Moreover, he introduced a method to automatically identify shared perspectives stemming from a common background.
He tested this method on several corpora in English and Italian, manually annotated for hate speech content, validating prior knowledge about the groups of annotators, when available, and discovering characteristic traits among annotators of unknown background.
He found numerous precisely defined perspectives, described in terms of increased sensitivity towards textual content expressing attitudes such as xenophobia, Islamophobia, and homophobia.
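
As a purely illustrative sketch of the general idea (not the measure proposed in the talk): if annotators can be split into groups, per-item polarization can be approximated by how far the groups’ average judgments diverge.

```python
# Purely illustrative sketch, not the measure from the talk: score each item
# by how far the annotator groups' average judgments diverge (1.0 = the
# groups disagree completely, 0.0 = their average judgments coincide).
import numpy as np

def group_divergence(annotations: np.ndarray, groups: np.ndarray) -> np.ndarray:
    """annotations: items x annotators matrix of 0/1 labels.
    groups: one group id per annotator. Returns a score per item."""
    scores = []
    for row in annotations:
        means = [row[groups == g].mean() for g in np.unique(groups)]
        scores.append(max(means) - min(means))
    return np.array(scores)

ann = np.array([[1, 1, 1, 0, 0, 0],    # two groups, fully polarized item
                [1, 1, 0, 1, 1, 0]])   # disagreement, but not along groups
grp = np.array([0, 0, 0, 1, 1, 1])
print(group_divergence(ann, grp))      # -> [1.0, 0.0]
```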

When: February 11 at 11.30

Where: online


Sign Language Recognition using Machine Learning

Muhammad Saad Amin will talk about the problems of Sign Language Recognition, from dataset creation to algorithm design.

Title: Sign Language Recognition using Machine Learning

Human gesture classification and recognition is a challenging task. Capturing human gestures and transforming them into labeled digital data is a prerequisite for training supervised Machine Learning algorithms. Hence, Sign Language Recognition (SLR) systems with improved accuracy and increased efficiency are urgently needed.

In this seminar, he will discuss how to capture gestures (specifically American Sign Language) using sensor-based prototypes, how to convert these sign gestures into digital data (dataset generation), and how such a dataset can be used for sign language recognition with supervised Machine Learning algorithms.
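
As a hedged illustration of the final step only, the sketch below trains an off-the-shelf classifier on fixed-length feature vectors standing in for digitized sensor readings; the data, feature count, and model choice are assumptions, not details from the talk.

```python
# Illustrative sketch of the supervised-learning step only, assuming each
# recorded gesture has already been flattened into a fixed-length vector of
# sensor readings. The random data stands in for a real ASL dataset.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 60))        # 500 gestures x 60 sensor features
y = rng.integers(0, 26, size=500)     # 26 letter classes (placeholder labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")  # ~chance on random data
```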

When: November 19 at 11.30

Where: in person (room 032_A_P03_3140) or online


Detection of Hate Speech Spreaders

With Convolutional Neural Networks

Elisa Di Nuovo and Marco Siino will present an interesting approach used to detect hate speech spreaders in the context of the shared task Profiling Hate Speech Spreaders (HSSs) proposed at PAN 2021.

Title: Detection of hate speech spreaders using convolutional neural networks

The speakers will describe a deep learning model based on a Convolutional Neural Network (CNN) to profile hate speech spreaders online. The model was developed for the Profiling Hate Speech Spreaders (HSSs) task proposed by the PAN 2021 organizers and hosted at the 2021 CLEF Conference. The approach, used to classify an author as HSS or not (nHSS), takes advantage of a CNN with a single convolutional layer. In this binary classification task, in tests performed using 5-fold cross-validation, the proposed model reaches a maximum accuracy of 0.80 on the multilingual (i.e., English and Spanish) training set, and a minimum loss value of 0.51 on the same set. As announced by the task organizers, the model won the 2021 PAN competition on profiling HSSs, reaching an overall accuracy of 0.79 on the full test set. This overall accuracy is obtained by averaging the accuracy achieved by the model on the two languages: on the Spanish test set the model achieves an accuracy of 0.85, while on the English test set it achieves an accuracy of 0.73.
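
A minimal sketch of the architecture family described (a text CNN with a single convolutional layer for binary author classification); the vocabulary size, sequence length, and layer sizes below are assumptions, not the winning model’s actual hyperparameters.

```python
# Minimal sketch of a single-convolutional-layer text CNN for binary
# classification (HSS vs. nHSS). Vocabulary size, sequence length, and
# layer sizes are illustrative assumptions, not the winning model's
# actual hyperparameters.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB, SEQ_LEN = 20_000, 1_000   # token ids; one sequence per author

inputs = tf.keras.Input(shape=(SEQ_LEN,))
x = layers.Embedding(VOCAB, 128)(inputs)                    # token embeddings
x = layers.Conv1D(64, kernel_size=5, activation="relu")(x)  # the single conv layer
x = layers.GlobalMaxPooling1D()(x)                          # strongest n-gram responses
outputs = layers.Dense(1, activation="sigmoid")(x)          # P(author is an HSS)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```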

When: On 22nd October at 11.30 am

Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7_20211022T093000Z

Paper: http://ceur-ws.org/Vol-2936/paper-189.pdf


HaMor to Profile Hate Speech Spreaders

Mirko Lai and Marco A. Stranisci will present an innovative approach that takes into account the morality and communicative behaviour of the users to profile hate speech spreaders online.

Title: HaMor at the Profiling Hate Speech Spreaders on Twitter

In this talk, they will describe the Hate and Morality (HaMor) submission to Profiling Hate Speech Spreaders on Twitter, the shared task at PAN 2021.
HaMor ranked 19th out of 66 participating teams, with an average accuracy of 73% over the two languages.
The approach obtained the 43rd highest accuracy for English (62%) and the 2nd highest accuracy for Spanish (84%).
In particular, it involves four types of features that help the system infer users’ attitudes from their messages alone: hate speech detection, user morality, named entities, and communicative behaviour.
The results of their experiments are promising and will lead to future investigations of these features from a finer-grained perspective.
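
As a hedged illustration of the general setup rather than the actual HaMor pipeline, the sketch below concatenates four per-user feature blocks, one per feature type named above, and feeds them to a linear classifier; all values are random placeholders.

```python
# Hedged sketch of the general feature-combination setup, not the actual
# HaMor pipeline: four per-user feature blocks (hate speech, morality, named
# entities, communicative behaviour) concatenated into one linear classifier.
# All values are random placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_users = 300
hate_feats  = rng.random((n_users, 4))    # e.g. share of hateful tweets
moral_feats = rng.random((n_users, 10))   # e.g. moral-foundation scores
ner_feats   = rng.random((n_users, 6))    # e.g. named-entity counts
behav_feats = rng.random((n_users, 8))    # e.g. retweet/mention statistics

X = np.hstack([hate_feats, moral_feats, ner_feats, behav_feats])
y = rng.integers(0, 2, n_users)           # HSS vs. nHSS labels (placeholder)

clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, X, y, cv=5).mean())  # ~0.5 on random features
```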

When: On 5th November at 11.30 am

Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7_20211022T093000Z

Paper: http://ceur-ws.org/Vol-2936/paper-178.pdf


WordUp! at VaxxStance 2021

Combining Contextual Information with Textual and Dependency-Based Syntactic Features for Stance Detection

Mirko Lai and Alessandra T. Cignarella will present an innovative approach to online stance detection, proposed in the context of the VaxxStance shared task.

Title: WordUp! at VaxxStance 2021: Combining Contextual Information with Textual and Dependency-Based Syntactic Features for Stance Detection

In this talk, they will describe the participation of the WordUp! team in the VaxxStance shared task at IberLEF 2021. The goal of the competition is to determine the author’s stance from tweets written in both Spanish and Basque on the topic of the anti-vaxxer movement. Their approach, applied across the four proposed tracks, combines a Logistic Regression classifier with diverse groups of features: stylistic, tweet-based, user-based, lexicon-based, dependency-based, and network-based. The outcomes of their experiments are in line with state-of-the-art results on other languages, showing the efficacy of combining methods from NLP and Network Science for detecting stance in Spanish and Basque.
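
As a hedged example of what a “network-based” feature of this kind might look like (the toy graph and the choice of PageRank are assumptions, not the team’s actual feature set):

```python
# Hedged sketch of one possible "network-based" feature: a user's PageRank in
# a retweet graph. The toy edges and the choice of PageRank are assumptions,
# not the team's actual features.
import networkx as nx

retweets = [("ana", "ben"), ("ben", "carla"), ("ana", "carla"), ("dario", "carla")]
G = nx.DiGraph(retweets)          # edge u -> v means: u retweeted v
centrality = nx.pagerank(G)       # one scalar feature per user
for user, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{user}: {score:.3f}")
```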

When: On 8th October at 11.30 am

Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7_20210702T093000Z?from_login=true

Paper: http://ceur-ws.org/Vol-2943/vaxx_paper3.pdf


Whose Opinions Matter?

Perspective-aware Models to Identify Opinions of Hate Speech Victims in Abusive Language Detection.

Sohail Akhtar will present an in-depth study of novel approaches to hate speech detection, focusing on leveraging fine-grained knowledge derived from the annotations of individual annotators.

Title: Whose Opinions Matter? Perspective-aware Models to Identify Opinions of Hate Speech Victims in Abusive Language Detection.

Hate Speech (HS) is a form of abusive language, and its detection on social media platforms is a difficult but important task. The sudden rise in hate-speech-related incidents on social media is considered a major issue. The technologies being developed for HS detection mainly employ supervised machine learning approaches from Natural Language Processing (NLP). Training such models requires data manually annotated by humans, either crowd-sourced paid workers or domain experts, for training and benchmarking purposes.

Because abusive language is subjective in nature, highly polarizing topics or events may be involved in the annotation of abusive content such as HS. Therefore, novel approaches are required to model the conflicting perspectives and opinions of people with different personal and demographic backgrounds, which raise issues concerning the quality of the annotation itself and might also affect the gold standard data used to train NLP models. Annotators may also show different levels of sensitivity to particular forms of hate, which results in low inter-annotator agreement. Moreover, the online platforms used for HS annotation do not provide any background information about the annotators, and the views and personal opinions of the victims of online hate are often ignored in HS detection tasks.

In this talk, he will present an in-depth study of novel approaches to detecting various forms of abusive language against minorities. The work focuses on developing approaches that leverage fine-grained knowledge derived from the annotations of individual annotators, before a gold standard is created in which the subjectivity of the annotators is averaged out.

The research aimed at developing approaches to model the polarized opinions coming from different communities, under the hypothesis that shared characteristics (ethnicity, social background, culture, etc.) can influence the perspectives of the annotators on a certain phenomenon, and that annotators can therefore be grouped on the basis of such information.

The intuition is that, by relying on such information, it is possible to divide the annotators into separate groups. Based on this grouping, separate gold standards are created for each group and used to train state-of-the-art deep learning models for abusive language detection. Additionally, an ensemble approach is implemented to combine the perspective-aware classifiers from the different groups into an inclusive model.
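
A minimal sketch of this ensemble idea, assuming one classifier per annotator group trained on that group’s gold standard and combined by averaging predicted probabilities; the texts, labels, and model choice are illustrative placeholders.

```python
# Minimal sketch of the ensemble idea: one classifier per annotator group,
# trained on that group's gold standard, combined by averaging predicted
# probabilities. Texts, labels, and models are illustrative placeholders.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["you people disgust me", "have a nice day",
         "go back to your country", "what a lovely morning"]
group_gold = {                        # per-group gold standards (placeholders)
    "group_a": [1, 0, 1, 0],
    "group_b": [1, 0, 0, 0],          # the groups disagree on the third text
}

models = {g: make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(texts, y)
          for g, y in group_gold.items()}

def ensemble_predict(docs):
    """Average the groups' P(abusive) and threshold at 0.5."""
    probs = np.mean([m.predict_proba(docs)[:, 1] for m in models.values()], axis=0)
    return (probs >= 0.5).astype(int)

print(ensemble_predict(["go back to your country"]))
```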

The research proposed a novel resource: a multi-perspective English-language dataset annotated according to different sub-categories relevant for characterizing online abuse (HS, aggressiveness, offensiveness, and stereotype). Unlike previous work, where the annotations were crowd-sourced, this study involved the victims belonging to targeted communities in the annotation process: they volunteered to annotate the dataset, providing a natural selection of annotator groups based on personal characteristics. The annotators come from different cultural, social, and demographic backgrounds, and one of the groups consists of members of the targeted communities.

Training state-of-the-art deep learning models on this novel resource showed how the proposed approach improves the prediction performance of a state-of-the-art supervised classifier.

Moreover, an in-depth qualitative analysis of the novel dataset examined individual tweets to identify and understand the topics and events causing polarization among the annotators. The analysis showed that keywords (unigram features) are indeed strongly linked with, and influenced by, the culture, religion, and demographic background of the annotators.

When: On 2nd July at 11.30 am

Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7_20210702T093000Z?from_login=true


Weights & Biases

Mattia Cerrato presents a tutorial on the Weights & Biases platform, which is useful for keeping track of results, hyperparameters, and random seeds in ML experiments.

Title: Experiment tracking with Weights & Biases

Performing experiments is perhaps the most time-consuming activity in ML research, especially at the junior level, and often too little effort is spent on understanding how to optimize this process. The Weights & Biases (W&B) platform provides a simple Python interface that can be used to keep track of results, hyperparameters, and random seeds. It has intuitive visualization utilities that can be used to write experimental reports starting from raw performance-metric data. Furthermore, it provides an easy way to perform hyperparameter search (random, grid, and even Bayesian search strategies are available) and even some light training-orchestration capabilities. In this talk, we will see how to extend our experimental scripts so that W&B can help us keep our sanity during the experimental phase of a project.
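
As a minimal sketch of the basic tracking workflow (assuming `wandb` is installed and the user is logged in; the project name and the simulated metric are made up):

```python
# Minimal sketch of W&B experiment tracking. Assumes `pip install wandb` and
# a logged-in account; the project name and the stand-in metric are made up.
import random
import wandb

run = wandb.init(
    project="demo-experiments",                          # hypothetical project
    config={"lr": 1e-3, "batch_size": 32, "seed": 42},   # hyperparameters + seed
)
random.seed(run.config.seed)

for epoch in range(10):
    train_loss = 1.0 / (epoch + 1) + 0.05 * random.random()  # stand-in metric
    wandb.log({"epoch": epoch, "train_loss": train_loss})    # one chart point

run.finish()
```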

When: On 4th June at 11.30

Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7_20210604T093000Z


Topic Shift in online debates on Twitter

Komal Florio presents an interesting investigation of topic shift in the public discourse on Social Media.

Title: Topic shift in the public discourse on Social Media: a case study about the covid-19 induced lockdown in Italy in 2020

In this work she tackled the challenge of measuring and quantifying topic shift in the public discourse on Social Media, using as a case study the online debate on Twitter following the covid-19-related lockdown in Italy in 2020, by means of a dedicated filtering of TWITA, a dataset of tweets in Italian.

At first she tried to predict which messages contained hate speech using AlBERTo, a BERT model fine-tuned on Italian social media language, but the results were far from satisfying. She then tried a lexicon-based approach and found that the dominant categories were derogatory words, insults regarding moral or behavioural defects, and cognitive disabilities or diversity. Nevertheless, the accuracy of this classification was not very high; analysing the lexicon words that determined the classification for the top three categories suggests that a manual revision of the word list for each category could improve the outcome of this task.

She then moved to the most powerful classification tool used on these data: topic modeling. A first classification with the Latent Dirichlet Allocation (LDA) algorithm proved effective in extracting the conversation around specific relevant events that happened in Italy between February 2020 and April 2020. To obtain consistent topics over time, she then moved to Dynamic Topic Modeling, which extracted “healthcare” and “quarantine” as the consistently predominant topics in the corpus. She analyzed the peaks in documents related to these topics and to the aforementioned lexicon categories, and found that they occurred around the same time slices where the topics “quarantine” and “healthcare” spike as well, showing that the most heated debates happened around public measures that directly and immediately affected both the collectivity (“healthcare”) and personal life (“quarantine”).
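
As a hedged sketch of the LDA step with gensim (the pre-tokenized toy “tweets” and the topic count are placeholders, not the actual TWITA pipeline):

```python
# Hedged sketch of the LDA step with gensim; the pre-tokenized toy "tweets"
# and the topic count are placeholders, not the actual TWITA pipeline.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

docs = [["lockdown", "quarantena", "casa", "restare"],
        ["ospedale", "medici", "terapia", "intensiva"],
        ["scuola", "lockdown", "didattica", "distanza"]]

dictionary = Dictionary(docs)                    # word <-> id mapping
corpus = [dictionary.doc2bow(d) for d in docs]   # bag-of-words per tweet
lda = LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=0)

for topic_id in range(2):
    print(lda.print_topic(topic_id))             # top words with weights
```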

She then tried to use all the information gained so far to enhance the hate speech prediction performed by means of AlBERTo. Unfortunately, this experiment did not lead to significant results, due to the very small size of the resulting training dataset. Infusing a deep learning model with information extracted from topic modeling certainly sounds like a promising way to enhance the accuracy of hate speech prediction, but further investigation of the size and characteristics of the datasets is essential to obtain better results.

When: On 21st May at 11.30 am

Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7_20210521T093000Z


The Octopus Paper

Valerio Basile presents an interesting reflection on the difference between the form and the meaning of language in neural language models.

Title: Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data

The success of the large neural language models on many NLP tasks is exciting. However, these successes sometimes lead to hype in which these models are being described as “understanding” language or capturing “meaning”. In this position paper it is argued that a system trained only on form has a priori no way to learn meaning, and that a clear understanding of the distinction between form and meaning will help guide the field towards better science around natural language understanding.

Emily M. Bender and Alexander Koller make their point through an incredibly witty story involving a very curious sea creature and a couple of castaways on bear-ridden tropical islands.

Related Paper: Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data

When: On 7th May at 11.30 am

Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7


How to avoid “Sorry, I don’t understand. Can you repeat please?” in a dialogue system

In this talk, Alessandro Mazzei will present the results of a project developed with TIM on improving a dialogue system in the domain of customer service for TELCO. The idea is to compensate for the lack of linguistic information by predicting human intentions on the basis of domain knowledge.

When: On 23rd April at 11.30 am

Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7_20210423T093000Z?from_login=true