Categories
Meetings

Detection of Hate Speech Spreaders

With Convolutional Neural Networks

Elisa Di Nuovo and Marco Siino will present an interesting approach used to detect hate speech spreaders in the context of the shared task Profiling Hate Speech Spreaders (HSSs) proposed at PAN 2021.

Title: Detection of hate speech spreaders using convolutional neural networks

The speakers will describe a deep learning model based on a Convolutional Neural Network (CNN) to profile hate speech spreaders online. The model was developed for the Profiling Hate Speech Spreaders (HSSs) task proposed by the PAN 2021 organizers and hosted at the CLEF 2021 conference. The approach, which classifies an author as an HSS or not (nHSS), relies on a CNN with a single convolutional layer. In this binary classification task, in tests performed with 5-fold cross-validation, the proposed model reaches a maximum accuracy of 0.80 on the multilingual (i.e., English and Spanish) training set, and a minimum loss value of 0.51 on the same set. As announced by the task organizers, the model won the PAN 2021 competition on profiling HSSs, reaching an overall accuracy of 0.79 on the full test set. This overall accuracy is obtained by averaging the accuracies achieved by the model on the two languages: 0.85 on the Spanish test set and 0.73 on the English test set.
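As a rough illustration of the evaluation protocol mentioned in the abstract, the following sketch shows how a 5-fold split can be built and how the overall score follows from the two per-language accuracies. The function name `k_fold_indices` and the use of contiguous, unshuffled folds are assumptions made for this example; the abstract does not say how the speakers assigned authors to folds.

```python
from statistics import mean


def k_fold_indices(n_items, k=5):
    """Split range(n_items) into k (train, validation) index pairs.

    Folds are contiguous and unshuffled here (an assumption for the sketch).
    """
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(n_items) if i < start or i >= stop]
        folds.append((train, validation))
        start = stop
    return folds


# Per-language test accuracies reported in the abstract.
accuracy_by_language = {"es": 0.85, "en": 0.73}

# The overall score is the unweighted mean over the two languages.
overall_accuracy = mean(list(accuracy_by_language.values()))
```

Rounded to two decimals, `overall_accuracy` matches the 0.79 reported for the full test set.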

When: On 22nd October at 11.30 am

Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7_20211022T093000Z

Paper: http://ceur-ws.org/Vol-2936/paper-189.pdf

HaMor To Profile Hate Speech Spreaders

Mirko Lai and Marco A. Stranisci will present an innovative approach that takes into account the morality and communicative behaviour of the users to profile hate speech spreaders online.

Title: HaMor at the Profiling Hate Speech Spreaders on Twitter

In this talk, they will describe the Hate and Morality (HaMor) submission for Profiling Hate Speech Spreaders on Twitter, the shared task at PAN 2021.
HaMor ranked 19th out of 66 participating teams, with an average accuracy of 73% over the two languages.
The approach obtained the 43rd highest accuracy for English (62%) and the 2nd highest accuracy for Spanish (84%).
In particular, it involves four types of features that help the system infer users' attitudes just from their messages: hate speech detection, users' morality, named entities, and communicative behaviour.
The results of their experiments are promising and will lead to future investigations of these features in a finer grained perspective.

When: On 5th November at 11.30 am

Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7_20211022T093000Z

Paper: http://ceur-ws.org/Vol-2936/paper-178.pdf

WordUp! at VaxxStance 2021

Combining Contextual Information with Textual and Dependency-Based Syntactic Features for Stance Detection

Mirko Lai and Alessandra T. Cignarella will present an innovative approach to detecting stance online, proposed in the context of the VaxxStance shared task.

Title: WordUp! at VaxxStance 2021: Combining Contextual Information with Textual and Dependency-Based Syntactic Features for Stance Detection

In this talk, they will describe the participation of the WordUp! team in the VaxxStance shared task at IberLEF 2021. The goal of the competition is to determine the author’s stance from tweets written in both Spanish and Basque on the topic of the Antivaxxers movement. Their approach, in the four different tracks proposed, combines a Logistic Regression classifier with diverse groups of features: stylistic, tweet-based, user-based, lexicon-based, dependency-based, and network-based. The outcomes of their experiments are in line with state-of-the-art results on other languages, demonstrating the efficacy of combining methods derived from NLP and Network Science for detecting stance in Spanish and Basque.
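The idea of feeding several feature groups to one Logistic Regression classifier can be sketched as follows. This is a minimal illustration under stated assumptions, not the WordUp! system: the two feature extractors, the toy lexicon, the English toy tweets, and the binary labels (1 = against, 0 = other) are all hypothetical, and the classifier is a plain gradient-descent implementation written for this example.

```python
import math


def stylistic_features(tweet):
    """Hypothetical stylistic group: length, exclamation marks, uppercase ratio."""
    letters = [c for c in tweet if c.isalpha()]
    upper_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    return [len(tweet) / 280.0, float(tweet.count("!")), upper_ratio]


def lexicon_features(tweet, lexicon):
    """Hypothetical lexicon group: number of tokens found in a toy lexicon."""
    tokens = [t.strip(".,!?") for t in tweet.lower().split()]
    return [float(sum(t in lexicon for t in tokens))]


def featurize(tweet, lexicon):
    # Feature groups are concatenated into a single vector.
    return stylistic_features(tweet) + lexicon_features(tweet, lexicon)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the logistic loss for binary labels 0/1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Toy data (hypothetical): 1 = against vaccination, 0 = other.
TOY_LEXICON = {"dangerous", "poison"}
TWEETS = [
    "Vaccines are dangerous poison!!!",
    "This is dangerous",
    "Got my vaccine today, feeling fine.",
    "Vaccination centre opens on Monday.",
]
LABELS = [1, 1, 0, 0]

X = [featurize(t, TOY_LEXICON) for t in TWEETS]
w, b = train_logistic_regression(X, LABELS)
predictions = [predict(w, b, x) for x in X]
```

In a real setting each group would contribute many more dimensions, and groups such as the dependency-based or network-based ones would need a parser or a social graph, which this sketch leaves out.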

When: On 8th October at 11.30 am

Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7_20210702T093000Z?from_login=true

Paper: http://ceur-ws.org/Vol-2943/vaxx_paper3.pdf

Whose Opinions Matter?

Perspective-aware Models to Identify Opinions of Hate Speech Victims in Abusive Language Detection.

Sohail Akhtar will present an in-depth study of novel approaches to hate speech detection, focusing on methods that leverage fine-grained knowledge derived from the annotations of individual annotators.

Title: Whose Opinions Matter? Perspective-aware Models to Identify Opinions of Hate Speech Victims in Abusive Language Detection.

Hate Speech (HS) is a form of abusive language, and its detection on social media platforms is a difficult but important task. The sudden rise in hate speech related incidents on social media is considered a major issue. The technologies being developed for HS detection mainly employ supervised machine learning approaches in Natural Language Processing (NLP). Training such models requires data manually annotated by humans, either crowd-sourced paid workers or domain experts, for training and benchmarking purposes.

Because abusive language is subjective in nature, highly polarizing topics or events may be involved in the annotation of abusive content such as HS. Therefore, novel approaches are required to model conflicting perspectives and opinions coming from people with different personal and demographic backgrounds. Such conflicts raise issues concerning the quality of the annotation itself and might also affect the gold standard data used to train NLP models. The annotators might also show different sensitivity levels towards particular forms of hate, which results in low inter-annotator agreement. The online platforms used for HS annotation do not provide any background information about the annotators, and the views and personal opinions of the victims of online hate are often ignored in HS detection tasks.

In this talk, he will present an in-depth study of novel approaches to detect various forms of abusive language against minorities. The work focuses on developing approaches that leverage fine-grained knowledge derived from the annotations of individual annotators, before a gold standard is created in which the subjectivity of the annotators is averaged out.

The research aimed at developing approaches to model the polarized opinions coming from different communities, under the hypothesis that similar characteristics (ethnicity, social background, culture, etc.) can influence the perspectives of the annotators on a certain phenomenon and that, based on such information, annotators can be grouped together.

The intuition is that by relying on such information, it is possible to divide the annotators into separate groups. Based on this grouping, separate gold standards are created for each group to train state-of-the-art deep learning models for abusive language detection. Additionally, an ensemble approach is implemented to combine the perspective-aware classifiers from different groups into an inclusive model.
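The per-group gold standards and the inclusive ensemble can be sketched in a few lines. This is a minimal illustration, assuming binary labels, majority voting inside each group, and an ensemble rule that flags a message as soon as any group classifier flags it; the function names, the tie-breaking rule, and the toy data are assumptions made for the example and are not taken from the talk.

```python
from collections import Counter


def majority_label(labels):
    """Most frequent binary label; ties go to the positive class (an assumption)."""
    counts = Counter(labels)
    return 1 if counts[1] >= counts[0] else 0


def group_gold_standards(annotations, groups):
    """Build one gold standard per annotator group.

    annotations: {item_id: {annotator_id: 0 or 1}}
    groups:      {group_name: [annotator_id, ...]}
    """
    gold = {}
    for group, members in groups.items():
        gold[group] = {}
        for item, labels in annotations.items():
            group_labels = [labels[a] for a in members if a in labels]
            if group_labels:  # skip items nobody in the group annotated
                gold[group][item] = majority_label(group_labels)
    return gold


def inclusive_prediction(group_predictions):
    """Positive if at least one perspective-aware classifier predicts positive."""
    return int(any(group_predictions.values()))


# Toy example (hypothetical annotators and labels).
annotations = {
    "tweet_1": {"a1": 1, "a2": 1, "b1": 0, "b2": 0},
    "tweet_2": {"a1": 0, "a2": 0, "b1": 0, "b2": 0},
}
groups = {"group_A": ["a1", "a2"], "group_B": ["b1", "b2"]}
gold = group_gold_standards(annotations, groups)
```

Here `tweet_1` is positive in the gold standard of `group_A` and negative in that of `group_B`, which is exactly the kind of disagreement a single averaged gold standard would hide.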

The research proposed a novel resource, a multi-perspective English language dataset annotated according to different sub-categories relevant for characterizing online abuse: HS, aggressiveness, offensiveness and stereotype. Unlike previous work, where the annotations were based on crowd-sourcing, the study involved the victims of targeted communities in the annotation process, who volunteered to annotate the dataset, providing a natural selection of the annotator groups based on their personal characteristics. These annotators come from different cultural and social backgrounds and demographics, and one of the groups involves members of the targeted communities.

By training state-of-the-art deep learning models on this novel resource, the results showed that the proposed approach improves the prediction performance of a state-of-the-art supervised classifier.

Moreover, an in-depth qualitative analysis of the novel dataset examines individual tweets to identify and understand the topics and events causing polarization among the annotators. The analysis showed that the keywords (unigram features) are indeed strongly linked with and influenced by the culture, religion and demographic background of the annotators.

When: On 2nd July at 11.30 am

Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7_20210702T093000Z?from_login=true

Weights & Biases

Mattia Cerrato presents a tutorial on Weights & Biases, a platform for keeping track of results, hyperparameters and random seeds in ML experiments.

Title: Experiment tracking with Weights & Biases

Performing experiments is perhaps the most time-consuming activity in ML research, especially at the junior level. Often too little effort is spent on understanding how to optimize this process. The Weights & Biases (W&B) platform provides a simple Python interface which may be used to keep track of results, hyperparameters and random seeds. It has intuitive visualization utilities which may be used to write experimental reports starting from raw performance metric data. Furthermore, it provides an easy way to perform hyperparameter search (random, grid and Bayesian search strategies are available) and some light training orchestration capabilities. In this talk, we will see how to extend our experimental scripts so that W&B can help us keep our sanity during the experimental phase of a project.
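As a rough idea of what extending an experimental script looks like, the sketch below logs a configuration (including the random seed) and a per-epoch metric with the `wandb` package, and defines a sweep configuration for random search. The training loop is a stand-in that produces a fake loss, and the project name `my-experiments`, the metric name `loss`, and the parameter ranges are placeholders chosen for this example, not material from the tutorial.

```python
import random

try:
    import wandb  # third-party package: pip install wandb
except ImportError:  # lets the sketch be read and tested without the package
    wandb = None


def train_one_run(learning_rate, seed, epochs=3):
    """Stand-in for a real training loop: returns a fake loss per epoch."""
    rng = random.Random(seed)  # seeded, so a run can be reproduced
    loss = 1.0
    history = []
    for _ in range(epochs):
        loss *= 1.0 - min(learning_rate, 0.9) * rng.uniform(0.5, 1.0)
        history.append(loss)
    return history


def tracked_run(config, project="my-experiments", mode="offline"):
    """Record hyperparameters, seed and per-epoch loss in a W&B run."""
    run = wandb.init(project=project, config=config, mode=mode)
    losses = train_one_run(config["learning_rate"], config["seed"])
    for epoch, loss in enumerate(losses):
        run.log({"epoch": epoch, "loss": loss})
    run.finish()


# Sweep configuration: "random" could be swapped for "grid" or "bayes".
sweep_config = {
    "method": "random",
    "metric": {"name": "loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 0.001, "max": 0.1},
        "seed": {"values": [0, 1, 2]},
    },
}
```

Calling `tracked_run({"learning_rate": 0.05, "seed": 0})` would create one run; with `mode="offline"` nothing is uploaded until the run is synced. A sweep would typically be registered with `wandb.sweep(sweep_config, project=...)` and executed with `wandb.agent(...)`.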

When: On 4th June at 11.30 am

Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7_20210604T093000Z

Topic Shift in online debates on Twitter

Komal Florio presents an interesting investigation of topic shift in the public discourse on Social Media.

Title: Topic shift in the public discourse on Social Media: a case study about the covid-19 induced lockdown in Italy in 2020

In this work she tried to tackle the challenge of measuring and quantifying the topic shift in the public discourse on Social Media, using as a case study the online debate on Twitter following the covid-19 related lockdown in Italy in 2020, by means of a dedicated filtering of TWITA, a dataset of tweets in Italian.

At first she tried to predict which messages contained hate speech using AlBERTo, a BERT model fine-tuned on Italian social media language, but the results were far from satisfying. She then tried a lexicon-based approach and found that the dominant categories were derogatory words, insults regarding moral or behavioural defects, and cognitive disabilities or diversity. Nevertheless, the accuracy of this classification was not very high. An analysis of the lexicon words that determined the classification for the top 3 categories suggests that a manual revision of the list of words for each category could improve the outcome of this task.
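A lexicon-based pass of this kind can be sketched as a lookup of each token in a word-to-category table, followed by a count per category. The toy lexicon below (three English placeholder words, with category names modelled on those mentioned above) and the function names are assumptions for illustration only; the actual study worked on Italian tweets with a much larger lexicon.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: word -> category.
TOY_LEXICON = {
    "scum": "derogatory_words",
    "liar": "moral_and_behavioural_defects",
    "idiot": "cognitive_disabilities_and_diversity",
}


def categorize(message, lexicon):
    """Count lexicon categories triggered by the tokens of one message."""
    tokens = re.findall(r"\w+", message.lower())
    return Counter(lexicon[t] for t in tokens if t in lexicon)


def dominant_categories(messages, lexicon, top_n=3):
    """Aggregate category counts over a corpus and return the most frequent."""
    totals = Counter()
    for message in messages:
        totals.update(categorize(message, lexicon))
    return totals.most_common(top_n)


messages = ["You liar, you idiot", "Such a liar", "Nice day"]
top = dominant_categories(messages, TOY_LEXICON)
```

Because every match depends on a single word in the table, inspecting which words trigger the top categories is a natural way to spot entries that need manual revision.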

She then moved to the most powerful classification tool used on these data: topic modeling. A first classification with the Latent Dirichlet Allocation (LDA) algorithm proved valid in extracting the conversation around specific relevant events that happened in Italy between February 2020 and April 2020. To obtain consistent topics over time she then moved to Dynamic Topic Modeling, which extracted “healthcare” and “quarantine” as the consistently predominant topics in the corpus. She analyzed the peaks in documents related to this topic and to the lexicon categories mentioned above, and found that they occurred around the same time slices where the topics “quarantine” and “healthcare” have spikes as well. This shows that the most heated debates happened around public measures that directly and immediately affected both the collectivity (“healthcare”) and personal life (“quarantine”).

She then tried to use all the information gained so far to enhance the hate speech prediction performed by means of AlBERTo. Unfortunately this experiment did not lead to significant results due to the very small size of the resulting training dataset. Infusing deep learning models with information extracted from topic modeling certainly sounds like a promising way to enhance the accuracy of hate speech prediction, but she believes that further investigation into the size and characteristics of the datasets is essential to obtain better results.

When: On 21st May at 11.30 am

Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7_20210521T093000Z

The Octopus Paper

Valerio Basile presents an interesting consideration about the difference between form and meaning of language in neural language models.

Title: Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data

The success of the large neural language models on many NLP tasks is exciting. However, these successes sometimes lead to hype in which these models are being described as “understanding” language or capturing “meaning”. In this position paper it is argued that a system trained only on form has a priori no way to learn meaning, and that a clear understanding of the distinction between form and meaning will help guide the field towards better science around natural language understanding.

Emily M. Bender and Alexander Koller make their point through an incredibly witty story involving a very curious sea creature and a couple of castaways on bear-ridden tropical islands.

Related Paper: Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data

When: On 7th May at 11.30 am

Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7

How to avoid “Sorry, I don’t understand. Can you repeat please?” in a dialogue system

In this talk, Alessandro Mazzei will present the results of a project developed with TIM about the improvement of a dialogue system in the domain of customer service for TELCO. The idea is to compensate for the lack of linguistic information by predicting the intentions of the humans on the basis of domain knowledge.

When: On 23rd April at 11.30 am

Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7_20210423T093000Z?from_login=true

LessLex

Davide Colla presents a novel multilingual lexical resource called LessLex.

Title: LessLex: Linking Multilingual Embeddings to SenSe Representations of LEXical Items

LessLex is a novel multilingual lexical resource. Unlike the vast majority of existing approaches, it grounds the embeddings on a sense inventory made available by the BabelNet semantic network. In this setting, multilingual access is governed by the mapping of terms onto their underlying sense descriptions, such that all vectors co-exist in the same semantic space. As a result, each term has a “blended” terminological vector along with the vectors describing all senses associated with that term. LessLex has been tested on three tasks relevant to lexical semantics: conceptual similarity, contextual similarity, and semantic text similarity. He experimented on the principal data sets for such tasks in their multilingual and crosslingual variants, improving on or closely approaching state-of-the-art results. He concludes by arguing that LessLex vectors may be relevant for practical applications and for research on conceptual and lexical access and competence.
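To make the data layout concrete, the sketch below stores, for each term, one blended vector plus one vector per sense, all in the same space, and compares two terms by looking for their closest pair of senses. Everything here is a toy assumption: the three-dimensional vectors, the sense identifiers, and the max-over-sense-pairs strategy are chosen for illustration and are not claimed to be the similarity measure used in LessLex.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy resource: terms from different languages share sense vectors.
TOY_RESOURCE = {
    "bank": {
        "blended": [0.6, 0.6, 0.1],
        "senses": {"sense:finance": [1.0, 0.0, 0.1], "sense:riverside": [0.0, 1.0, 0.1]},
    },
    "banca": {
        "blended": [0.9, 0.1, 0.1],
        "senses": {"sense:finance": [1.0, 0.0, 0.1]},
    },
    "fiume": {
        "blended": [0.1, 0.9, 0.2],
        "senses": {"sense:river": [0.1, 1.0, 0.2]},
    },
}


def closest_senses(term_a, term_b, resource):
    """Return (similarity, sense_a, sense_b) for the most similar sense pair."""
    return max(
        (cosine(vec_a, vec_b), sense_a, sense_b)
        for sense_a, vec_a in resource[term_a]["senses"].items()
        for sense_b, vec_b in resource[term_b]["senses"].items()
    )


score_it, sense_en, sense_it = closest_senses("bank", "banca", TOY_RESOURCE)
```

Because English "bank" and Italian "banca" map onto the same underlying sense in this toy table, the crosslingual comparison needs no translation step, which is the property the shared semantic space is meant to provide.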

Related Paper: LessLex: Linking Multilingual Embeddings to SenSe Representations of LEXical Items 

When: On 26th March at 11.30 am

Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7

NLP for Music Information Retrieval

Michael Kurt Fell presents an interesting analysis of Lyrics Structure and Content.

Title: Natural Language Processing for Music Information Retrieval: Deep Analysis of Lyrics Structure and Content

Applications in Music Information Retrieval and Computational Musicology have traditionally relied on features extracted from the music content in the form of audio, but have mostly ignored the song lyrics. More recently, improvements in fields such as music recommendation have been made by taking into account external metadata related to the song. In this talk, he will demonstrate that extracting knowledge from the song lyrics is the next step to improve the user’s experience when interacting with music. To extract knowledge from vast amounts of song lyrics, he will show for different textual aspects (their structure, content, and perception) how Natural Language Processing (NLP) methods can be adapted and successfully applied to lyrics. For the structural aspect, a structural description is obtained by introducing a model that efficiently segments the lyrics into their characteristic parts (e.g. intro, verse, chorus). In a second stage, the content of lyrics is represented by summarizing them in a way that respects the characteristic lyrics structure. Finally, on the perception of lyrics, he addresses the problem of detecting explicit content in a song text. This task proves to be very hard, and he will show that the difficulty partially arises from the subjective nature of perceiving lyrics in one way or another depending on the context. As a consequence of this work, he has also created the annotated WASABI Song Corpus, a dataset of two million songs with NLP lyrics annotations on various levels.

Related Work: Michael Fell. Natural Language Processing for Music Information Retrieval: Deep Analysis of Lyrics Structure and Content. Computation and Language [cs.CL]. Université Côte D’Azur, 2020.

When: On 26th February at 11.30 am

Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7