Categories
Meetings

Topic Shift in online debates on Twitter

Komal Florio presents an investigation of topic shift in the public discourse on social media.

Title: Topic shift in the public discourse on Social Media: a case study about the covid-19 induced lockdown in Italy in 2020

In this work she tackles the challenge of measuring and quantifying the topic shift in the public discourse on social media, using as a case study the online debate on Twitter that followed the covid-19 lockdown in Italy in 2020. The data come from a dedicated filtering of TWITA, a dataset of tweets in Italian.

At first she tried to predict which messages contained hate speech using AlBERTo, a BERT model fine-tuned on Italian social media language, but the results were far from satisfactory. She then tried a lexicon-based approach and found that the dominant categories were derogatory words, insults regarding moral or behavioural defects, and cognitive disabilities or diversity. Nevertheless, the accuracy of this classification was not very high. An analysis of the lexicon words that determined the classification for the top three categories suggests that a manual revision of the word list for each category could improve the outcome of this task.
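The lexicon-based step can be illustrated with a minimal sketch. This is not the speaker's implementation: the category names, the toy English words, and the helper functions below are all hypothetical placeholders, and a real lexicon would be in Italian and much larger. The sketch shows how messages can be assigned to lexicon categories and how the words that trigger a category can be counted, which is the kind of inspection that motivates a manual revision of the word lists.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (assumption): category name -> trigger words.
TOY_LEXICON = {
    "derogatory_words": {"idiot", "clown"},
    "moral_defects": {"liar", "coward"},
    "cognitive_disabilities": {"stupid", "dumb"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def categories_for(text, lexicon):
    """Return the set of categories with at least one trigger word in the text."""
    tokens = set(tokenize(text))
    return {category for category, words in lexicon.items() if tokens & words}


def category_counts(messages, lexicon):
    """Count how many messages fall into each lexicon category."""
    counts = Counter()
    for message in messages:
        counts.update(categories_for(message, lexicon))
    return counts


def trigger_words(messages, lexicon, category):
    """Count which lexicon words fire for one category, to support manual review."""
    counts = Counter()
    for message in messages:
        counts.update(t for t in tokenize(message) if t in lexicon[category])
    return counts


messages = [
    "You are a liar and a coward",
    "What an idiot",
    "Nice weather today",
    "Liar!",
]
print(category_counts(messages, TOY_LEXICON).most_common())
print(trigger_words(messages, TOY_LEXICON, "moral_defects").most_common())
```

Listing the trigger words per category makes it easy to spot ambiguous entries that inflate a category, which is one plausible reason for revising the lists by hand.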

She then moved to the most powerful classification tool that was used on these data: topic modeling. A first classification with a Latent Dirichlet Allocation (LDA) algorithm proved valid in extracting the conversation around specific relevant events that happened in Italy between February 2020 and April 2020. To obtain consistent topics over time she then moved to Dynamic Topic Modeling, which extracted “healthcare” and “quarantine” as the consistently predominant topics in the corpus. She analyzed the peaks in documents related to this topic and to the lexicon categories mentioned above, and found that they fall in the same time slices where the topics “quarantine” and “healthcare” spike as well. This shows that the most heated debates happened around public measures that directly and immediately affected both the community (“healthcare”) and personal life (“quarantine”).

She then tried to use all the information gained so far to enhance the hate speech prediction performed by means of AlBERTo. Unfortunately this experiment did not lead to significant results, due to the very small size of the resulting training dataset. Infusing a deep learning model with information extracted from topic modeling certainly sounds like a promising way to improve the accuracy of hate speech prediction, but she believes that further investigation of the size and characteristics of the datasets is essential to obtain better results.

When: On 21st May at 11.30 am

Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7_20210521T093000Z

Categories
Meetings

The Octopus Paper

Valerio Basile presents a discussion of the difference between the form and the meaning of language in neural language models.

Title: Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data

The success of the large neural language models on many NLP tasks is exciting. However, these successes sometimes lead to hype in which these models are being described as “understanding” language or capturing “meaning”. In this position paper it is argued that a system trained only on form has a priori no way to learn meaning, and that a clear understanding of the distinction between form and meaning will help guide the field towards better science around natural language understanding.

Emily M. Bender and Alexander Koller make their point through an incredibly witty story involving a very curious sea creature and a couple of castaways on bear-ridden tropical islands.

Related Paper: Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data

When: On 7th May at 11.30 am

Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7

Categories
Meetings

How to avoid “Sorry, I don’t understand. Can you repeat please?” in a dialogue system

In this talk, Alessandro Mazzei will present the results of a project developed with TIM on improving a dialogue system for customer service in the TELCO domain. The idea is to compensate for the lack of linguistic information by predicting the users' intentions on the basis of domain knowledge.

When: On 23rd April at 11.30 am

Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7_20210423T093000Z?from_login=true

Categories
Talks

Seminar: Simone Balloccu

Unaddressed challenges in persuasive dieting chatbots

Diet coaching has attracted considerable research interest. Recently, chatbots have been leveraged to address this task, with a focus on persuasion to motivate people towards behaviour change. In this talk we will take a look at current approaches to building persuasive dieting chatbots and expose a number of major unsolved challenges. We will motivate them with evidence from previous work and show that current chatbots don’t approach certain scenarios properly, which limits their communication and persuasion capabilities.

Simone Balloccu is a PhD student in Natural Language Generation (NLG) at the University of Aberdeen, UK. He received his BSc and MSc in computer science from the University of Cagliari. His research initially focused on unsupervised Natural Language Understanding (NLU) for the Singleton Expansion and Hypernym Discovery tasks. He is now working in the PhilHumans project (H2020) on the development of innovative healthcare AI technology. His current research focus is on user profiling, text comprehension enhancement and stress-based tailoring in the context of diet coaching.

When: 15/04/2021 at 10.00

Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7

Categories
Talks

Seminar: Alan Ramponi

Language variation: challenges and future avenues

Recent developments in deep learning have shown striking performance improvements on a wide array of natural language processing (NLP) tasks. Despite the great progress, current approaches to NLP typically assume language is homogeneous, and thus fail to account for the intra- and extra-linguistic variations encoded in language, such as domains, genres, languages, and social factors. The main consequence is a dramatic drop in performance on out-of-distribution data, and unpredictable behaviors on new distributions, including the magnification of harmful stereotypes, and unfair and discriminatory decision making. In this talk, I will dig into the topic, introducing the theoretical notion of the variety space and the challenges which language variation entails. Current efforts and future avenues for research will be discussed, including transfer learning approaches, and the need for living benchmarks and transparent model and data statements.

Alan Ramponi is a PhD candidate in natural language processing at the University of Trento, Italy. He was a fellow at the Microsoft Research COSBI centre, Italy (2017-20), and a visiting PhD fellow in the NLPnorth group at the IT University of Copenhagen, Denmark (2019-20). He received his MSc in computer science cum laude from the University of Trento in 2017. His research focuses on language variation, and specifically on making natural language processing (NLP) robust to, and ultimately aware of, variation across domains, social factors, and languages.

When: 09/04/2021 at 11.00

Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7

Categories
Meetings

LessLex

Davide Colla presents a novel multilingual lexical resource called LessLex.

Title: LessLex: Linking Multilingual Embeddings to SenSe Representations of LEXical Items

LessLex is a novel multilingual lexical resource. Unlike the vast majority of existing approaches, he grounds the embeddings on a sense inventory made available by the BabelNet semantic network. In this setting, multilingual access is governed by the mapping of terms onto their underlying sense descriptions, such that all vectors co-exist in the same semantic space. As a result, each term has a “blended” terminological vector along with vectors describing all the senses associated with that term. LessLex has been tested on three tasks relevant to lexical semantics: conceptual similarity, contextual similarity, and semantic text similarity. He experimented on the principal data sets for these tasks in their multilingual and crosslingual variants, improving on or closely approaching state-of-the-art results. He concludes by arguing that LessLex vectors may be relevant for practical applications and for research on conceptual and lexical access and competence.
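To make the idea of a shared space with one “blended” vector and several sense vectors per term more concrete, here is a minimal sketch. It is not the LessLex API or its actual similarity measure: the dictionary layout, the sense identifiers, the three-dimensional toy vectors, and the two helper functions are assumptions made for illustration. It compares terms from different languages either through their blended vectors or through their closest pair of senses, a common way of computing conceptual similarity over sense embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy resource (assumption): each term has one blended vector
# plus one vector per sense; terms in different languages share sense IDs.
TOY_VECTORS = {
    "bank": {
        "blended": [0.5, 0.5, 0.0],
        "senses": {
            "sense:financial_institution": [1.0, 0.0, 0.0],
            "sense:river_side": [0.0, 1.0, 0.0],
        },
    },
    "banca": {
        "blended": [0.9, 0.1, 0.0],
        "senses": {"sense:financial_institution": [1.0, 0.0, 0.0]},
    },
}


def blended_similarity(term_a, term_b, vectors):
    """Compare two terms through their blended terminological vectors."""
    return cosine(vectors[term_a]["blended"], vectors[term_b]["blended"])


def max_sense_similarity(term_a, term_b, vectors):
    """Compare two terms through their closest pair of sense vectors."""
    return max(
        cosine(u, v)
        for u in vectors[term_a]["senses"].values()
        for v in vectors[term_b]["senses"].values()
    )


print(blended_similarity("bank", "banca", TOY_VECTORS))
print(max_sense_similarity("bank", "banca", TOY_VECTORS))
```

In this toy setup the sense-level comparison scores higher than the blended one, because the English term is ambiguous while the Italian term only carries the financial sense.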

Related Paper: LessLex: Linking Multilingual Embeddings to SenSe Representations of LEXical Items 

When: On 26th March at 11.30 am

Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7

Categories
Meetings

NLP for Music Information Retrieval

Michael Kurt Fell presents an interesting analysis of lyrics structure and content.

Title: Natural Language Processing for Music Information Retrieval: Deep Analysis of Lyrics Structure and Content

Applications in Music Information Retrieval and Computational Musicology have traditionally relied on features extracted from the music content in the form of audio, but have mostly ignored the song lyrics. More recently, improvements in fields such as music recommendation have been made by taking into account external metadata related to the song. In this talk, he will demonstrate that extracting knowledge from the song lyrics is the next step to improve the user’s experience when interacting with music. To extract knowledge from vast amounts of song lyrics, he will show for different textual aspects (their structure, content, and perception) how Natural Language Processing (NLP) methods can be adapted and successfully applied to lyrics. For the structural aspect, a structural description of the lyrics is obtained by introducing a model that efficiently segments them into their characteristic parts (e.g. intro, verse, chorus). In a second stage, the content of the lyrics is represented by summarizing them in a way that respects the characteristic lyrics structure. Finally, on the perception of lyrics, he addresses the problem of detecting explicit content in a song text. This task proves to be very hard, and he will show that the difficulty partially arises from the subjective nature of perceiving lyrics in one way or another depending on the context. As a consequence of this work, he has also created the annotated WASABI Song Corpus, a dataset of two million songs with NLP lyrics annotations on various levels.

Related Work: Michael Fell. Natural Language Processing for Music Information Retrieval: Deep Analysis of Lyrics Structure and Content. Computation and Language [cs.CL]. Université Côte D’Azur, 2020.

When: On 26th February at 11.30 am

Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7

Categories
Meetings

The Ontology of Migrant Writers

Marco Antonio Stranisci presents a new Computational Ontology of Migrant Writers.

Title: The Ontology of Migrant Writers

Narratives have become a pervasive and multifaceted presence in social media. Within these communicative contexts, journalists and other influential people use them to frame specific and often conflicting points of view on the world. Correspondingly, users are an active part of this creative process because they interact with and redefine narratives through their sentiment on specific topics.

However, social media are often affected by stereotypical narratives that increase the level of aggressiveness and verbal violence online, often at the expense of people vulnerable to discrimination. Many of these narratives are mainstream and strongly related to the spreading of Hate Speech (HS). Unfortunately, similar stereotypes are also present in positive narratives, which in several cases depict people vulnerable to HS exclusively as victims. Instead, stories directly created by minorities have poor visibility in the public debate even if the social web hosts a lot of them.

In order to reduce this underrepresentation, a computational ontology of migrant writers has been developed. This resource is aimed at representing people who created literary works and are or have been migrants during their lives. It will be used to collect, organize, and make publicly available knowledge about migrant writers and their narratives. The ontology design focused on two research questions:

  • how to model the concept of migrant;
  • how to represent biographical events in their temporal succession.

In the presentation, he will first introduce the backbone ontology of migrant writers, highlighting the most challenging aspects he faced during its development. Then, he will show a series of data collection strategies he implemented to gather content from Wikidata, DBpedia, and Wikipedia.

When: On 12th February, 2021 at 11.30 am

Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7_20210212T103000Z

Categories
Meetings

Zero-Shot Cross-Lingual Hate Speech Detection

Endang Wahyu Pamungkas presents new experiments and challenges in Hate Speech Detection in a multilingual context.

Title: Zero-Shot Cross-Lingual Hate Speech Detection

Hate speech is an increasingly important societal issue in the era of digital communication. Hateful expressions often make use of figurative language and, although they represent, in some sense, the dark side of language, they are also often prime examples of creative use of language. While hate speech is a global phenomenon, current studies on automatic hate speech detection are typically framed in a monolingual setting.

In this talk, he will present ongoing work on hate speech detection in low-resource languages by transferring knowledge from a resource-rich language, English, in a zero-shot learning fashion. He will present experiments with traditional and recent neural architectures, and propose two joint-learning models, using different multilingual language representations to transfer knowledge between pairs of languages. The results of the experiments highlight a number of challenges and issues in this particular task.

One of the main challenges is related to the issue of current benchmarks for hate speech detection, in particular how bias related to the topical focus in the datasets influences the classification performance. The insufficient ability of current multilingual language models to transfer knowledge between languages in the specific hate speech detection task also remains an open problem. However, the experimental evaluation and the qualitative analysis show how the explicit integration of linguistic knowledge from a structured abusive language lexicon helps to alleviate this issue.

When: On 29th January, 2021 at 11.30 am

Categories
Meetings

VALICO-UD

Elisa Di Nuovo presents VALICO-UD, a new resource for NLP.

Title: VALICO-UD, an Italian Learner Treebank in Universal Dependencies for NLP tasks

In this talk, a novel parallel treebank made of texts written by learners of Italian and their grammatically corrected versions will be presented. The treebank is annotated according to the Universal Dependencies formalism and is composed of a silver standard (automatically parsed) and a core gold standard, which was manually corrected and error-annotated. In addition, the evaluation of three different UDPipe models will be presented, also measuring the impact of gold tokenisation and PoS tagging. To conclude, the treebank's applications and annotation choices will be discussed.

Paper: Towards an Italian Learner Treebank in Universal Dependencies

When: On 15th January, 2021 at 11.30 am