In this talk, Alessandro Mazzei will present the results of a project developed with TIM on improving a dialogue system for customer service in the telecommunications (TELCO) domain. The idea is to compensate for the lack of linguistic information by predicting users’ intentions on the basis of domain knowledge.
Category: Meetings
Periodic meetings of CCC group
Davide Colla presents a novel multilingual lexical resource called LessLex.
Title: LessLex: Linking Multilingual Embeddings to SenSe Representations of LEXical Items
LessLex is a novel multilingual lexical resource. Unlike the vast majority of existing approaches, LessLex grounds its embeddings in a sense inventory made available by the BabelNet semantic network. In this setting, multilingual access is governed by the mapping of terms onto their underlying sense descriptions, such that all vectors co-exist in the same semantic space. As a result, each term has a “blended” terminological vector along with the vectors describing all senses associated with that term. LessLex has been tested on three tasks relevant to lexical semantics: conceptual similarity, contextual similarity, and semantic text similarity. He experimented on the principal data sets for these tasks in their multilingual and crosslingual variants, improving on or closely approaching state-of-the-art results. He concludes by arguing that LessLex vectors may be relevant for practical applications and for research on conceptual and lexical access and competence.
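As a rough illustration of how a resource with one blended vector and several sense vectors per term could be queried for conceptual similarity, the sketch below compares two terms both at the term level and at the sense level. The toy vectors, the sense labels, and the max-over-sense-pairs strategy are assumptions made for illustration only; they are not taken from LessLex or BabelNet.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy entries: one "blended" vector per term plus one vector per sense.
# Sense labels are invented placeholders, not real sense identifiers.
TOY_LEXICON = {
    "bank": {
        "blended": [0.6, 0.6, 0.1],
        "senses": {
            "bank_institution": [1.0, 0.1, 0.0],
            "bank_river": [0.1, 1.0, 0.2],
        },
    },
    "money": {
        "blended": [0.9, 0.0, 0.1],
        "senses": {"money_currency": [0.9, 0.0, 0.1]},
    },
}


def term_similarity(lexicon, term1, term2):
    """Similarity of the two blended term vectors."""
    return cosine(lexicon[term1]["blended"], lexicon[term2]["blended"])


def best_sense_pair(lexicon, term1, term2):
    """Return (score, sense1, sense2) for the most similar pair of senses."""
    best = None
    for sense1, vec1 in lexicon[term1]["senses"].items():
        for sense2, vec2 in lexicon[term2]["senses"].items():
            score = cosine(vec1, vec2)
            if best is None or score > best[0]:
                best = (score, sense1, sense2)
    return best


score, sense1, sense2 = best_sense_pair(TOY_LEXICON, "bank", "money")
print(sense1, sense2, round(score, 3))
print(round(term_similarity(TOY_LEXICON, "bank", "money"), 3))
```

In this toy setting the sense-level comparison picks the financial sense of "bank", while the blended vector mixes both senses and yields a lower score.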
Related Paper: LessLex: Linking Multilingual Embeddings to SenSe Representations of LEXical Items
When: On 26th March at 11.30 am
Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7
NLP for Music Information Retrieval
Michael Kurt Fell presents an interesting analysis of Lyrics Structure and Content.
Title: Natural Language Processing for Music Information Retrieval: Deep Analysis of Lyrics Structure and Content
Applications in Music Information Retrieval and Computational Musicology have traditionally relied on features extracted from the music content in the form of audio, but have mostly ignored the song lyrics. More recently, improvements in fields such as music recommendation have been made by taking into account external metadata related to the song. In this talk, he will demonstrate that extracting knowledge from the song lyrics is the next step to improve the user’s experience when interacting with music. To extract knowledge from vast amounts of song lyrics, he will show for different textual aspects (their structure, content, and perception) how Natural Language Processing (NLP) methods can be adapted and successfully applied to lyrics. For the structural aspect, a structural description of the lyrics is obtained with a model that efficiently segments them into their characteristic parts (e.g. intro, verse, chorus). In a second stage, the content of the lyrics is represented by summarizing them in a way that respects this characteristic structure. Finally, regarding the perception of lyrics, he addresses the problem of detecting explicit content in a song text. This task proves to be very hard, and he will show that the difficulty partially arises from the subjective nature of perceiving lyrics in one way or another depending on the context. As a result of this work, he has also created the annotated WASABI Song Corpus, a dataset of two million songs with NLP lyrics annotations on various levels.
When: On 26th February at 11.30 am
Where: https://unito.webex.com/webappng/sites/unito/meeting/info/910eaf7ad0534d1ba92c5dde0a66a9a7
The Ontology of Migrant Writers
Marco Antonio Stranisci presents a new Computational Ontology of Migrant Writers.
Title: The Ontology of Migrant Writers
Narratives have become a pervasive and multifaceted presence in social media. Within these communicative contexts, journalists and other influential people use them to frame specific and often conflicting points of view on the world. Correspondingly, users are an active part of this creative process because they interact with and redefine narratives through their sentiment on specific topics.
However, social media are often affected by stereotypical narratives that increase the level of aggressiveness and verbal violence online, often at the expense of people vulnerable to discrimination. Many of these narratives are mainstream and strongly related to the spreading of Hate Speech (HS). Unfortunately, similar stereotypes are also present in positive narratives, which in several cases depict people vulnerable to HS exclusively as victims. By contrast, stories created directly by minorities have poor visibility in the public debate, even though the social web hosts many of them.
In order to reduce this underrepresentation, a computational ontology of migrant writers has been developed. This resource is aimed at representing people who created literary works and are or have been migrants during their lives. It will be used to collect, organize, and make publicly available knowledge about migrant writers and their narratives. The ontology design focused on two research questions:
- how to model the concept of migrant;
- how to represent biographical events in their temporal succession.
In the presentation, he will first introduce the backbone ontology of migrant writers, highlighting the most challenging aspects he faced during its development. Then, he will show a series of data collection strategies he implemented to gather content from Wikidata, DBpedia, and Wikipedia.
Endang Wahyu Pamungkas presents new experiments and challenges in Hate Speech Detection in a multilingual context.
Title: Zero-Shot Cross-Lingual Hate Speech Detection
Hate speech is an increasingly important societal issue in the era of digital communication. Hateful expressions often make use of figurative language and, although they represent, in some sense, the dark side of language, they are also often prime examples of creative use of language. While hate speech is a global phenomenon, current studies on automatic hate speech detection are typically framed in a monolingual setting.
In this talk, he will present ongoing work on hate speech detection in low-resource languages by transferring knowledge from a resource-rich language, English, in a zero-shot learning fashion. He will present experiments with traditional and recent neural architectures, and propose two joint-learning models, using different multilingual language representations to transfer knowledge between pairs of languages. The results of the experiments highlight a number of challenges and issues in this particular task.
One of the main challenges concerns current benchmarks for hate speech detection, in particular how bias related to the topical focus of the datasets influences classification performance. The insufficient ability of current multilingual language models to transfer knowledge between languages in the specific hate speech detection task also remains an open problem. However, the experimental evaluation and the qualitative analysis show that the explicit integration of linguistic knowledge from a structured abusive language lexicon helps to alleviate this issue.
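One way to picture the integration of a structured lexicon is as a set of language-independent features: words in different languages are mapped onto shared categories, and the per-category counts can be fed to a classifier next to the multilingual text representation. The sketch below is a minimal illustration under that assumption. The toy lexicon, its two categories, and the function name are hypothetical and do not reproduce the lexicon or the models used in the talk.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: per-language word lists mapped onto shared categories.
TOY_LEXICON = {
    "en": {"idiot": "insult", "stupid": "insult", "vermin": "dehumanizing"},
    "it": {"idiota": "insult", "stupido": "insult", "parassita": "dehumanizing"},
}

# Fixed category order, so feature vectors are comparable across languages.
CATEGORIES = ("insult", "dehumanizing")


def lexicon_features(text, lang):
    """Count lexicon hits per category for a text in the given language."""
    tokens = re.findall(r"\w+", text.lower())
    entries = TOY_LEXICON[lang]
    counts = Counter(entries[tok] for tok in tokens if tok in entries)
    return [counts[category] for category in CATEGORIES]


print(lexicon_features("You are a stupid idiot", "en"))
print(lexicon_features("Sei un idiota, parassita!", "it"))
```

Because the categories are shared, a model trained on English feature vectors sees the same feature layout for Italian input, which is the property a zero-shot setting relies on.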
Elisa Di Nuovo presents VALICO-UD, a new resource for NLP.
Title: VALICO-UD, an Italian Learner Treebank in Universal Dependencies for NLP tasks
In this talk, a novel parallel treebank made of texts written by learners of Italian and their grammatically corrected versions will be presented. The treebank is annotated according to the Universal Dependencies formalism and is composed of a silver standard (automatically parsed) and a core gold standard, which was manually corrected and error-annotated. In addition, the evaluation of three different UDPipe models will be presented, also measuring the impact of gold tokenisation and PoS tagging. To conclude, the treebank’s applications and annotation choices will be discussed.
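Parser evaluations of this kind are commonly reported as unlabeled and labeled attachment scores (UAS and LAS). The sketch below shows how these two scores could be computed from a gold and a predicted analysis in CoNLL-U format. It assumes identical tokenisation on both sides (the gold tokenisation condition); with system tokenisation an alignment step would be needed first. The example sentence, its annotation, and the helper names are illustrative assumptions, not data from the treebank.

```python
def make_conllu(rows):
    """Join 10-column rows into tab-separated CoNLL-U lines."""
    return "\n".join("\t".join(row) for row in rows)


def read_tokens(conllu_text):
    """Return a (HEAD, DEPREL) pair for every syntactic word in a CoNLL-U text."""
    tokens = []
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        tokens.append((cols[6], cols[7]))  # columns 7 and 8: HEAD and DEPREL
    return tokens


def attachment_scores(gold_text, pred_text):
    """Compute (UAS, LAS), assuming both texts contain the same tokens."""
    gold = read_tokens(gold_text)
    pred = read_tokens(pred_text)
    if len(gold) != len(pred):
        raise ValueError("gold and predicted token counts differ")
    if not gold:
        return 0.0, 0.0
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)
    las = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return uas, las


# Illustrative gold analysis of "Io mangio la mela".
GOLD = make_conllu([
    ("1", "Io", "io", "PRON", "_", "_", "2", "nsubj", "_", "_"),
    ("2", "mangio", "mangiare", "VERB", "_", "_", "0", "root", "_", "_"),
    ("3", "la", "il", "DET", "_", "_", "4", "det", "_", "_"),
    ("4", "mela", "mela", "NOUN", "_", "_", "2", "obj", "_", "_"),
])

# Hypothetical parser output: wrong head for token 3, wrong label for token 4.
PRED = make_conllu([
    ("1", "Io", "io", "PRON", "_", "_", "2", "nsubj", "_", "_"),
    ("2", "mangio", "mangiare", "VERB", "_", "_", "0", "root", "_", "_"),
    ("3", "la", "il", "DET", "_", "_", "2", "det", "_", "_"),
    ("4", "mela", "mela", "NOUN", "_", "_", "2", "nsubj", "_", "_"),
])

uas, las = attachment_scores(GOLD, PRED)
print(f"UAS={uas:.2f} LAS={las:.2f}")
```

A token counts towards UAS when its head is correct, and towards LAS only when both its head and its dependency label are correct, so LAS can never exceed UAS.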
Paper: Towards an Italian Learner Treebank in Universal Dependencies