Komal Florio presents an investigation of topic shift in the public discourse on social media.
Title: Topic shift in the public discourse on Social Media: a case study about the covid-19 induced lockdown in Italy in 2020
In this work she tackled the challenge of measuring and quantifying topic shift in the public discourse on social media. Her case study is the online debate on Twitter that followed the COVID-19 lockdown in Italy in 2020, obtained through a dedicated filtering of TWITA, a dataset of tweets in Italian.
At first she tried to predict which messages contained hate speech using AlBERTo, a BERT model fine-tuned on Italian social media language, but the results were far from satisfactory. She then tried a lexicon-based approach and found that the dominant categories were derogatory words, insults regarding moral or behavioural defects, and insults regarding cognitive disabilities or diversity. The accuracy of this classification was nevertheless not very high. An analysis of the lexicon words that determined the classification for the top 3 categories suggests that a manual revision of the word list for each category could improve the outcome of this task.
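As a rough illustration of the lexicon-based step, the sketch below assigns a message to every category for which it contains at least one lexicon word, then counts messages per category. The lexicon entries, category names, example messages and function names are all hypothetical placeholders; they are not the lexicon or the code used in the talk.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: category -> set of words.
# These entries are harmless placeholders, not the lexicon used in the talk.
LEXICON = {
    "derogatory": {"fool", "clown"},
    "moral_defects": {"liar", "lazy"},
    "cognitive": {"dim", "clueless"},
}


def categories_for(text, lexicon=LEXICON):
    """Return the set of lexicon categories with at least one word in the text."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return {cat for cat, words in lexicon.items() if tokens & words}


def category_counts(texts, lexicon=LEXICON):
    """Count how many texts fall into each lexicon category."""
    counts = Counter()
    for text in texts:
        counts.update(categories_for(text, lexicon))
    return counts


# Made-up example messages.
tweets = [
    "What a clown, and a liar too",
    "Stay home and stay safe",
    "Only a fool would believe that liar",
]
counts = category_counts(tweets)
print(counts.most_common())
```

Because a single matching word decides the category, the quality of the word lists drives the quality of the classification, which is consistent with the remark that a manual revision of the lists could help.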
She then moved to the most powerful classification tool used on these data: topic modeling. A first classification with a Latent Dirichlet Allocation (LDA) algorithm proved valid in extracting the conversation around specific relevant events that happened in Italy between February 2020 and April 2020. To obtain topics that are consistent over time she then moved to Dynamic Topic Modeling, which extracted “healthcare” and “quarantine” as the consistently predominant topics in the corpus. She analyzed the peaks in documents related to this topic and to the lexicon categories mentioned above, and found that they occur around the same time slices where the topics “quarantine” and “healthcare” spike as well. This shows that the most heated debates happened around public measures that directly and immediately affected both the community (“healthcare”) and personal life (“quarantine”).
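One simple way to check whether two series of per-slice document counts peak together is sketched below. It flags a time slice as a spike when its count exceeds the mean by k standard deviations and then intersects the spike sets. The threshold rule, the value of k, the function name and all the numbers are assumptions made for illustration; the talk does not specify how peaks were identified.

```python
from statistics import mean, pstdev


def spike_slices(counts, k=1.0):
    """Return indices of time slices whose count exceeds mean + k * std."""
    threshold = mean(counts) + k * pstdev(counts)
    return {i for i, c in enumerate(counts) if c > threshold}


# Made-up document counts per time slice, for illustration only.
quarantine_docs = [10, 12, 11, 40, 13, 12, 38, 11]
lexicon_docs = [3, 4, 3, 15, 4, 3, 5, 4]

topic_spikes = spike_slices(quarantine_docs)
lexicon_spikes = spike_slices(lexicon_docs)

# Time slices in which both series spike.
shared_spikes = topic_spikes & lexicon_spikes
print(sorted(shared_spikes))
```

A non-empty intersection only indicates that the peaks coincide in time; it does not by itself establish that one series drives the other.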
She then tried to use all the information gained so far to enhance the hate speech prediction performed with AlBERTo. Unfortunately this experiment did not lead to significant results, due to the very small size of the resulting training dataset. Infusing a deep learning model with information extracted from topic modeling certainly sounds like a promising way to improve the accuracy of hate speech prediction, but she believes that further investigation into the size and characteristics of the datasets is essential to obtain better results.
When: 21st May at 11.30 am