Data Augmentation through Back-Translation for Stereotypes and Irony Detection

Tom Bourgeade will present his research on “Data Augmentation through Back-Translation for Stereotypes and Irony Detection”.


In NLP, detecting nuanced phenomena such as stereotypes or irony presents particular challenges, chiefly because labeled datasets are scarce. One strategy to mitigate this is Data Augmentation, whose methods each have pros and cons with regard to these phenomena. This presentation focuses on Back-Translation, which exploits modern Machine Translation models to introduce variety into training instances: a text is translated into a pivot language and then back into the original language, in a process similar to paraphrasing. We compare this approach, on multilingual datasets for stereotypes and irony detection, against simpler strategies such as oversampling, as well as against Cross-Translation, in which instances from other language subsets are translated and injected into the target-language training subset.
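
For readers unfamiliar with the technique, the round trip through a pivot language can be sketched as follows. This is a minimal illustration, not the speaker's implementation: the names `back_translate`, `augment` and `toy_translate` are hypothetical, the translator is a stand-in lookup table rather than a real Machine Translation model, and the sketch assumes that the round trip leaves the label unchanged.

```python
from typing import Callable, List, Tuple

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
# In practice this would wrap a Machine Translation model.
Translator = Callable[[str, str, str], str]
Example = Tuple[str, int]  # (text, label)


def back_translate(text: str, lang: str, pivot: str, translate: Translator) -> str:
    """Translate `text` into the pivot language, then back into `lang`."""
    pivot_text = translate(text, lang, pivot)
    return translate(pivot_text, pivot, lang)


def augment(dataset: List[Example], lang: str, pivot: str,
            translate: Translator) -> List[Example]:
    """Return the dataset plus back-translated paraphrases.

    Assumption: the paraphrase keeps the label of its source instance.
    Paraphrases identical to an existing text are skipped.
    """
    augmented = list(dataset)
    seen = {text for text, _ in dataset}
    for text, label in dataset:
        paraphrase = back_translate(text, lang, pivot, translate)
        if paraphrase not in seen:
            augmented.append((paraphrase, label))
            seen.add(paraphrase)
    return augmented


# Toy stand-in for a translation model, for illustration only.
_TOY_TABLE = {
    ("en", "fr", "what a great day"): "quelle belle journée",
    ("fr", "en", "quelle belle journée"): "what a beautiful day",
}


def toy_translate(text: str, src: str, tgt: str) -> str:
    # Unknown inputs are returned unchanged.
    return _TOY_TABLE.get((src, tgt, text), text)


if __name__ == "__main__":
    data = [("what a great day", 1)]
    for text, label in augment(data, "en", "fr", toy_translate):
        print(label, text)
```

Whether a label such as "ironic" really survives the round trip is one of the trade-offs an augmentation method has to contend with, so the label-preservation assumption above should be read as a simplification.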

When: 19 April 2024, 11:30

Where: Sala Conferenze (3rd floor)