The seminar is presented by Dr Dilara Torunoğlu Selamet, Lecturer in the Department of Computer Engineering at Istanbul Technical University (ITU), Türkiye.
Abstract:
Word Sense Disambiguation (WSD) is a fundamental Natural Language Processing (NLP) task that aims to identify the intended meaning of an ambiguous word in context. While substantial progress has been achieved for high-resource languages, WSD remains challenging for morphologically rich and low-resource languages such as Turkish due to the scarcity of large-scale annotated datasets.
In this talk, I will present my PhD research on Turkish Word Sense Disambiguation, which combines data-centric and model-centric approaches. First, I will introduce DodoMe, a large-scale gamified crowdsourcing platform developed to collect sense-annotated Turkish sentences. The resulting dataset contains more than 158,000 annotations covering 30 highly ambiguous Turkish words and represents one of the largest publicly available WSD resources for Turkish.
Second, I will discuss a systematic comparison of modern approaches to WSD, including contextual embedding-based methods, prompting-based inference with Large Language Models (LLMs), and instruction-based fine-tuning of open-source LLMs. The results demonstrate that recent LLMs substantially outperform traditional embedding-based approaches and that instruction tuning can further improve performance when sufficient high-quality annotated data is available.
Finally, I will discuss the broader implications of combining human computation, crowdsourcing, and large language models for developing semantic resources and language technologies for under-resourced languages.
Short Bio:
Dilara Torunoğlu Selamet is a Lecturer in the Department of Computer Engineering at Istanbul Technical University (ITU), Türkiye, and recently defended her PhD in Computer Engineering. She is a member of the ITU Natural Language Processing Research Group, and her research focuses on Natural Language Processing, lexical semantics, word sense disambiguation, meaning representation, and multilingual language technologies.
Her doctoral research investigates Turkish Word Sense Disambiguation through the integration of large-scale crowdsourced datasets and Large Language Models. She is the creator of DodoMe, a gamified crowdsourcing platform designed for collecting semantic annotations in Turkish. Her work explores contextual embeddings, prompting strategies, and instruction-based fine-tuning approaches for semantic disambiguation tasks.
Dilara is actively involved in international collaborations through the UniDive COST Action, contributing to multilingual and multimodal language technology initiatives. She serves as a language leader and coordinator in the AdMiRe (Advancing Multimodal Idiomaticity Representation) shared task series, which focuses on multilingual and multimodal idiomaticity understanding across dozens of languages.
Her recent work includes contributions to the AdMiRe shared tasks at EACL 2026 and LREC-COLING 2026, as well as research on Turkish Word Sense Disambiguation, Abstract Meaning Representation (AMR), Uniform Meaning Representation (UMR), and multilingual idiomaticity understanding. She has co-authored large-scale international publications involving researchers working on more than 30 languages and actively contributes to the development of multilingual benchmarks, linguistic resources, and evaluation campaigns for Natural Language Processing.
When: 10/06/2026, 11:00 am
Where: Sala Conferenze, third floor