• Co-Supervisor :
  • Luis MENESES-LERIN (Grammatica)
  • Funding : Artois
  • Start year : 2023

Idiomatic expressions (IEs) are multiword expressions (MWEs) that typically occur as collocations whose meaning cannot be derived from the meanings of their component words. Identifying and modelling IEs is an important task for NLP applications such as machine translation, sentiment analysis, and paraphrase generation. The non-compositional semantics of idiomatic expressions (semantic idiomaticity) and their idiosyncratic properties raise several challenges for language understanding. In particular, the same expression can carry a literal or a figurative meaning depending on the context in which it occurs (semantic ambiguity). Recently, the success of contextualised language models (LMs) such as BERT, GPT-3 and OPT has led to a paradigm shift in NLP, as they capture prior knowledge about word meaning and about language more generally. While LMs have achieved groundbreaking results across a wide range of NLP tasks, it remains unclear to what extent such models capture figurative language in idiomatic expressions and how to perform computational metaphor generation. This thesis investigates these questions by learning suitable embeddings for idiomatic expressions that can be used in downstream applications.