19th Workshop on Multiword Expressions (MWE 2023)

Colocated with: EACL 2023 (Dubrovnik, Croatia)

Date of the Workshop: 6 May 2023

Organised and sponsored by:
Special Interest Group on the Lexicon (SIGLEX) of the Association for Computational Linguistics (ACL)


Proceedings and video recording

The proceedings for MWE2023 can be found here.
08:30–09:00 Registration
09:00–10:30 Session 1
09:00–09:10 Opening
  Onsite chair: Marcos Garcia / Online chair: Archna Bhatia
09:10–10:30 Oral long paper presentations
  Onsite chair: Lifeng Han / Online chair: Archna Bhatia
09:10–09:30 Are Frequent Phrases Directly Retrieved like Idioms? An Investigation with Self-Paced Reading and Language Models
  Giulia Rambelli, Emmanuele Chersoni, Marco S. G. Senaldi, Philippe Blache and Alessandro Lenci
09:30–09:50 A Survey of MWE Identification Experiments: The Devil is in the Details (online)
  Carlos Ramisch, Abigail Walsh, Thomas Blanchard and Shiva Taslimipoor
09:50–10:10 The Better Your Syntax, the Better Your Semantics? Probing Pretrained Language Models for the English Comparative Correlative (non-archival, online)
  Leonie Weissweiler, Valentin Hofmann, Abdullatif Koksal and Hinrich Schütze
10:10–10:30 Predicting Compositionality of Verbal Multiword Expressions in Persian (online)
  Mahtab Sarlak, Yalda Yarandi and Mehrnoush Shamsfard
10:30–11:15 Morning coffee break
11:15–12:50 Session 2
  Onsite chairs: Marcos Garcia, Giulia Rambelli / Online chair: Shiva Taslimipoor
11:15–12:15 Keynote: Lexical collocations: Explored a lot, still a lot more to explore
  Leo Wanner
12:15–12:30 Oral short paper presentation
  Romanian Multiword Expression Detection Using Multilingual Adversarial Training and Lateral Inhibition
  Andrei Avram, Verginica Barbu Mititelu and Dumitru-Clementin Cercel
12:30–12:50 Oral long paper presentation
  PARSEME corpus release 1.3 (online)
  Agata Savary, Cherifa Ben Khelil, Carlos Ramisch, Voula Giouli, Verginica Barbu Mititelu, Najet Hadj Mohamed, Cvetana Krstev, Chaya Liebeskind, Hongzhi Xu, Sara Stymne, Tunga Güngör, Thomas Pickard, Bruno Guillaume, Eduard Bejček, Archna Bhatia, Marie Candito, Polona Gantar, Uxoa Iñurrieta, Albert Gatt, Jolanta Kovalevskaite, Timm Lichte, Nikola Ljubešić, Johanna Monti, Carla Parra Escartín, Mehrnoush Shamsfard, Ivelina Stoyanova, Veronika Vincze and Abigail Walsh
12:50–14:15 Lunch break
14:15–15:45 Session 3 (special track)
14:15–14:45 Keynote: MWEs in Clinical NLP and Healthcare Text Analysis
  Asma Ben Abacha and Goran Nenadic
14:45–15:15 Oral short paper presentations
  Detecting Idiomatic Multiword Expressions in Clinical Terminology using Definition-Based Representation Learning
  François Remy, Alfiya Khabibullina and Thomas Demeester
  Investigating the Effects of MWE Identification in Structural Topic Modelling
  Dimitrios Kokkinakis, Ricardo Muñoz Sánchez, Sebastianus Cornelis Jacobus Bruinsma and Mia-Marie Hammarlin
15:15–15:45 Panel discussion: Multiword Expressions in Knowledge-intensive Domains: Clinical Text as a Case Study
  Asma Ben Abacha, Goran Nenadic, Stefan Schulz and Kirk Roberts
15:45–16:15 Afternoon coffee break
16:15–18:00 Session 4
  Onsite chair: Eleonora Guzzi / Online chair: Voula Giouli
16:15–17:15 Poster session
  Idioms, Probing and Dangerous Things: Towards Structural Probing for Idiomaticity in Vector Space
  Filip Klubička, Vasudevan Nedumpozhimana and John Kelleher
  Simple and Effective Multi-Token Completion from Masked Language Models (non-archival)
  Oren Kalinsky, Guy Kushilevitz, Alexander Libov and Yoav Goldberg
  Annotation of lexical bundles with discourse functions in a Spanish academic corpus
  Eleonora Guzzi, Margarita Alonso-Ramos, Marcos Garcia and Marcos García Salido
  Enriching Multiword Terms in Wiktionary with Pronunciation Information
  Lenka Bajcetic, Thierry Declerck and Gilles Sérasset
  Automatic Generation of Vocabulary Lists with Multiword Expressions
  John Lee and Adilet Uvaliyev
  A MWE lexicon formalism optimised for observational adequacy
  Adam Lion-Bouton, Agata Savary and Jean-Yves Antoine
17:15–17:45 Oral short paper presentations
  Token-level Identification of Multiword Expressions using Pre-trained Multilingual Language Models (online)
  Raghuraman Swaminathan and Paul Cook
  Graph-based multi-layer querying in Parseme Corpora (online)
  Bruno Guillaume
17:45–18:00 Closing
  Onsite chair: Marcos Garcia / Online chair: Voula Giouli

Keynote speakers

Leo Wanner (ICREA and Universitat Pompeu Fabra)

Bio: Leo Wanner is ICREA Research Professor at the Pompeu Fabra University in Barcelona, with 230+ peer-reviewed publications and 10 edited volumes. He is Associate Editor of the Computational Intelligence and Frontiers in AI, Language and Computation journals and serves as a regular reviewer for a number of high-profile conferences and journals on Computational Linguistics. Throughout his career, Leo has worked on a number of topics in the field, including natural language generation and summarization, concept extraction, conversational agents, hate speech recognition and, in particular, lexical collocation identification and classification.

Title: Lexical collocations: Explored a lot, still a lot more to explore

Abstract: Lexical collocations, i.e., idiosyncratic binary lexical item combinations, have been an active research topic for a number of years. State-of-the-art neural network models are reported to detect and classify specific types of lexical collocations with high accuracy, which might suggest that the problem has been solved. However, a cross-type and cross-language analysis of the results of one of these models raises several relevant research questions. In the first part of my talk, I will present our recent work on the identification and classification of lexical collocations with respect to the fine-grained taxonomy of lexical functions (LFs) in English, French, Spanish and Japanese. Drawing on the outcome of this work, in the second part of my talk I will focus on a comparative analysis of the “LF profiles” of English and Japanese material. In particular, I will discuss (i) how the considered LFs are distributed in the given corpora; (ii) how rich the repertoires of the LF instances are in each of them; (iii) whether the contexts of the LF instances overlap; and (iv) to what extent the “profile” of an LF correlates with the accuracy of the recognition of its instances. To conclude, I will formulate the research questions that arise from this analysis.

Asma Ben Abacha (Microsoft) and Goran Nenadic (University of Manchester)

Bios: Asma Ben Abacha is a Senior Scientist at Microsoft, with over 80 peer-reviewed publications. Her research interests include Natural Language Processing, Machine Learning, Artificial Intelligence and their applications in medicine and healthcare.

Goran Nenadic is a Professor in the Department of Computer Science at the University of Manchester and a Turing Fellow at the Alan Turing Institute, with more than 250 peer-reviewed publications. His research interests include Natural Language Processing, text mining, and health informatics.

Title: MWEs in ClinicalNLP and Healthcare Text Analytics

Multiword expressions (MWEs) are word combinations that exhibit lexical, syntactic, semantic, pragmatic, and/or statistical idiosyncrasies (Baldwin & Kim 2010), such as by and large, hot dog, pay a visit and pull one’s leg. The notion encompasses closely related phenomena: idioms, compounds, light-verb constructions, phrasal verbs, rhetorical figures, collocations, institutionalised phrases, etc. Their behaviour is often unpredictable; for example, their meaning frequently does not result from the direct combination of the meanings of their parts. Given their irregular nature, MWEs pose complex problems in linguistic modelling (e.g. annotation), NLP tasks (e.g. parsing), and end-user applications (e.g. natural language understanding and MT), and hence still represent an open issue for computational linguistics (Constant et al. 2017).

Since 2003, modelling and processing MWEs for NLP has been the topic of the MWE workshop, organised by the MWE Section of SIGLEX in conjunction with major NLP conferences. Impressive progress has been made in the field, but our understanding of MWEs still requires much research, given their importance and usefulness in NLP applications. This is also relevant to domain-specific NLP pipelines, which need to handle terminology that is most often realised as MWEs. Following previous years, for this 19th edition of the workshop, we identified the following topics on which contributions are particularly encouraged:

Through this workshop, we would like to bring together researchers in various NLP subfields and encourage them to submit MWE-related research, so that approaches to processing MWEs, including for low-resource languages and for various applications, can benefit from each other. We also intend to consolidate the converging effects of the previous joint workshops LAW-MWE-CxG 2018, MWE-WN 2019 and MWE-LEX 2020, the joint MWE-WOAH panel in 2021, and the MWE-SIGUL 2022 joint session, extending our scope to MWEs in e-lexicons and WordNets, MWE annotation, and grammatical constructions. Correspondingly, we call for papers on research related (but not limited) to MWEs and constructions in:

Shared task

We do not have a shared task this year, but a new release of the PARSEME corpus of verbal MWEs is currently underway. We encourage submission of research papers that include analyses of the new edition of the PARSEME data and improvements over the results of the PARSEME 2020 shared task, as well as of SemEval 2022 Task 2 on idiomaticity prediction.

Special track on MWEs in clinical NLP

Pursuing the MWE Section’s tradition of synergies with other communities, this year we are organising a joint session with the Clinical NLP workshop for shared paper and poster presentations. Since clinical texts contain a significant number of multiword expressions (e.g. medical terms or domain-specific collocations), a joint session is expected to benefit both communities. The goal is to foster future synergies that could address scientific challenges in the creation of resources, models and applications to deal with multiword expressions and related phenomena in the specialised domain of ClinicalNLP. Submissions describing research on MWEs in the specialised domain of ClinicalNLP, especially those introducing new datasets, tools or resources, are welcome. Papers accepted in this track will have the option to present their work at the Clinical NLP workshop at ACL 2023 as well, after being presented at MWE 2023.

Best paper award

All full papers in the workshop will be considered by the program committee for a best paper award.

Submission formats

The workshop invites two types of submissions:

Paper submission and templates

Papers should be submitted via the workshop’s START submission page. Please choose the appropriate submission format (archival/non-archival). Submissions must follow the ACL 2023 stylesheet.

Archival papers with existing reviews from ACL Rolling Review (ARR) will also be considered. A paper may not be simultaneously under review through ARR and MWE. A paper that has received or will receive reviews through ARR may not be submitted for review to MWE. To commit an ARR submission, email its OpenReview forum URL (https://openreview.net/forum?id=XXXXXXXXXXX) to the organizers.

Important dates

Paper submission deadline: 20 February 2023 (extended from 13 February 2023)
ARR commitment deadline: 6 March 2023
Notification of acceptance: 15 March 2023 (postponed from 13 March 2023)
Camera-ready papers due: 27 March 2023
Underline upload deadline: 11 April 2023
Workshop: 6 May 2023

All deadlines are at 23:59 UTC-12 (Anywhere on Earth).

Organizing Committee

Program chairs: Marcos Garcia, Voula Giouli, Shiva Taslimipoor, Lifeng Han
Publication chair: Archna Bhatia
Coordination and communication chair: Voula Giouli
Publicity chair: Kilian Evang

Program Committee

Anti-harassment policy

The workshop follows the ACL anti-harassment policy.


For any inquiries regarding the workshop, please send an email to the Organizing Committee at mweworkshop2023@googlegroups.com.

Please register with SIGLEX and check the “MWE Section” box to be added to our mailing list.