21st Workshop on Multiword Expressions (MWE 2025)
Colocated with: NAACL-2025
Date of the Workshop: May 3/4, 2025
Organised and sponsored by:
The Special Interest Group on the Lexicon (SIGLEX) of the Association for Computational Linguistics (ACL), SIGLEX’s Multiword Expressions Section (SIGLEX-MWE).
News
To attend the workshop (either in person or virtually), please register through NAACL 2025’s registration system. Note that to attend MWE 2025, it is sufficient to select this workshop during registration; you do not have to register for the main conference.
Multiword expressions (MWEs), i.e., word combinations that exhibit lexical, syntactic, semantic, pragmatic, and/or statistical idiosyncrasies (Baldwin and Kim, 2010), such as “by and large”, “hot dog”, “make a decision” and “break one’s leg”, are still a pain in the neck for Natural Language Processing (NLP). The notion encompasses closely related phenomena: idioms, compounds, light-verb constructions, phrasal verbs, rhetorical figures, collocations, institutionalized phrases, etc. Given their irregular nature, MWEs often pose complex problems in linguistic modeling (e.g. annotation), NLP tasks (e.g. parsing), and end-user applications (e.g. natural language understanding and Machine Translation), and hence remain an open issue for computational linguistics (Constant et al., 2017).
Since 2003, modelling and processing MWEs for NLP has been the topic of the MWE workshop, organised by the MWE section of SIGLEX in conjunction with major NLP conferences. Impressive progress has been made in the field, but our understanding of MWEs still requires much research, given their importance and usefulness in NLP applications. This is also relevant to domain-specific NLP pipelines that need to tackle terminologies, which are most often realised as MWEs. Following previous years, for this 21st edition of the workshop, we identified the following topics on which contributions are particularly encouraged:
- MWE processing to enhance end-user applications: MWEs have gained particular attention in end-user applications, including Machine Translation (MT) (Zaninello and Birch, 2020), simplification (Kochmar et al., 2020), language learning and assessment (Paquot et al., 2020), social media mining (Pelosi et al., 2017), and abusive language detection (Zampieri et al., 2020). We believe that it is crucial to extend and deepen these first attempts to integrate and evaluate MWE technology in these and further end-user applications.
- MWE processing and identification in the general language, as well as in specialized languages and domains: Multiword terminology extraction from domain-specific corpora (Lossio-Ventura et al., 2014) is of particular importance to various applications, such as MT (Semmar and Laib, 2017), or for the identification and monitoring of neologisms and technical jargon (Chatzitheodorou and Kappatos, 2021).
- MWE processing in low-resource languages: The PARSEME shared tasks (2017, 2018, 2020), among others, have fostered significant progress in MWE identification, providing datasets that include low-resource languages, evaluation measures, and tools that now allow fully integrating MWE identification into end-user applications. There are continuous efforts in this direction (Diaz Hernandez, 2024), and a few of them have also explored methods for the automatic interpretation of MWEs (Bhatia et al., 2018) and their processing in low-resource languages (Eder et al., 2021). Resource creation and sharing should be pursued in parallel with the development of multilingual benchmarks for MWE identification (Savary et al., 2023).
- MWE identification and interpretation in LLMs: Most current MWE processing is limited to their identification and detection using pre-trained language models, but we still lack understanding of how MWEs are represented and dealt with therein (Garcia et al., 2021), and of how to better model the compositionality of MWEs from semantics (Phelps et al., 2024). Now that NLP has shifted towards end-to-end neural models like BERT, capable of solving complex tasks with little or no intermediary linguistic symbols, questions arise about the extent to which MWEs should be implicitly or explicitly modelled (Shwartz and Dagan, 2019).
- New and enhanced representation of MWEs in language resources and computational models of compositionality as gold standards for formative intrinsic evaluation.
Through this workshop, we would like to bring together researchers in various NLP subfields and encourage them to submit MWE-related research, so that approaches that deal with the processing of MWEs, including processing for low-resource languages and for various applications, can benefit from each other. We also intend to consolidate the converging effects of the previous joint workshops LAW-MWE-CxG 2018, MWE-WN 2019 and MWE-LEX 2020, the joint MWE-WOAH panel in 2021, the MWE-SIGUL 2022 joint session, and the MWE-UD 2024 joint workshop, extending our scope to MWEs in e-lexicons and WordNets, MWE annotation, as well as grammatical constructions. Correspondingly, we call for papers on research related (but not limited) to MWEs and constructions in:
- Computationally-applicable theoretical work in psycholinguistics and corpus linguistics;
- Annotation (expert, crowdsourcing, automatic) and representation in resources such as corpora, treebanks, e-lexicons, and WordNets (also for low-resource languages);
- Processing in syntactic and semantic frameworks (e.g. CCG, CxG, HPSG, LFG, TAG, UD, etc.);
- Discovery and identification methods, including for specialized languages and domains such as clinical or biomedical NLP;
- Interpretation of MWEs and understanding of text containing them;
- Language acquisition, language learning, and non-standard language (e.g. tweets, speech);
- Evaluation of annotation and processing techniques;
- Retrospective comparative analyses from the PARSEME shared tasks;
- Processing for end-user applications (e.g. MT, NLU, summarisation, language learning, etc.);
- Implicit and explicit representation in pre-trained language models and end-user applications;
- Evaluation and probing of pre-trained language models;
- Resources and tools (e.g. lexicons, identifiers) and their integration into end-user applications;
- Multiword terminology extraction;
- Adaptation and transfer of annotations and related resources to new languages and domains including low-resource ones.
The workshop invites two types of submissions:
- Archival submissions that present substantially original research, in either long paper format (8 pages + references) or short paper format (4 pages + references).
- Non-archival submissions of abstracts describing relevant research presented/published elsewhere, which will not be included in the MWE proceedings.
Papers should be submitted via the workshop’s submission page (TBD). Please choose the appropriate submission format (archival/non-archival). Archival papers with existing reviews will also be accepted through ACL Rolling Review. Submissions must follow the ACL stylesheet. For further information on this initiative, please refer to NAACL 2025.
TBD
What | When |
Paper submission deadline | January 30, 2025 |
ARR commitment deadline | February 20, 2025 |
Notification of acceptance | March 1, 2025 |
Camera-ready papers due | March 10, 2025 |
Underline upload deadline | April 8, 2025 |
Workshop | May 3 or 4, 2025 |
All deadlines are at 23:59 UTC-12 (Anywhere on Earth).
A. Seza Dogruöz | Ghent University, Belgium |
Alexandre Rademaker | IBM Research, Brazil |
Atul Kr. Ojha | Insight Research Ireland Centre for Data Analytics, University of Galway |
Gražina Korvel | VU Institute of Data Science and Digital Technologies |
Mathieu Constant | Université de Lorraine |
Verginica Barbu Mititelu | Romanian Academy |
Voula Giouli | Institute for Language & Speech Processing, ATHENA RC, Greece |
Verginica Barbu Mititelu | Romanian Academy |
Cherifa Ben Kehlil | University of Tours |
Philippe Blache | Aix-Marseille Uni |
Francis Bond | Palacký University |
Claire Bonial | U.S. Army Research Laboratory |
Julia Bonn | University of Colorado Boulder |
Tiberiu Boroș | Adobe |
Marie Candito | Université Paris Cité |
Giuseppe G. A. Celano | Leipzig Uni |
Kenneth Church | Baidu |
Çağrı Çöltekin | University of Tübingen |
Mathieu Constant | Université de Lorraine |
Monika Czerepowicka | University of Warmia and Mazury |
Daniel Dakota | Indiana University |
Miryam de Lhoneux | KU Leuven |
Marie-Catherine de Marneffe | UC Louvain |
Valeria de Paiva | Nuance |
Gaël Dias | University of Caen Basse-Normandie |
Kaja Dobrovoljc | University of Ljubljana |
Rafael Ehren | Heinrich Heine University Düsseldorf |
Gülşen Eryiğit | Istanbul Technical University |
Meghdad Farahmand | Berlin, Germany |
Christiane Fellbaum | Princeton University |
Jennifer Foster | Dublin City University |
Aggeliki Fotopoulou | Institute for Language and Speech Processing, ATHENA RC |
Stefan Th. Gries | UC Santa Barbara & JLU Giessen |
Bruno Guillaume | Université de Lorraine |
Tunga Gungor | Bogaziçi University |
Eleonora Guzzi | Universidade da Coruña |
Laura Kallmeyer | Heinrich Heine University Düsseldorf |
Cvetana Krstev | University of Belgrade |
Timm Lichte | University of Tübingen |
Irina Lobzhanidze | Ilia State University |
Teresa Lynn | ADAPT Centre |
Stella Markantonatou | Institute for Language & Speech Processing, ATHENA RC |
John P. McCrae | National University of Ireland, Galway |
Nurit Melnik | The Open University of Israel |
Johanna Monti | “L’Orientale” University of Naples |
Dmitry Nikolaev | University of Manchester |
Jan Odijk | University of Utrecht |
Petya Osenova | Bulgarian Academy of Sciences |
Yannick Parmentier | University of Lorraine |
Agnieszka Patejuk | University of Oxford and Institute of Computer Science, Polish Academy of Sciences |
Pavel Pecina | Charles University |
Ted Pedersen | University of Minnesota |
Prokopis Prokopidis | Institute for Language and Speech Processing, ATHENA RC |
Manfred Sailer | Goethe-Universität Frankfurt am Main |
Tanja Samardžić | University of Zurich |
Agata Savary | Université Paris-Saclay |
Nathan Schneider | Georgetown University |
Sabine Schulte im Walde | University of Stuttgart |
Sebastian Schuster | Saarland University |
Matthew Shardlow | University of Manchester |
Joaquim Silva | Universidade NOVA de Lisboa |
Maria Simi | Università di Pisa |
Ranka Stanković | University of Belgrade |
Ivelina Stoyanova | Bulgarian Academy of Sciences |
Stan Szpakowicz | University of Ottawa |
Shiva Taslimipoor | University of Cambridge |
Beata Trawinski | Leibniz Institute for the German Language |
Ashwini Vaidya | Indian Institute of Technology |
Marion Di Marco | Ludwig Maximilian University of Munich |
Amir Zeldes | Georgetown University |
Daniel Zeman | Charles University |
The workshop follows the ACL anti-harassment policy.
For any inquiries regarding the workshop, please send an email to the Organizing Committee at mweworkshop2023@googlegroups.com.
Please register with SIGLEX and check the “MWE Section” box to be added to our mailing list.