21st Workshop on Multiword Expressions (MWE 2025)
Colocated with: NAACL-2025, Albuquerque, New Mexico, U.S.A.
Date of the Workshop: May 4, 2025
Organised and sponsored by:
The Special Interest Group on the Lexicon (SIGLEX) of the Association for Computational Linguistics (ACL), SIGLEX’s Multiword Expressions Section (SIGLEX-MWE).
Invited speaker: Nathan Schneider (Georgetown University)
Bio: Nathan Schneider is a computational linguist. As Associate Professor of Linguistics and Computer Science at Georgetown University, he leads the NERT lab, looking for synergies between practical language technologies and the scientific study of language, with an emphasis on how words, grammar, and context conspire to convey meaning. He is the recipient of an NSF CAREER award to study NLP vis-à-vis metalinguistic enterprises like language learning, linguistics, and legal interpretation. Recently, he has weighed in on specific interpretive debates in U.S. law; one of these analyses was cited by U.S. Supreme Court justices in a major firearms case. He is active in the NLP community—especially ACL’s SIGANN and SIGLEX—and the Universal Dependencies project; and cofounded the SOLID forum for empirical research on legal interpretation. Prior to Georgetown, he inhabited UC Berkeley, Carnegie Mellon University, and the University of Edinburgh. Apart from annotation scheming and computational modeling, he enjoys classical music and chocolate chip cookies.
Title: Meaning Construction at the Syntax-Lexis Nexus
Abstract: When words and grammar come into contact, things sometimes get messy: idiosyncratic expressions and patterns disobey ordinary principles of regularity and compositionality. A useful point of reference is the theoretical perspective of Construction Grammar, which exhorts us to view linguistic knowledge in terms of form-function mappings—at all levels of granularity. How can this perspective inform a broad-coverage, multilingual approach to lexicosyntactic conundrums? First, I will discuss implications for corpus annotation: while some multiword expressions and names (e.g. “at least”, “in order to”, “Chapter 1”) test the limits of categorical annotation standards like Universal Dependencies, UD treebanks nevertheless enable empirical investigation of some functionally-defined constructions across languages. Second, I will discuss efforts to interpret the latent representations of constructional form and meaning in transformer language models, with the NPN construction (noun-preposition-noun, as in “face to face”) as a case study.
To attend the workshop (either in person or virtually), please register through NAACL 2025’s registration system. Note that to attend MWE 2025, it is sufficient to select this workshop during registration; you do not have to register for the main conference.
Multiword expressions (MWEs), i.e., word combinations that exhibit lexical, syntactic, semantic, pragmatic, and/or statistical idiosyncrasies (Baldwin and Kim, 2010), such as “by and large”, “hot dog”, “make a decision” and “break one’s leg”, are still a pain in the neck for Natural Language Processing (NLP). The notion encompasses closely related phenomena: idioms, compounds, light-verb constructions, phrasal verbs, rhetorical figures, collocations, institutionalized phrases, etc. Given their irregular nature, MWEs often pose complex problems in linguistic modelling (e.g. annotation), NLP tasks (e.g. parsing), and end-user applications (e.g. natural language understanding and Machine Translation), and hence still represent an open issue for computational linguistics (Constant et al., 2017).
For more than two decades, modelling and processing MWEs for NLP has been the topic of the MWE workshop, organised by the MWE section of SIGLEX in conjunction with major NLP conferences since 2003. Impressive progress has been made in the field, but our understanding of MWEs still requires much research, given their importance and usefulness in NLP applications. This is also relevant to domain-specific NLP pipelines that need to tackle terminology, most often realised as MWEs. Following previous years, for this 21st edition of the workshop, we identified the following topics on which contributions are particularly encouraged:
- MWE processing to enhance end-user applications: MWEs have gained particular attention in end-user applications, including Machine Translation (MT) (Zaninello and Birch, 2020), simplification (Kochmar et al., 2020), language learning and assessment (Paquot et al., 2020), social media mining (Pelosi et al., 2017), and abusive language detection (Zampieri et al., 2020). We believe that it is crucial to extend and deepen these first attempts to integrate and evaluate MWE technology in these and further end-user applications.
- MWE processing and identification in the general language, as well as in specialized languages and domains: Multiword terminology extraction from domain-specific corpora (Lossio-Ventura et al., 2014) is of particular importance to various applications, such as MT (Semmar and Laib, 2017), or the identification and monitoring of neologisms and technical jargon (Chatzitheodorou and Kappatos, 2021).
- MWE processing in low-resource languages: The PARSEME shared tasks (2017, 2018, 2020), among others, have fostered significant progress in MWE identification, providing datasets that include low-resource languages, evaluation measures, and tools that now allow MWE identification to be fully integrated into end-user applications. Efforts in this direction continue (Diaz Hernandez, 2024), and some have also explored methods for the automatic interpretation of MWEs (Bhatia et al., 2018) and their processing in low-resource languages (Eder et al., 2021). Resource creation and sharing should be pursued in parallel with the development of multilingual benchmarks for MWE identification (Savary et al., 2023).
- MWE identification and interpretation in LLMs: Most current MWE processing is limited to identification and detection with pre-trained language models, but we still lack an understanding of how MWEs are represented and dealt with therein (Garcia et al., 2021), and of how to better model their compositionality from a semantic perspective (Phelps et al., 2024). Now that NLP has shifted towards end-to-end neural models like BERT, capable of solving complex tasks with little or no intermediary linguistic symbols, questions arise about the extent to which MWEs should be implicitly or explicitly modelled (Shwartz and Dagan, 2019).
- New and enhanced representation of MWEs in language resources and computational models of compositionality as gold standards for formative intrinsic evaluation.
Through this workshop, we will bring together and encourage researchers in various NLP subfields to submit their MWE-related research. We also intend to consolidate the converging results of the previous joint workshops LAW-MWE-CxG 2018, MWE-WN 2019 and MWE-LEX 2020, the joint MWE-WOAH panel in 2021, the MWE-SIGUL 2022 joint session, and MWE-UD 2024, extending our scope to MWEs in e-lexicons and WordNets, MWE annotation, as well as grammatical constructions. Correspondingly, we call for papers on research related (but not limited) to MWEs and constructions in:
- Computationally-applicable theoretical work in psycholinguistics and corpus linguistics;
- Annotation (expert, crowdsourcing, automatic) and representation in resources such as corpora, treebanks, e-lexicons, WordNets, constructions (also for low-resource languages);
- Processing in syntactic and semantic frameworks (e.g. CCG, CxG, HPSG, LFG, TAG, UD, etc.);
- Discovery and identification methods, including for specialized languages and domains such as clinical or biomedical NLP;
- Interpretation of MWEs and understanding of text containing them;
- Language acquisition, language learning, and non-standard language (e.g. tweets, speech);
- Evaluation of annotation and processing techniques;
- Retrospective comparative analyses from the PARSEME shared tasks;
- Processing for end-user applications (e.g. MT, NLU, summarisation, language learning, etc.);
- Implicit and explicit representation in pre-trained language models and end-user applications;
- Evaluation and probing of pre-trained language models;
- Resources and tools (e.g. lexicons, identifiers) and their integration into end-user applications;
- Multiword terminology extraction;
- Adaptation and transfer of annotations and related resources to new languages and domains including low-resource ones.
The workshop invites two types of submissions:
- Archival submissions that present substantially original research, in either long paper format (8 pages + references) or short paper format (4 pages + references).
- Non-archival submissions of abstracts describing relevant research presented or published elsewhere, which will not be included in the MWE 2025 proceedings.
Papers should be submitted via the OpenReview submission page. Please choose the appropriate submission format (archival/non-archival). Archival papers with existing reviews will also be accepted through ACL Rolling Review (ARR). Submissions must follow the ACL stylesheet. For further information on this initiative, please refer to NAACL 2025.
Papers that have already been reviewed through ARR can be committed here.
| What | When |
| --- | --- |
| Paper submission deadline | February 13, 2025 |
| ARR commitment deadline | February 27, 2025 |
| Notification of acceptance | March 8, 2025 |
| Camera-ready papers due | March 17, 2025 |
| Underline upload deadline | April 8, 2025 |
| Workshop | May 4, 2025 |
All deadlines are at 23:59 UTC-12 (Anywhere on Earth).
Organizing Committee
A. Seza Doğruöz | Ghent University, Belgium
Alexandre Rademaker | FGV/EMA, Brazil
Atul Kr. Ojha | Insight Research Ireland Centre for Data Analytics, University of Galway
Gražina Korvel | VU Institute of Data Science and Digital Technologies
Mathieu Constant | Université de Lorraine
Verginica Barbu Mititelu | Romanian Academy Research Institute for Artificial Intelligence
Voula Giouli | Institute for Language & Speech Processing, ATHENA RC, Greece
Program Committee
Agata Savary | Université Paris-Saclay
Beata Trawinski | Leibniz Institute for the German Language
Carlos Ramisch | LIS - Laboratoire d’Informatique et Systèmes
Chikara Hashimoto | Rakuten Institute of Technology
Cvetana Krstev | University of Belgrade, Faculty of Philology
Eric G C Laporte | Université Gustave Eiffel
Francis Bond | Palacký University Olomouc
Gaël Dias | University of Caen Normandy
Gražina Korvel | Vilnius University
Irina Lobzhanidze | Ilia Chavchavadze State University
Ismail El Maarouf | Imprevicible
Ivelina Stoyanova | Deaf Studies Institute
Jan Odijk | Utrecht University
John Philip McCrae | National University of Ireland Galway
Kenneth Church | Northeastern University
Manfred Sailer | Johann Wolfgang Goethe Universität Frankfurt am Main
Mathieu Constant | Université de Lorraine, CNRS, ATILF
Matthew Shardlow | The Manchester Metropolitan University
Meghdad Farahmand | University of Genoa
Miriam Butt | Universität Konstanz
Paul Cook | University of New Brunswick
Pavel Pecina | Charles University
Petya Osenova | Sofia University “St. Kliment Ohridski”
Ranka Stanković | University of Belgrade
Sabine Schulte im Walde | University of Stuttgart
Shiva Taslimipoor | University of Cambridge
Stan Szpakowicz | University of Ottawa
Stella Markantonatou | ATHENA RIC
Tiberiu Boros | Adobe Systems
Tunga Gungor | Bogazici University
The workshop follows the ACL anti-harassment policy.
For any inquiries regarding the workshop, please send an email to the Organizing Committee at mwe2025workshop@gmail.com.
Please register with SIGLEX and check the “MWE Section” box to be added to our mailing list.