17th Workshop on Multiword Expressions (MWE 2021)

Colocated with ACL-IJCNLP 2021 (Bangkok, Thailand Online), 6 August 2021

Organised and sponsored by:
Special Interest Group on the Lexicon (SIGLEX) of the Association for Computational Linguistics (ACL)





Friday August 6th 2021, All times CEST (Central European Summer Time)

14:00-14:10   Welcome and Preparation
  Zoom live session [slides]
  Session 1: Long Papers (1h40min)
  Chair: Preslav Nakov — Zoom live session
14:10-14:30 Where do aspectual variants of light verb constructions belong?
  Aggeliki Fotopoulou, Eric Laporte and Takuya Nakamura
14:30-14:50 Data-driven Identification of Idioms in Song Lyrics
  Miriam Amin, Peter Fankhauser, Marc Kupietz and Roman Schneider
14:50-15:10 Transforming Term Extraction: Transformer-Based Approaches to Multilingual Term Extraction Across Domains
  Christian Lang, Lennart Wachowiak, Barbara Heinisch, Dagmar Gromann
  [ACL Findings paper][slides]
15:10-15:30 Contextualized Embeddings Encode Monolingual and Cross-lingual Knowledge of Idiomaticity
  Samin Fakharian and Paul Cook
15:30-15:50 PIE: A Parallel Idiomatic Expression Corpus for Idiomatic Sentence Generation and Paraphrasing
  Jianing Zhou, Hongyu Gong and Suma Bhat
15:50-16:05 Break (15 min)
  Session 2: Invited talk (60min)
  Chair: Paul Cook — Zoom live session
16:05-17:05 A long hard look at MWEs in the age of Language Models
  Vered Shwartz
17:05-17:20 Break (15 min)
  Session 3: Short Papers (45min)
  Chair: Vered Shwartz — Zoom live session
17:20-17:35 Lexical Semantic Recognition
  Nelson F. Liu, Daniel Hershcovich, Michael Kranzlein and Nathan Schneider
17:35-17:50 Finding BERT’s Idiomatic Key
  Vasudevan Nedumpozhimana and John Kelleher
17:50-18:05 Light verb constructions and their families - A corpus study on German ‘stehen unter’-LVCs
  Jens Fleischhauer
18:05-18:20 Break (15 min)
18:20-19:00 Session 4: Joint panel with Workshop on Online Abuse and Harms (40min)
  Chair: Jelena Mitrović and Bertie Vidgen
  ACL 2021’s Gather.town → Thematic spaces → D&I rooms/Thematic Spaces → Machine Translation
19:00-19:20 Session 5: Section reporting and community discussion (20min)
  Chair: Carlos Ramisch — Zoom live session open to all MWE Section members
  [slides] [SemEval task 2 slides]

Keynote talk

A long hard look at MWEs in the age of Language Models
Vered Shwartz

Now available as a pre-recorded video!

Bio: Vered Shwartz is a postdoctoral researcher at the Allen Institute for AI (AI2) and the University of Washington. She will join the Department of Computer Science at the University of British Columbia as an Assistant Professor in fall 2021. Her research interests include computational semantics and pragmatics, multiword expressions, and commonsense reasoning.

Abstract: In recent years, language models (LMs) have become almost synonymous with NLP. Pre-trained to “read” a large text corpus, such models are useful both as a representation layer and as a source of world knowledge. But how well do they represent MWEs? This talk will discuss various problems in representing MWEs, and the extent to which LMs address them.

Accepted papers

This year, we received 19 submissions, of which 7 were accepted for presentation. The overall acceptance rate was about 37% (7 of 19). The proceedings are available in the ACL Anthology.


Multiword expressions (MWEs) are word combinations which exhibit lexical, syntactic, semantic, pragmatic and/or statistical idiosyncrasies (Baldwin & Kim 2010), such as by and large, hot dog, pay a visit and pull one's leg. The notion encompasses closely related phenomena: idioms, compounds, light-verb constructions, rhetorical figures, institutionalised phrases, collocations, etc. The behaviour of MWEs is often unpredictable; in particular, their meanings are not regularly composed of the meanings of their parts. Thus, MWEs are a major challenge in computational linguistics (Constant et al. 2017), including linguistic modelling (e.g. treebanking), computational modelling (e.g. parsing), and end-user NLP applications (e.g. natural language understanding, machine translation, and social media mining).

Modelling and processing MWEs for NLP has been the topic of the MWE workshop, organised by the MWE section of SIGLEX in conjunction with major NLP conferences since 2003. Although much progress has been made in the field, MWE processing in end-user NLP tasks is currently under-explored, and most studies still leave MWEs for future work. Nonetheless, there are recent studies in which MWEs have gained particular attention in end-user applications, including machine translation (Zaninello & Birch 2020), text simplification (Kochmar et al. 2020, Liu & Hwa 2016), language learning and assessment (Paquot et al. 2019, Christiansen & Arnon 2017), social media mining (Maisto et al. 2017), and abusive language detection (Zampieri et al. 2020, Caselli et al. 2020).

The special focus for this 17th edition of the workshop is on MWE processing in end-user applications such as those listed above. On the one hand, the PARSEME shared tasks (Ramisch et al. 2020, Ramisch et al. 2018, Savary et al. 2017), among others, fostered significant progress in MWE identification, providing datasets, evaluation measures and tools that now allow fully integrating MWE identification into end-user applications. On the other hand, NLP seems to be shifting towards end-to-end neural models capable of solving complex end-user tasks with little or no intermediary linguistic symbols, questioning the extent to which MWEs should be implicitly or explicitly modelled. Therefore, one goal of this workshop is to bring together and encourage researchers in various NLP subfields to submit MWE-related research, so that approaches that deal with MWEs in various applications could benefit from each other.

Following the success of previous joint workshops LAW-MWE-CxG 2018, MWE-WN 2019 and MWE-LEX 2020, we further extend the scope of the workshop to MWEs in e-lexicons and WordNets, MWE annotation, as well as grammatical constructions.

The 17th Workshop on MWEs invites submissions on (but not limited to) the following topics:

Traditional MWE topics:

Topics on MWEs and end-user applications:

Joint session with WOAH Workshop

Pursuing the MWE Section’s tradition of synergies with other communities and in accordance with ACL-IJCNLP 2021’s theme track on NLP for social good, we will organise a joint session with the Workshop on Online Abuse and Harms (WOAH). We believe that MWEs are important in online abuse detection, and that the latter can provide an interesting testbed for MWE processing technology. The main goal is to pave the way towards the creation of data for a shared task involving both communities. The format of the session is under discussion, and we welcome suggestions from the community. Submissions describing research on MWEs and abusive language, especially those introducing new datasets, are also welcome.

Submission modalities

In regular research papers, the reported research should be substantially original. Papers available as preprints can also be submitted provided that they fulfill the conditions defined by the ACL Policies for Submission, Review and Citation. Note that double submission to the ACL-IJCNLP 2021 main conference and MWE 2021 is allowed but must be declared at submission time, as per the ACL-IJCNLP 2021 call for papers: "[…] papers can be dual-submitted to both ACL-IJCNLP 2021 and an ACL-IJCNLP 2021 workshop which has its submission deadline falling before our notification date of May 5, 2021."

Decisions on oral or poster presentation of the selected papers will be taken by the PC chairs, depending on the available infrastructure for participation (in-person and/or virtual). No distinction is made in the workshop proceedings between papers presented orally and those presented as posters.

Instructions for authors

Submission is double-blind as per the ACL-IJCNLP 2021 guidelines. Submissions should adhere to the ACL Author Guidelines. For all types of submission, the ACL-IJCNLP 2021 templates must be used. There is no limit on the number of reference pages. An extra page will be allowed to take the reviewers' comments into account in the final versions of accepted papers (long = 9 content pages, short = 5 content pages).

The PMWE book series editors have put forward a list of conventions to cite multilingual MWE examples and a checklist for PMWE authors. Parts of the checklist are specific to PMWE authors, but sections like Terms, abbreviations and spelling can be relevant for MWE 2021 submissions. We encourage authors to adopt these conventions whenever relevant, without enforcing them. We hope that, in the long term, these could become widely adopted standards in the community.

All papers should be submitted via the workshop’s START space. Please choose the appropriate submission modality (long/short):


Important dates

All deadlines are at 23:59 UTC-12 (anywhere in the world).


The MWE workshop is organised by the SIGLEX-MWE section.

Program committee

See the full list


For any inquiries regarding the workshop, please send an email to mweworkshop2021@gmail.com

Please register with SIGLEX and check the “MWE Section” box to be added to our mailing list.

Anti-harassment policy

The workshop supports the ACL anti-harassment policy.