Colocated with ACL-IJCNLP 2021 (Bangkok, Thailand; held online), 6 August 2021
Organised and sponsored by:
Special Interest Group on the Lexicon (SIGLEX) of the Association for Computational Linguistics (ACL)
Friday, August 6th, 2021. All times are in CEST (Central European Summer Time)
A Long Hard Look at MWEs in the Age of Language Models
Bio: Vered Shwartz is a postdoctoral researcher at the Allen Institute for AI (AI2) and the University of Washington. She will join the Department of Computer Science at the University of British Columbia as an Assistant Professor in fall 2021. Her research interests include computational semantics and pragmatics, multiword expressions, and commonsense reasoning.
Abstract: In recent years, language models (LMs) have become almost synonymous with NLP. Pre-trained to “read” a large text corpus, such models are useful both as a representation layer and as a source of world knowledge. But how well do they represent MWEs? This talk will discuss various problems in representing MWEs, and the extent to which LMs address them:
This year, we received 19 submissions, of which 7 were accepted for presentation. The overall acceptance rate was 36%. The proceedings are available in the ACL Anthology.
Multiword expressions (MWEs) are word combinations which exhibit lexical, syntactic, semantic, pragmatic and/or statistical idiosyncrasies (Baldwin & Kim 2010), such as by and large, hot dog, pay a visit and pull one's leg. The notion encompasses closely related phenomena: idioms, compounds, light-verb constructions, rhetorical figures, institutionalised phrases, collocations, etc. The behaviour of MWEs is often unpredictable; in particular, their meanings are not regularly composed of the meanings of their parts. Thus, MWEs are a major challenge in computational linguistics (Constant et al. 2017), including linguistic modelling (e.g. treebanking), computational modelling (e.g. parsing), and end-user NLP applications (e.g. natural language understanding, machine translation, and social media mining).
Modelling and processing MWEs for NLP has been the topic of the MWE workshop organised by the MWE section of SIGLEX in conjunction with major NLP conferences since 2003. Although much progress has been made in the field, MWE processing in end-user NLP tasks is currently under-explored, and most studies still introduce MWEs as future work. Nonetheless, there are recent studies in which MWEs gained particular attention in end-user applications, including machine translation (Zaninello & Birch 2020), text simplification (Kochmar et al. 2020, Liu & Hwa 2016), language learning and assessment (Paquot et al. 2019, Christiansen & Arnon 2017), social media mining (Maisto et al. 2017), and abusive language detection (Zampieri et al. 2020, Caselli et al. 2020).
The special focus for this 17th edition of the workshop is on MWE processing in end-user applications such as those listed above. On the one hand, the PARSEME shared tasks (Ramisch et al. 2020, Ramisch et al. 2018, Savary et al. 2017), among others, fostered significant progress in MWE identification, providing datasets, evaluation measures and tools that now allow fully integrating MWE identification into end-user applications. On the other hand, NLP seems to be shifting towards end-to-end neural models capable of solving complex end-user tasks with little or no intermediary linguistic symbols, questioning the extent to which MWEs should be implicitly or explicitly modelled. Therefore, one goal of this workshop is to bring together and encourage researchers in various NLP subfields to submit MWE-related research, so that approaches that deal with MWEs in various applications could benefit from each other.
Following the success of previous joint workshops LAW-MWE-CxG 2018, MWE-WN 2019 and MWE-LEX 2020, we further extend the scope of the workshop to MWEs in e-lexicons and WordNets, MWE annotation, as well as grammatical constructions.
The 17th Workshop on MWEs invites submissions on (but not limited to) the following topics:
Traditional MWE topics:
Topics on MWEs and end-user applications:
Pursuing the MWE Section’s tradition of synergies with other communities and in accordance with ACL-IJCNLP 2021’s theme track on NLP for social good, we will organise a joint session with the Workshop on Online Abuse and Harm (WOAH). We believe that MWEs are important in online abuse detection, and that the latter can provide an interesting testbed for MWE processing technology. The main goal is to pave the way towards the creation of data for a shared task involving both communities. The format of the session is under discussion, and we welcome suggestions from the community. Submissions describing research on MWEs and abusive language, especially introducing new datasets, are also welcome.
In regular research papers, the reported research should be substantially original. Papers available as preprints can also be submitted provided that they fulfill the conditions defined by the ACL Policies for Submission, Review and Citation. Note that double submission to the ACL-IJCNLP 2021 main conference and MWE 2021 is allowed but must be declared at submission time, as per the ACL-IJCNLP 2021 call for papers: "[…] papers can be dual-submitted to both ACL-IJCNLP 2021 and an ACL-IJCNLP 2021 workshop which has its submission deadline falling before our notification date of May 5, 2021."
The PC chairs will decide whether selected papers are presented orally or as posters, depending on the available infrastructure for participation (in-person and/or virtual). The workshop proceedings make no distinction between papers presented orally and those presented as posters.
Submission is double-blind as per the ACL-IJCNLP 2021 guidelines. Submissions should adhere to the ACL Author Guidelines.
For all types of submission, the ACL-IJCNLP 2021 templates must be used. There is no limit on the number of reference pages. An extra page will be allowed to take the reviewers' comments into account in the final versions of accepted papers (long = 9 content pages, short = 5 content pages).
The PMWE book series editors have put forward a list of conventions to cite multilingual MWE examples and a checklist for PMWE authors. Parts of the checklist are specific to PMWE authors, but sections like Terms, abbreviations and spelling can be relevant for MWE 2021 submissions. We encourage authors to adopt these conventions whenever relevant, without enforcing them. We hope that, in the long term, these could become widely adopted standards in the community.
All papers should be submitted via the workshop’s START space. Please choose the appropriate submission modality (long/short):
All deadlines are at 23:59 UTC-12 (anywhere in the world).
The MWE workshop is organised by the SIGLEX-MWE section.
For any inquiries regarding the workshop, please send an email to email@example.com
Please register with SIGLEX and check the “MWE Section” box to be subscribed to our mailing list.
The workshop supports the ACL anti-harassment policy.