17th Workshop on Multiword Expressions (MWE 2021)
Colocated with ACL-IJCNLP 2021 (held online; originally planned for Bangkok, Thailand), 6 August 2021
Organised and sponsored by:
Special Interest Group on the Lexicon (SIGLEX) of the Association for Computational Linguistics (ACL)
News
- August 10, 2021: The pre-recorded video of Vered Shwartz’s invited talk is now available.
- May 06, 2021: The ACL-IJCNLP workshop dates have been announced. MWE 2021 will take place on August 6, 2021.
- April 27, 2021: Paper submission deadline is extended again to May 3, 2021.
- April 13, 2021: Paper submission deadline is extended to April 26, 2021.
- April 08, 2021: The ACL-IJCNLP 2021 organizers have decided that the conference will be held in a fully virtual format.
Friday, August 6, 2021. All times are in CEST (Central European Summer Time).
14:00-14:10 | Welcome and Preparation | Zoom live session | [slides]
Session 1: Long Papers (1h40min) | Chair: Preslav Nakov | Zoom live session
14:10-14:30 | Where do aspectual variants of light verb constructions belong? | Aggeliki Fotopoulou, Eric Laporte and Takuya Nakamura | [paper][slides]
14:30-14:50 | Data-driven Identification of Idioms in Song Lyrics | Miriam Amin, Peter Fankhauser, Marc Kupietz and Roman Schneider | [paper][slides]
14:50-15:10 | Transforming Term Extraction: Transformer-Based Approaches to Multilingual Term Extraction Across Domains | Christian Lang, Lennart Wachowiak, Barbara Heinisch, Dagmar Gromann | [ACL Findings paper][slides]
15:10-15:30 | Contextualized Embeddings Encode Monolingual and Cross-lingual Knowledge of Idiomaticity | Samin Fakharian and Paul Cook | [paper][slides]
15:30-15:50 | PIE: A Parallel Idiomatic Expression Corpus for Idiomatic Sentence Generation and Paraphrasing | Jianing Zhou, Hongyu Gong and Suma Bhat | [paper][slides]
15:50-16:05 | Break (15 min)
Session 2: Invited talk (60min) | Chair: Paul Cook | Zoom live session
16:05-17:05 | A long hard look at MWEs in the age of Language Models | Vered Shwartz | [abstract][video]
17:05-17:20 | Break (15 min)
Session 3: Short Papers (45min) | Chair: Vered Shwartz | Zoom live session
17:20-17:35 | Lexical Semantic Recognition | Nelson F. Liu, Daniel Hershcovich, Michael Kranzlein and Nathan Schneider | [paper][slides]
17:35-17:50 | Finding BERT’s Idiomatic Key | Vasudevan Nedumpozhimana and John Kelleher | [paper][slides]
17:50-18:05 | Light verb constructions and their families - A corpus study on German ‘stehen unter’-LVCs | Jens Fleischhauer | [paper][slides][video]
18:05-18:20 | Break (15 min)
18:20-19:00 | Session 4: Joint panel with Workshop on Online Abuse and Harms (40min) | Chairs: Jelena Mitrović and Bertie Vidgen | ACL 2021’s Gather.town → Thematic spaces → D&I rooms/Thematic Spaces → Machine Translation
19:00-19:20 | Session 5: Section reporting and community discussion (20min) | Chair: Carlos Ramisch | Zoom live session open to all MWE Section members | [slides] [SemEval task 2 slides]
A long hard look at MWEs in the age of Language Models
Vered Shwartz
Now available as a pre-recorded video!
Bio: Vered Shwartz is a postdoctoral researcher at the Allen Institute for AI (AI2) and the University of Washington. She will join the Department of Computer Science at the University of British Columbia as an Assistant Professor in fall 2021. Her research interests include computational semantics and pragmatics, multiword expressions, and commonsense reasoning.
Abstract: In recent years, language models (LMs) have become almost synonymous with NLP. Pre-trained to “read” a large text corpus, such models are useful both as a representation layer and as a source of world knowledge. But how well do they represent MWEs? This talk will discuss various problems in representing MWEs, and the extent to which LMs address them:
- Do LMs capture the implicit relationship between constituents in compositional MWEs (from baby oil through parsley cake to cheeseburger stabbing)?
- Do LMs recognize when words are used non-literally in non-compositional MWEs (e.g. do they know whether there are fleas in the flea market)?
- Do LMs know idioms, and can they infer the meaning of new idioms from the context as humans often do?
This year, we received 19 submissions, of which 7 were accepted for presentation, for an overall acceptance rate of about 37% (7 of 19). The proceedings are available in the ACL Anthology.
Multiword expressions (MWEs) are word combinations which exhibit lexical, syntactic, semantic, pragmatic and/or statistical idiosyncrasies (Baldwin & Kim 2010), such as “by and large”, “hot dog”, “pay a visit” and “pull one's leg”. The notion encompasses closely related phenomena: idioms, compounds, light-verb constructions, rhetorical figures, institutionalised phrases, collocations, etc. The behaviour of MWEs is often unpredictable; in particular, their meanings are not regularly composed of the meanings of their parts. Thus, MWEs are a major challenge in computational linguistics (Constant et al. 2017), including linguistic modelling (e.g. treebanking), computational modelling (e.g. parsing), and end-user NLP applications (e.g. natural language understanding, machine translation, and social media mining).
Modelling and processing MWEs for NLP has been the topic of the MWE workshop, organised by the MWE section of SIGLEX in conjunction with major NLP conferences since 2003. Although much progress has been made in the field, MWE processing in end-user NLP tasks is currently under-explored, and most studies still leave MWEs to future work. Nonetheless, there are recent studies in which MWEs have received particular attention in end-user applications, including machine translation (Zaninello & Birch 2020), text simplification (Kochmar et al. 2020, Liu & Hwa 2016), language learning and assessment (Paquot et al. 2019, Christiansen & Arnon 2017), social media mining (Maisto et al. 2017), and abusive language detection (Zampieri et al. 2020, Caselli et al. 2020).
The special focus for this 17th edition of the workshop is on MWE processing in end-user applications such as those listed above. On the one hand, the PARSEME shared tasks (Ramisch et al. 2020, Ramisch et al. 2018, Savary et al. 2017), among others, fostered significant progress in MWE identification, providing datasets, evaluation measures and tools that now allow fully integrating MWE identification into end-user applications. On the other hand, NLP seems to be shifting towards end-to-end neural models capable of solving complex end-user tasks with little or no intermediary linguistic symbols, questioning the extent to which MWEs should be implicitly or explicitly modelled. Therefore, one goal of this workshop is to bring together and encourage researchers in various NLP subfields to submit MWE-related research, so that approaches that deal with MWEs in various applications could benefit from each other.
Following the success of previous joint workshops LAW-MWE-CxG 2018, MWE-WN 2019 and MWE-LEX 2020, we further extend the scope of the workshop to MWEs in e-lexicons and WordNets, MWE annotation, as well as grammatical constructions.
The 17th Workshop on MWEs invites submissions on (but not limited to) the following topics:
Traditional MWE topics:
- Computationally-applicable theoretical work on MWEs and constructions in psycholinguistics and corpus linguistics
- MWE and construction annotation and representation in resources such as corpora, treebanks, e-lexicons and WordNets
- Processing of MWEs and constructions in syntactic and semantic frameworks (e.g. CCG, CxG, HPSG, LFG, TAG, UD, etc.)
- Discovery and identification methods for MWEs and constructions
- MWEs and constructions in language acquisition, language learning, and non-standard language (e.g. tweets, speech)
- Evaluation of annotation and processing techniques for MWEs and constructions
- Retrospective comparative analyses from the PARSEME shared tasks on automatic identification of MWEs
Topics on MWEs and end-user applications:
- Processing of MWEs and constructions in end-user applications (e.g. MT, NLU, summarisation, social media mining, computer assisted language learning)
- Implicit and explicit representation of MWEs and constructions in end-user applications
- Evaluation of end-user applications concerning MWEs and constructions
- Resources and tools for MWEs and constructions (e.g. lexicons, identifiers) in end-user applications
Joint session with WOAH Workshop
Pursuing the MWE Section’s tradition of synergies with other communities and in accordance with ACL-IJCNLP 2021’s theme track on NLP for social good, we will organise a joint session with the Workshop on Online Abuse and Harms (WOAH). We believe that MWEs are important in online abuse detection, and that the latter can provide an interesting testbed for MWE processing technology. The main goal is to pave the way towards the creation of data for a shared task involving both communities. The format of the session is under discussion, and we welcome suggestions from the community. Submissions describing research on MWEs and abusive language, especially those introducing new datasets, are also welcome.
- Long papers (8 content pages + references) should report on solid and finished research including new experimental results, resources and/or techniques.
- Short papers (4 content pages + references) should report on small experiments, focused contributions, ongoing research, negative results and/or philosophical discussion.
In regular research papers, the reported research should be substantially original. Papers available as preprints can also be submitted provided that they fulfill the conditions defined by the ACL Policies for Submission, Review and Citation. Note that dual submission to the ACL-IJCNLP 2021 main conference and MWE 2021 is allowed but must be declared at submission time, as per the ACL-IJCNLP 2021 call for papers: "[…] papers can be dual-submitted to both ACL-IJCNLP 2021 and an ACL-IJCNLP 2021 workshop which has its submission deadline falling before our notification date of May 5, 2021."
Decisions on oral or poster presentation of the selected papers will be taken by the PC chairs, depending on the available infrastructure for participation (in-person and/or virtual). No distinction between papers presented orally and as posters is made in the workshop proceedings.
Submission is double-blind as per the ACL-IJCNLP 2021 guidelines. Submissions should adhere to the ACL Author Guidelines.
For all types of submission, the ACL-IJCNLP 2021 templates must be used. There is no limit on the number of reference pages. An extra page will be allowed to take the reviewers' comments into account in the final versions of accepted papers (long = 9 content pages, short = 5 content pages).
The PMWE book series editors have put forward a list of conventions for citing multilingual MWE examples and a checklist for PMWE authors. Parts of the checklist are specific to PMWE authors, but sections like “Terms, abbreviations and spelling” can be relevant for MWE 2021 submissions. We encourage authors to adopt these conventions whenever relevant, without enforcing them. We hope that, in the long term, these could become widely adopted standards in the community.
All papers should be submitted via the workshop’s START space. Please choose the appropriate submission modality (long/short):
https://www.softconf.com/acl2021/w14_MWE2021/
All deadlines are at 23:59 UTC-12 (Anywhere on Earth).
April 19, 2021: Paper Submission Deadline
April 26, 2021: EXTENDED Paper Submission Deadline
May 3, 2021: RE-EXTENDED Paper Submission Deadline (there will be no further extension)
May 28, 2021: Notification of Acceptance
June 7, 2021: Camera-ready papers due
August 6, 2021: Workshop
The MWE workshop is organized by the SIGLEX-MWE section.
- Margarita Alonso-Ramos, Universidade da Coruña (Spain)
- Tim Baldwin, University of Melbourne (Australia)
- Verginica Barbu Mititelu, Romanian Academy (Romania)
- Fabienne Cap, Uppsala University (Sweden)
- Anastasia Christofidou, Academy of Athens (Greece)
- Ken Church, IBM Research (USA)
- Matthieu Constant, Université de Lorraine (France)
- Monika Czerepowicka, University of Warmia and Mazury (Poland)
- Miryam de Lhoneux, University of Copenhagen (Denmark)
- Gaël Dias, University of Caen Basse-Normandie (France)
- Meghdad Farahmand, University of Geneva (Switzerland)
- Christiane Fellbaum, Princeton University (USA)
- Joaquim Ferreira da Silva, New University of Lisbon (Portugal)
- Karën Fort, Sorbonne Université (France)
- Aggeliki Fotopoulou, ILSP/RC “Athena” (Greece)
- Marcos Garcia, University of Santiago de Compostela (Spain)
- Voula Giouli, Institute for Language and Speech Processing (Greece)
- Stefan Th. Gries, University of California (USA)
- Bruno Guillaume, Université de Lorraine (France)
- Chikara Hashimoto, Yahoo! Japan (Japan)
- Uxoa Iñurrieta, University of the Basque Country (Spain)
- Diptesh Kanojia, IIT Bombay (India)
- Elma Kerz, RWTH Aachen (Germany)
- Ekaterina Kochmar, University of Cambridge (UK)
- Dimitrios Kokkinakis, University of Gothenburg (Sweden)
- Ioannis Korkontzelos, Edge Hill University (UK)
- Cvetana Krstev, University of Belgrade (Serbia)
- Eric Laporte, University Paris-Est Marne-la-Vallee (France)
- Timm Lichte, University of Duesseldorf (Germany)
- Teresa Lynn, ADAPT Centre (Ireland)
- Stella Markantonatou, Institute for Language and Speech Processing (Greece)
- Yuji Matsumoto, Nara Institute of Science and Technology (Japan)
- Nurit Melnik, The Open University of Israel (Israel)
- Laura A. Michaelis, University of Colorado Boulder (USA)
- Johanna Monti, “L’Orientale” University of Naples (Italy)
- Preslav Nakov, Qatar Computing Research Institute, HBKU (Qatar)
- Malvina Nissim, University of Groningen (Netherlands)
- Diarmuid Ó Séaghdha, University of Cambridge (UK)
- Jan Odijk, University of Utrecht (Netherlands)
- Haris Papageorgiou, Institute for Language and Speech Processing (Greece)
- Marie-Sophie Pausé, independent researcher (France)
- Pavel Pecina, Charles University (Czech Republic)
- Ted Pedersen, University of Minnesota (USA)
- Scott Piao, Lancaster University (UK)
- Maciej Piasecki, Wroclaw University of Technology (Poland)
- Alain Polguère, Université de Lorraine (France)
- Matīss Rikters, University of Tokyo (Japan)
- Fatiha Sadat, Université du Québec à Montréal (Canada)
- Manfred Sailer, Goethe-Universität Frankfurt am Main (Germany)
- Magali Sanches Duran, University of São Paulo (Brazil)
- Branislava Šandrih, University of Belgrade (Serbia)
- Agata Savary, Université François Rabelais Tours (France)
- Sabine Schulte im Walde, University of Stuttgart (Germany)
- Matthew Shardlow, Manchester Metropolitan University (UK)
- Vered Shwartz, Allen AI (USA)
- Gyri Smørdal Losnegaard, University of Bergen (Norway)
- Ranka Stanković, University of Belgrade (Serbia)
- Ivelina Stoyanova, Bulgarian Academy of Sciences (Bulgaria)
- Stan Szpakowicz, University of Ottawa (Canada)
- Carole Tiberius, Dutch Language Institute (Netherlands)
- Beata Trawinski, Institut für Deutsche Sprache Mannheim (Germany)
- Ruben Urizar, University of the Basque Country (Spain)
- Aline Villavicencio, Federal University of Rio Grande do Sul (Brazil)
- Veronika Vincze, Hungarian Academy of Sciences (Hungary)
- Martin Volk, University of Zürich (Switzerland)
- Zeerak Waseem, University of Sheffield (UK)
- Eric Wehrli, University of Geneva (Switzerland)
- Seid Muhie Yimam, Universität Hamburg (Germany)
For any inquiries regarding the workshop please send an email to mweworkshop2021@gmail.com
Please register with SIGLEX and check the “MWE Section” box to be added to our mailing list.
The workshop supports the ACL anti-harassment policy.