SIGLEX-MWE Section - Multiword Expressions Workshop 2023

19th Workshop on Multiword Expressions (MWE 2023)

Colocated with: EACL 2023 (Dubrovnik, Croatia)

Date of the Workshop: 6 May 2023

Organised and sponsored by:
Special Interest Group on the Lexicon (SIGLEX) of the Association for Computational Linguistics (ACL)

@multiword

News

13 April 2023: Detailed tentative schedule online
22 March 2023: Invited speakers Asma Ben Abacha and Goran Nenadic confirmed
8 February 2023: Paper submission deadline extended to 20 February 2023
2 February 2023: Final CfP posted
19 January 2023: Invited speaker Leo Wanner confirmed
16 January 2023: Second CfP posted
23 December 2022: First CfP posted
9 December 2022: MWE 2023 proposal accepted to EACL 2023
2 November 2022: Organising committee formed

Contents on this page

Proceedings and video recording
Program
Keynote speakers
Description
Shared task
Special Track on MWEs in Clinical NLP
Best paper award
Submission formats
Paper submission and templates
Important dates
Organizing Committee
Program Committee
Anti-harassment policy
Contact

Proceedings and video recording

The proceedings and video recordings for MWE2023 can be found here.

Program

Time
08:30–09:00	Registration
09:00–10:30	Session 1
09:00–09:10	Opening
	Onsite chair: Marcos Garcia / Online chair: Archna Bhatia
09:10–10:30	Oral long paper presentations
	Onsite chair: Lifeng Han / Online chair: Archna Bhatia
09:10–09:30	Are Frequent Phrases Directly Retrieved like Idioms? An Investigation with Self-Paced Reading and Language Models
	Giulia Rambelli, Emmanuele Chersoni, Marco S. G. Senaldi, Philippe Blache and Alessandro Lenci
	[paper] [slides] [video]
09:30–09:50	A Survey of MWE Identification Experiments: The Devil is in the Details (online)
	Carlos Ramisch, Abigail Walsh, Thomas Blanchard and Shiva Taslimipoor
	[paper] [slides] [video]
09:50–10:10	The Better Your Syntax, the Better Your Semantics? Probing Pretrained Language Models for the English Comparative Correlative (non-archival, online)
	Leonie Weissweiler, Valentin Hofmann, Abdullatif Koksal and Hinrich Schütze
	[slides] [video]
10:10–10:30	Predicting Compositionality of Verbal Multiword Expressions in Persian (online)
	Mahtab Sarlak, Yalda Yarandi and Mehrnoush Shamsfard
	[paper] [slides] [video]
10:30–11:15	Morning coffee break
11:15–12:50	Session 2
	Onsite chairs: Marcos Garcia, Giulia Rambelli / Online chair: Shiva Taslimipoor
11:15–12:15	Keynote: Lexical collocations: Explored a lot, still a lot more to explore
	Leo Wanner
12:15–12:30	Oral short paper presentation
	Romanian Multiword Expression Detection Using Multilingual Adversarial Training and Lateral Inhibition
	Andrei Avram, Verginica Barbu Mititelu and Dumitru-Clementin Cercel
	[paper] [slides] [video]
12:30–12:50	Oral long paper presentation
	PARSEME corpus release 1.3 (online)
	Agata Savary, Cherifa Ben Khelil, Carlos Ramisch, Voula Giouli, Verginica Barbu Mititelu, Najet Hadj Mohamed, Cvetana Krstev, Chaya Liebeskind, Hongzhi Xu, Sara Stymne, Tunga Güngör, Thomas Pickard, Bruno Guillaume, Eduard Bejček, Archna Bhatia, Marie Candito, Polona Gantar, Uxoa Iñurrieta, Albert Gatt, Jolanta Kovalevskaite, Timm Lichte, Nikola Ljubešić, Johanna Monti, Carla Parra Escartín, Mehrnoush Shamsfard, Ivelina Stoyanova, Veronika Vincze and Abigail Walsh
	[paper] [slides] [video]
12:50–14:15	Lunch break
14:15–15:45	Session 3 (special track)
	Onsite chair: Lifeng Han / Online chair: TBD
14:15–14:45	Keynote: MWEs in Clinical NLP and Healthcare Text Analysis
	Asma Ben Abacha and Goran Nenadic
14:45–15:15	Oral short paper presentations
	Detecting Idiomatic Multiword Expressions in Clinical Terminology using Definition-Based Representation Learning
	François Remy, Alfiya Khabibullina and Thomas Demeester
	[paper] [slides] [video]
	Investigating the Effects of MWE Identification in Structural Topic Modelling
	Dimitrios Kokkinakis, Ricardo Muñoz Sánchez, Sebastianus Cornelis Jacobus Bruinsma and Mia-Marie Hammarlin
	[paper] [video]
15:15–15:45	Panel discussion: Multiword Expressions in Knowledge-intensive Domains: Clinical Text as a Case Study
	Asma Ben Abacha, Goran Nenadic, Stefan Schulz and Kirk Roberts
15:45–16:15	Afternoon coffee break
16:15–18:00	Session 4
	Onsite chair: Eleonora Guzzi / Online chair: Voula Giouli
16:15–17:15	Poster session
	Idioms, Probing and Dangerous Things: Towards Structural Probing for Idiomaticity in Vector Space
	Filip Klubička, Vasudevan Nedumpozhimana and John Kelleher
	[paper] [slides] [video]
	Simple and Effective Multi-Token Completion from Masked Language Models (non-archival)
	Oren Kalinsky, Guy Kushilevitz, Alexander Libov and Yoav Goldberg
	[slides] [video]
	Annotation of lexical bundles with discourse functions in a Spanish academic corpus
	Eleonora Guzzi, Margarita Alonso-Ramos, Marcos Garcia and Marcos García Salido
	[paper] [slides] [video]
	Enriching Multiword Terms in Wiktionary with Pronunciation Information
	Lenka Bajcetic, Thierry Declerck and Gilles Sérasset
	[paper] [slides] [video]
	Automatic Generation of Vocabulary Lists with Multiword Expressions
	John Lee and Adilet Uvaliyev
	[paper] [poster] [video]
	A MWE lexicon formalism optimised for observational adequacy
	Adam Lion-Bouton, Agata Savary and Jean-Yves Antoine
	[paper]
17:15–17:45	Oral short paper presentations
	Token-level Identification of Multiword Expressions using Pre-trained Multilingual Language Models (online)
	Raghuraman Swaminathan and Paul Cook
	[paper] [slides] [video]
	Graph-based multi-layer querying in Parseme Corpora (online)
	Bruno Guillaume
	[paper] [slides] [video]
17:45–18:00	Closing and community discussion
	Onsite chair: Marcos Garcia / Online chair: Voula Giouli
	[slides]

Keynote speakers

Leo Wanner (ICREA and Universitat Pompeu Fabra)

Bio: Leo Wanner is an ICREA Research Professor at Pompeu Fabra University in Barcelona, with more than 230 peer-reviewed publications and 10 edited volumes. He is an Associate Editor of the Computational Intelligence and Frontiers in AI, Language and Computation journals and serves as a regular reviewer for a number of high-profile conferences and journals in Computational Linguistics. Throughout his career, Leo has worked on a number of topics in the field, including natural language generation and summarization, concept extraction, conversational agents, hate speech recognition and, in particular, lexical collocation identification and classification.

Title: Lexical collocations: Explored a lot, still a lot more to explore

Abstract: Lexical collocations, i.e., idiosyncratic binary lexical item combinations, have been an active research topic for a number of years. State-of-the-art neural network models are reported to detect and classify specific types of lexical collocations with high accuracy, which might suggest that the problem has been solved. However, a cross-type and cross-language analysis of the results of one of these models raises several relevant research questions. In the first part of my talk, I will present our recent work on the identification and classification of lexical collocations with respect to the fine-grained taxonomy of lexical functions (LFs) in English, French, Spanish and Japanese. Drawing on the outcome of this work, I will focus, in the second part of my talk, on a comparative analysis of the “LF profiles” of the English and Japanese material. In particular, I will discuss (i) how the considered LFs are distributed in the given corpora; (ii) how rich the repertoires of LF instances are in each of them; (iii) whether the contexts of the LF instances overlap; and (iv) to what extent the “profile” of an LF correlates with the accuracy of the recognition of its instances. To conclude, I will formulate the research questions that arise from this analysis.

Asma Ben Abacha (Microsoft) and Goran Nenadic (University of Manchester)

Bios: Asma Ben Abacha is a Senior Scientist at Microsoft, with over 80 peer-reviewed publications. Her research interests include Natural Language Processing, Machine Learning, Artificial Intelligence and their applications in medicine and healthcare.

Goran Nenadic is a Professor in the Department of Computer Science at the University of Manchester and a Turing Fellow at the Alan Turing Institute, with more than 250 peer-reviewed publications. His research interests include Natural Language Processing, text mining, and health informatics.

Title: MWEs in ClinicalNLP and Healthcare Text Analytics

Abstract: TBD

Description

Multiword expressions (MWEs) are word combinations that exhibit lexical, syntactic, semantic, pragmatic, and/or statistical idiosyncrasies (Baldwin & Kim 2010), such as by and large, hot dog, pay a visit and pull one’s leg. The notion encompasses closely related phenomena: idioms, compounds, light-verb constructions, phrasal verbs, rhetorical figures, collocations, institutionalised phrases, etc. Their behaviour is often unpredictable; for example, their meaning frequently cannot be derived directly from the meanings of their parts. Given their irregular nature, MWEs pose complex problems in linguistic modelling (e.g. annotation), NLP tasks (e.g. parsing), and end-user applications (e.g. natural language understanding and MT), and hence remain an open issue for computational linguistics (Constant et al. 2017).

Since 2003, modelling and processing MWEs for NLP has been the topic of the MWE workshop, organised by the MWE section of SIGLEX in conjunction with major NLP conferences. Impressive progress has been made in the field, but understanding MWEs still requires much research, given their relevance and usefulness for NLP applications. This also applies to domain-specific NLP pipelines, which must handle terminology that is most often realised as MWEs. As in previous years, for this 19th edition of the workshop we have identified the following topics on which contributions are particularly encouraged:

MWE processing and identification in specialized languages and domains: Multiword terminology extraction from domain-specific corpora (Bonin et al. 2010) is of particular importance to various applications, such as MT (Semmar & Laib 2017) and the identification and monitoring of neologisms and technical jargon (Chatzitheodorou et al. 2021). We expect that approaches to MWE processing and approaches to terminology processing in specialised domains can benefit from each other.
MWE processing to enhance end-user applications: MWEs have gained particular attention in end-user applications, including MT (Zaninello & Birch 2020; Han et al. 2021), simplification (Kochmar et al. 2020), language learning and assessment (Paquot et al. 2019; Christiansen & Arnon 2017), social media mining (Maisto et al. 2017), and abusive language detection (Zampieri et al. 2020; Caselli et al. 2020). We believe that it is crucial to extend and deepen these first attempts to integrate and evaluate MWE technology in these and further end-user applications.
MWE identification and interpretation in pre-trained language models: Most current MWE processing with pre-trained language models is limited to identification and detection, but we still lack an understanding of how MWEs are represented and handled in these models (Nedumpozhimana & Kelleher 2021; Garcia et al. 2021; Fakharian & Cook 2021) and of how to better model the semantic compositionality of MWEs (Moreau et al. 2018). Now that NLP has shifted towards end-to-end neural models like BERT, capable of solving complex tasks with few or no intermediate linguistic symbols, questions arise about the extent to which MWEs should be modelled implicitly or explicitly (Shwartz & Dagan 2019).
MWE processing in low-resource languages: The PARSEME shared tasks (Ramisch et al. 2020; 2018; Savary et al. 2017), among others, have fostered significant progress in MWE identification, providing datasets that include low-resource languages, evaluation measures, and tools that now allow MWE identification to be fully integrated into end-user applications. A few efforts have recently explored methods for the automatic interpretation of MWEs (Bhatia et al. 2018; 2017) and their processing in low-resource languages (Liu & Wang 2020; Kumar et al. 2017). Resource creation and sharing should be pursued in parallel with the development of methods able to capitalise on small datasets (Han et al. 2020).

Through this workshop, we would like to bring together researchers from various NLP subfields and encourage them to submit MWE-related research, so that approaches to MWE processing, including processing for low-resource languages and for various applications, can benefit from each other. We also intend to consolidate the converging effects of the previous joint workshops LAW-MWE-CxG 2018, MWE-WN 2019 and MWE-LEX 2020, the joint MWE-WOAH panel in 2021, and the MWE-SIGUL 2022 joint session, extending our scope to MWEs in e-lexicons and WordNets, MWE annotation, and grammatical constructions. Correspondingly, we call for papers on research related (but not limited) to MWEs and constructions in:

Computationally-applicable theoretical work in psycholinguistics and corpus linguistics;
Annotation (expert, crowdsourcing, automatic) and representation in resources such as corpora, treebanks, e-lexicons, and WordNets (also for low-resource languages);
Processing in syntactic and semantic frameworks (e.g. CCG, CxG, HPSG, LFG, TAG, UD, etc.);
Discovery and identification methods, including for specialized languages and domains such as clinical or biomedical NLP;
Interpretation of MWEs and understanding of text containing them;
Language acquisition, language learning, and non-standard language (e.g. tweets, speech);
Evaluation of annotation and processing techniques;
Retrospective comparative analyses from the PARSEME shared tasks;
Processing for end-user applications (e.g. MT, NLU, summarisation, language learning, etc.);
Implicit and explicit representation in pre-trained language models and end-user applications;
Evaluation and probing of pre-trained language models;
Resources and tools (e.g. lexicons, identifiers) and their integration into end-user applications;
Multiword terminology extraction;
Adaptation and transfer of annotations and related resources to new languages and domains including low-resource ones.

Shared task

We do not have a shared task this year, but a new release of the PARSEME corpus of verbal MWEs is currently underway. We encourage submissions that analyse the new edition of the PARSEME data or that improve on the results of the PARSEME 2020 shared task and of SemEval 2022 Task 2 on idiomaticity prediction.

Special track on MWEs in clinical NLP

Pursuing the MWE Section’s tradition of synergies with other communities, this year we are organising a joint session with the Clinical NLP workshop for shared paper/poster presentations. Since clinical texts contain a substantial number of multiword expressions (e.g. medical terms or domain-specific collocations), a joint session is expected to benefit both communities. The goal is to foster future synergies that could address scientific challenges in the creation of resources, models and applications dealing with multiword expressions and related phenomena in the specialised domain of ClinicalNLP. Submissions describing research on MWEs in ClinicalNLP, especially those introducing new datasets, tools or resources, are welcome. Papers accepted in this track will have the option of also presenting their work at the Clinical NLP workshop at ACL 2023, after being presented at MWE 2023.

Best paper award

All full papers in the workshop will be considered by the program committee for a best paper award.

Submission formats

The workshop invites two types of submissions:

archival submissions that present substantially original research, in either long paper format (8 pages + references) or short paper format (4 pages + references);
non-archival submissions of abstracts/papers describing relevant research presented/published elsewhere (including Findings papers). These will not be included in the MWE proceedings.

Paper submission and templates

Papers should be submitted via the workshop’s START submission page. Please choose the appropriate submission format (archival/non-archival). Submissions must follow the ACL 2023 stylesheet.

Archival papers with existing reviews from ACL Rolling Review (ARR) will also be considered. A paper may not be under review through ARR and MWE simultaneously. A paper that has received or will receive reviews through ARR may not be submitted for review to MWE. To commit an ARR submission, email its OpenReview forum URL (https://openreview.net/forum?id=XXXXXXXXXXX) to the organizers.

Important dates

What	When
Paper submission deadline	20 February 2023 (extended from 13 February 2023)
ARR commitment deadline	6 March 2023
Notification of acceptance	15 March 2023 (originally 13 March 2023)
Camera-ready papers due	27 March 2023
Underline upload deadline	11 April 2023
Workshop	6 May 2023

All deadlines are at 23:59 UTC-12 (Anywhere on Earth).
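Because Anywhere on Earth is a fixed UTC-12 offset, a deadline of 23:59 AoE on a given date falls at 11:59 UTC on the following day. The short Python sketch below shows this conversion using only the standard library; the helper name `aoe_deadline_in_utc` is hypothetical and not part of START or any other submission system.

```python
from datetime import datetime, timedelta, timezone

# Assumption: AoE is treated as the fixed offset UTC-12 (no daylight saving).
AOE = timezone(timedelta(hours=-12))

def aoe_deadline_in_utc(year: int, month: int, day: int) -> datetime:
    """Hypothetical helper: return the 23:59 AoE cutoff of a date, expressed in UTC."""
    cutoff = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return cutoff.astimezone(timezone.utc)

if __name__ == "__main__":
    # Paper submission deadline listed in the table (20 February 2023).
    print(aoe_deadline_in_utc(2023, 2, 20).isoformat())
```

Run as a script, this prints `2023-02-21T11:59:00+00:00`, i.e. the submission deadline closes at 11:59 UTC on 21 February 2023.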

Organizing Committee

What	Who
Program chairs	Marcos Garcia, Voula Giouli, Shiva Taslimipoor, Lifeng Han
Publication chair	Archna Bhatia
Coordination and communication chair	Voula Giouli
Publicity chair	Kilian Evang

Program Committee

Iñaki Alegria, University of the Basque Country
Margarita Alonso-Ramos, Universidade da Coruña
Tim Baldwin, University of Melbourne
Verginica Barbu Mititelu, Romanian Academy
Chris Biemann, Universität Hamburg
Alexandra Birch, University of Edinburgh
Francis Bond, Palacký University
Claire Bonial, U.S. Army Research Laboratory
Tiberiu Boroș, Adobe
Jill Burstein, Educational Testing Service
Miriam Butt, Universität Konstanz
Marie Candito, Université Paris Cité
Fabienne Cap, Uppsala University
Marine Carpuat, University of Maryland
Helena Caseli, Federal University of São Carlos
Anastasia Christofidou, Academy of Athens
Ken Church, Baidu
Simon Clematide, University of Zürich
Matthieu Constant, Université de Lorraine
Paul Cook, University of New Brunswick
Silvio Cordeiro, Bloomin
Monika Czerepowicka, University of Warmia and Mazury
Béatrice Daille, Nantes University
Myriam de Lhonneux, University of Copenhagen
Koenraad Desmedt, University of Bergen
Mona Diab, George Washington University
Gaël Dias, University of Caen Basse-Normandie
Rafael Ehren, Heinrich Heine University Düsseldorf
Ismail El Maarouf, Adarga Ltd
Gülşen Eryiğit, Istanbul Technical University
Meghdad Farahmand, University of Geneva
Christiane Fellbaum, Princeton University
Joaquim Ferreira da Silva, New University of Lisbon
Teresa Flera, University of Warsaw
Karën Fort, Sorbonne Université
Aggeliki Fotopoulou, Institute for Language and Speech Processing, ATHENA RC
Daniela Gierschek, University of Luxembourg
Stefan Th. Gries, UC Santa Barbara & JLU Giessen
Bruno Guillaume, Université de Lorraine
Dhouha Hadjmed, University of Sfax
Chikara Hashimoto, Yahoo!Japan
Christopher Hidey, Columbia University
Rebecca Hwa, University of Pittsburgh
Uxoa Iñurrieta, University of the Basque Country
Laura Kallmeyer, Heinrich Heine University Düsseldorf
Diptesh Kanojia, Surrey Institute for People-Centred AI, University of Surrey
Elma Kerz, RWTH Aachen
Ekaterina Kochmar, University of Cambridge
Dimitrios Kokkinakis, University of Gothenburg
Ioannis Korkontzelos, Edge Hill University
Iztok Kosem, Jožef Stefan Institute
Cvetana Krstev, University of Belgrade
Tita Kyriakopoulou, University Paris-Est Marne-la-Vallée
Eric Laporte, Gustave Eiffel University
Qinyuan Li, Trinity College Dublin
Timm Lichte, University of Tübingen
Irina Lobzhanidze, Ilia State University
Teresa Lynn, Mohamed bin Zayed University of Artificial Intelligence
Gunn Inger Lyse Samdal, University of Bergen
Alfredo Maldonado, Trinity College Dublin
Stella Markantonatou, Institute for Language & Speech Processing, ATHENA RC
Yuji Matsumoto, RIKEN Center for Advanced Intelligence Project
John P. McCrae, National University of Ireland, Galway
Nurit Melnik, The Open University of Israel
Laura A. Michaelis, University of Colorado Boulder
Jelena Mitrović, University of Passau
Johanna Monti, “L’Orientale” University of Naples
Preslav Nakov, Qatar Computing Research Institute, HBKU
Stella Neumann, RWTH Aachen
Sanni Nimb, Det Danske Sprog- og Litteraturselskab
Malvina Nissim, University of Groningen
Joakim Nivre, Uppsala University
Diarmuid Ó Séaghdha, University of Cambridge
Jan Odijk, University of Utrecht
Petya Osenova, Bulgarian Academy of Sciences
Yagmur Ozturk, Grenoble Alpes University
Martha Palmer, University of Colorado Boulder
Pan Pan, School of Foreign Studies, South China Normal University
Haris Papageorgiou, Institute for Language and Speech Processing
Yannick Parmentier, University of Lorraine
Carla Parra Escartín, Iconic Translation Machines
Caroline Pasquer, University of Tours
Agnieszka Patejuk, University of Oxford and Institute of Computer Science, Polish Academy of Sciences
Marie-Sophie Pausé, independent researcher
Pavel Pecina, Charles University
Ted Pedersen, University of Minnesota
Miriam R. L. Petruck, International Computer Science Institute
Scott Piao, Lancaster University
Maciej Piasecki, Wroclaw University of Technology
Prisca Piccirilli, University of Stuttgart
Alain Polguère, Université de Lorraine
Vinodkumar Prabhakaran, Google
Behrang QasemiZadeh, University of Duesseldorf
Alexandre Rademaker, IBM Research Brazil and EMAp/FGV
Carlos Ramisch, Aix Marseille University
Sonia Ramotowska, University of Amsterdam
Livy Real, americanas s.a.
Martin Riedl, University of Hamburg
Matīss Rikters, University of Tokyo
Victoria Rosén, University of Bergen
Mike Rosner, University of Malta
Fatiha Sadat, Université du Québec à Montréal
Manfred Sailer, Goethe-Universität Frankfurt am Main
Bahar Salehi, The University of Melbourne
Magali Sanches Duran, University of São Paulo
Federico Sangati, Independent researcher
Agata Savary, Université Paris-Saclay
Nathan Schneider, Georgetown University
Sabine Schulte im Walde, University of Stuttgart
Matthew Shardlow, Manchester Metropolitan University
Vered Shwartz, Allen AI
Kiril Simov, Bulgarian Academy of Sciences
Noah Smith, University of Washington
Gyri Smørdal Losnegaard, University of Bergen
Jan Šnajder, University of Zagreb
Ranka Stanković, University of Belgrade
Ivelina Stoyanova, Bulgarian Academy of Sciences
Pavel Straňák, Charles University
Stan Szpakowicz, University of Ottawa
Harish Tayyar Madabushi, University of Bath
Carole Tiberius, Dutch Language Institute
Beata Trawinski, Leibniz Institute for the German Language
Yulia Tsvetkov, Carnegie Mellon University
Zdeňka Urešová, Charles University
Ruben Urizar, University of the Basque Country
Ashwini Vaidya, Indian Institute of Technology
Lonneke van der Plas, University of Malta
Bertram Vidgen, Alan Turing Institute
Aline Villavicencio, University of Sheffield
Veronika Vincze, Hungarian Academy of Sciences
Martin Volk, University of Zürich
Zeerak Talat, Simon Fraser University
Jakub Waszczuk, University of Duesseldorf
Eric Wehrli, University of Geneva
Marion Weller-Di Marco, Ludwig Maximilian University of Munich
Seid Muhie Yimam, Universität Hamburg

Anti-harassment policy

The workshop follows the ACL anti-harassment policy.

Contact

For any inquiries regarding the workshop, please send an email to the Organizing Committee at mweworkshop2023@googlegroups.com.

Please register with SIGLEX and check the “MWE Section” box to be added to our mailing list.