SIGLEX-MWE Section - Multiword Expressions Workshop 2022

18th Workshop on Multiword Expressions (MWE 2022)

Co-located with LREC 2022 (Marseille, France)

Date of the Workshop: June 25, 2022

Organised and sponsored by:
Special Interest Group on the Lexicon (SIGLEX) of the Association for Computational Linguistics (ACL)

@multiword

News

May 31, 2022: Preliminary program online.
April 20, 2022: LREC extended the Early-Bird Registration deadline for the main conference and workshops to May 6, 2022 (23:59 UTC+2).
April 12, 2022: Paper submission deadline extended to April 17, 2022.
April 04, 2022: Paper submission deadline extended to April 12, 2022.
March 19, 2022: Final CFP posted.
March, 2022: Invited speakers at MWE 2022: Sabine Schulte im Walde (University of Stuttgart), Steven Bird (Charles Darwin University)
February 21, 2022: Second CFP posted.
December 21, 2021: First CFP posted.
December 9, 2021: MWE-2022 proposal accepted at LREC 2022.

Contents on this page

Proceedings and video recording
Program
Keynote speakers
Description
Submission modalities
Instructions for authors
Important dates
Organizers
Program committee
Contact
Anti-harassment policy

Proceedings and video recording

The proceedings for MWE2022 can be found here.
The video recording for MWE2022 can be found here.

Program

Saturday June 25th 2022, All times CEST (Central European Summer Time)
Oral sessions in Joliette, SIGUL+MWE keynote in Grand Large, posters in Phar’Club area
Zoom link on LREC virtual conference platform

09:00-09:10	Opening
	[slides]

	Session 1: Oral presentations
	Chair: Agata Savary, Online co-chair: Marcos Garcia
09:10-09:25	A General Framework for Detecting Metaphorical Collocations (short, on-site)
	Marija Brkić Bakarić, Lucia Načinović Prskalo and Maja Popović
	[paper][slides]
09:25-09:40	Improving Grammatical Error Correction for Multiword Expressions (short, on-site)
	Shiva Taslimipoor, Christopher Bryant and Zheng Yuan
	[paper][slides]
09:40-09:50	Native and Non-native Speakers’ Idiom Production: What Can Read Speech Tell Us? (non-archival, on-site)
	Jing Liu and Helmer Strik
	[slides]
09:50-10:10	An Analysis of Attention in German Verbal Idiom Disambiguation (long, online)
	Rafael Ehren, Laura Kallmeyer and Timm Lichte
	[paper][slides]
10:10-10:30	Support Verb Constructions across the Ocean Sea (long, online)
	Jorge Baptista, Nuno Mamede and Sónia Reis
	[paper][slides]

10:30-11:00	Coffee break

	Keynote: Sabine Schulte im Walde
11:00-12:00	Figurative Language in Noun Compound Models across Target Properties, Domains and Time
	Chair: Carlos Ramisch, Online co-chair: Archna Bhatia
	[abstract][slides]

	Session 2: Oral presentations
	Chair: Harish Tayyar Madabushi, Online co-chair: Archna Bhatia
12:00-12:20	A Matrix-Based Heuristic Algorithm for Extracting Multiword Expressions from a Corpus (long, online)
	Orhan Bilgin
	[paper][slides]
12:20-12:40	Multi-word Lexical Units Recognition in WordNet (long, online)
	Marek Maziarz, Ewa Rudnicka and Łukasz Grabowski
	[paper][slides]
12:40-13:00	Automatic Detection of Difficulty of French Medical Sequences in Context (long, online)
	Anaïs Koptient and Natalia Grabar
	[paper][slides]

13:00-14:00	Lunch break

14:00-15:00	Session 3: Joint SIGUL-MWE poster session
	Chair: Shiva Taslimipoor - Phar’Club area
	[MWE] Annotating “Particles” in Multiword Expressions in te reo Māori for a Part-of-Speech Tagger (long)
	Aoife Finn, Suzanne Duncan, Peter-Lucas Jones, Gianna Leoni and Keoni Mahelona
	[paper][poster]
	[MWE] Metaphor Detection for Low Resource Languages: From Zero-Shot to Few-Shot Learning in Middle High German (short)
	Felix Schneider, Sven Sickert, Phillip Brandes, Sophie Marshall and Joachim Denzler
	[paper][poster]
	[MWE] Automatic Bilingual Phrase Dictionary Construction from GIZA++ Output (long) RETRACTED
	~~Albina Khusainova, Vitaly Romanov and Adil Khan~~
	[MWE] A BERT’s Eye View: Identification of Irish Multiword Expressions Using Pre-trained Language Models (long)
	Abigail Walsh, Teresa Lynn and Jennifer Foster
	[paper][poster]
	[MWE] Enhancing the PARSEME Turkish Corpus of Verbal Multiword Expressions (short)
	Yagmur Ozturk, Najet Hadj Mohamed, Adam Lion-Bouton and Agata Savary
	[paper][poster]
	[MWE] German Light Verb Constructions in Business Process Models (non-archival)
	Kristin Kutzner and Ralf Laue
	[paper][poster] - published at LREC 2022 main
	[SIGUL] Evaluating Unsupervised Approaches to Morphological Segmentation for Wolastoqey
	Diego Bear and Paul Cook
	[SIGUL] Baseline English and Maltese-English Classification Models for Subjectivity Detection, Sentiment Analysis, Emotion Analysis, Sarcasm Detection, and Irony Detection
	Keith Cortis and Brian Davis
	[SIGUL] Building Open-source Speech Technology for Low-resource Minority Languages with Sámi as an Example - Tools, Methods and Experiments
	Katri Hiovain-Asikainen and Sjur Moshagen
	[SIGUL] Investigating the Quality of Static Anchor Embeddings from Transformers for Under-Resourced Languages
	Pranaydeep Singh, Orphee De Clercq and Els Lefever
	[SIGUL] Introducing YakuToolkit. Yakut Treebank and Morphological Analyzer
	Tatiana Merzhevich and Fabrício Ferraz Gerardi
	[SIGUL] A Language Model for Spell Checking of Educational Texts in Kurdish (Sorani)
	Roshna Abdulrahman and Hossein Hassani
	[SIGUL] SimRelUz: Similarity and Relatedness Scores as a Semantic Evaluation Dataset for Uzbek Language
	Ulugbek Salaev, Elmurod Kuriyozov and Carlos Gómez-Rodríguez
	[SIGUL] ENRICH4ALL: A First Luxembourgish BERT Model for a Multilingual Chatbot
	Dimitra Anastasiou

	Joint SIGUL-MWE keynote: Steven Bird
15:00-16:00	Multiword Expressions and the Low-Resource Scenario from the Perspective of a Local Oral Culture
	Chair: Shiva Taslimipoor, Online co-chair: Paul Cook
	Grand Large room
	[abstract][slides]

16:00-16:30	Coffee break

	Session 4: Oral presentations
	Chair: Teresa Lynn, Online co-chair: Paul Cook
16:30-16:40	Compound-internal Anaphora: Evidence from Acceptability Judgements on Italian Argumental Compounds (non-archival, online)
	Irene Lami and Joost van de Weijer
	[slides]
16:40-16:50	Light Verb Constructions in Corpora of Historical English (non-archival, online)
	Eva Zehentner
16:50-17:05	Sample Efficient Approaches for Idiomaticity Detection (short, online)
	Dylan Robert Schumacher Phelps, Xuan-Rui Fan, Edward Gow-Smith, Harish Tayyar Madabushi, Carolina Scarton and Aline Villavicencio
	[paper][slides]
17:05-17:20	mwetoolkit-lib: Adaptation of the mwetoolkit as a Python Library and an Application to MWE-based Document Clustering (short, online)
	Fernando Rezende Zagatti, Paulo Augusto de Lima Medeiros, Esther da Cunha Soares, Lucas Nildaimon dos Santos Silva, Carlos Ramisch and Livy Real
	[paper][slides]
17:20-17:40	Handling Idioms in Symbolic Multilingual Natural Language Generation (long, online)
	Michaelle Dubé and François Lareau
	[paper][slides]

17:40-18:00	MWE community discussion
	Chair: Carlos Ramisch
	Open to all MWE Section members for online participation
	[slides]

Keynote speakers

This year, we are going to have two amazing talks by:

Sabine Schulte im Walde, University of Stuttgart

Title: Figurative Language in Noun Compound Models across Target Properties, Domains and Time
Abstract: A variety of distributional and multi-modal computational approaches has been suggested for modelling the degrees of compositionality across types of multiword expressions and languages. As the starting point of my talk, I will present standard variants of computational models that have been proven successful in predicting the compositionality of German and English noun compounds. The main part of the talk will then be concerned with investigating the general reliability of these standard models and discussing implications for gold-standard datasets: I will demonstrate how prediction results vary (i) across representations, (ii) across empirical target properties, (iii) across compound types, (iv) across levels of abstractness, and (v) for general- vs. domain-specific language. Finally, I will present a preliminary quantitative study on diachronic changes of noun compound meanings and compositionality over time.
Bio: Sabine Schulte im Walde is an Associate Professor at the Institute for Natural Language Processing at the University of Stuttgart. She studied Computational Linguistics and Cognitive Science at the Universities of Stuttgart and Edinburgh, received a PhD in Computational Linguistics in 2003 from the University of Stuttgart, and the Venia Legendi (Habilitation) from Saarland University in 2009. From 2003-2004 she was a member of the Language Technology Group at the lexicographer DUDEN in Mannheim. In the past 10 years, Sabine has been the Principal Investigator of several research projects from the German Research Foundation (DFG); she was a Director of the Integrated Research Training Group for doctoral students in the DFG Collaborative Research Centre 732, and from 2011-2016 she was a DFG Heisenberg Fellow. The topics of her research include synchronic and diachronic language variation and ambiguity; figurative language usage in multiword expressions and metaphors; the creation of datasets with human judgements on meaning components and meaning relatedness; and the application of models to lexicography, machine translation, and terminology extraction.

Steven Bird, Charles Darwin University

Title: Multiword Expressions and the Low-Resource Scenario from the Perspective of a Local Oral Culture
Abstract: Research on multiword expressions and on under-resourced languages often begins with problematisation. The existence of non-compositional meaning, or the paucity of conventional language resources, are treated as problems to be solved. This perspective is associated with the view of Language as a lexico-grammatical code, and of NLP as a conventional sequence of computational tasks. In this talk, I share from my experience in an Australian Aboriginal community, where people tend to see language as an expression of identity and of ‘connection to country’. Here, my early attempts to collect language data were thwarted. There was no obvious role for tasks like speech recognition, parsing, or translation. Instead, working under the authority of local elders, I pivoted to language processing tasks that were more in keeping with local interests and aspirations. I describe these tasks and suggest some new ways of framing the work of NLP, and I explore implications for work on multiword expressions and on under-resourced languages.
Bio: Steven Bird is conducting social and technological experiments in the future evolution of the world's languages. Together with his students and colleagues, he is developing scalable methods for preserving disappearing words and worldviews for future generations of speakers and scholars. He is collaborating with speech communities in diasporas and ancestral homelands to design new approaches to language maintenance and revitalisation. Steven studied computer science at the University of Melbourne before completing a PhD in computational linguistics at the University of Edinburgh. He has conducted fieldwork on endangered languages in West Africa, South America, Central Asia, Melanesia, and Australia. He has held academic positions at the Universities of Edinburgh, Pennsylvania, Melbourne, and UC Berkeley. He holds a secondary appointment as Senior Research Scientist at the International Computer Science Institute, UC Berkeley. He serves as Linguist at the Nawarddeken Academy in West Arnhem. Steven is leading the Top End Language Lab.
Note: This talk is scheduled for the MWE+SIGUL Joint Session.

Description

Multiword expressions (MWEs) are word combinations which exhibit lexical, syntactic, semantic, pragmatic and/or statistical idiosyncrasies (Baldwin & Kim 2010), such as by and large, hot dog, pay a visit and pull one’s leg. The notion encompasses closely related phenomena: idioms, compounds, light-verb constructions, phrasal verbs, rhetorical figures, collocations, institutionalised phrases, etc. Their behaviour is often unpredictable; for example, their meaning often does not result from the direct combination of the meanings of their parts. Given their irregular nature, MWEs often pose complex problems in linguistic modelling (e.g. annotation), NLP tasks (e.g. parsing), and end-user applications (e.g. natural language understanding and MT), hence still representing an open issue for computational linguistics (Constant et al. 2017).

Since 2003, modelling and processing MWEs for NLP has been the topic of the MWE workshop, organised by the MWE section of SIGLEX in conjunction with major NLP conferences. Impressive progress has been made in the field, but our understanding of MWEs still requires much research, given how useful it is for NLP applications. For this 18th edition of the workshop, we identified three topics on which contributions are particularly encouraged:

MWE processing in low-resource languages: The PARSEME shared tasks (Ramisch et al. 2020; Ramisch et al. 2018; Savary et al. 2017), among others, have fostered significant progress in MWE identification, providing datasets that include low-resource languages, evaluation measures and tools that now allow fully integrating MWE identification into end-user applications. A few efforts have recently explored methods for automatic interpretation of MWEs (Bhatia et al. 2018; Bhatia et al. 2017). Pursuing similar efforts on understanding MWEs in low-resource languages is beneficial. There are some recent efforts on processing of MWEs in low-resource languages (Liu & Wang 2020; Kumar et al. 2017; Wei et al. 2015). Resource creation and sharing should be pursued in parallel to the development of methods able to capitalize on small datasets.
MWE identification and interpretation in pre-trained language models: Most current MWE processing is limited to their identification and detection using pre-trained language models (Taslimipoor et al. 2020), but we lack understanding about how MWEs are represented and dealt with therein (Nedumpozhimana & Kelleher 2021; Garcia et al. 2021; Fakharian & Cook 2021). Now that NLP has shifted towards end-to-end neural models like BERT, capable of solving complex end-user tasks with little or no intermediary linguistic symbols, questions arise about the extent to which MWEs should be implicitly or explicitly modelled in such models (Shwartz & Dagan 2019).
MWE processing to enhance end-user applications: As underlined by the MWE 2021 call for papers, MWEs gained particular attention in end-user applications, including MT (Zaninello & Birch 2020), simplification (Kochmar et al. 2020; Liu & Hwa 2016), language learning and assessment (Paquot et al. 2019; Christiansen & Arnon 2017), social media mining (Maisto et al. 2017), and abusive language detection (Zampieri et al. 2020; Caselli et al. 2020). We believe that it is crucial to extend and deepen these first attempts to integrate and evaluate MWE technology in these and further end-user applications.

Through this workshop, we would like to bring together and encourage researchers in various NLP subfields to submit MWE-related research, so that approaches that deal with processing of MWEs including processing for low-resource languages and for various applications can benefit from each other. We also intend to consolidate the converging effects of previous joint workshops LAW-MWE-CxG 2018, MWE-WN 2019 and MWE-LEX 2020, and the joint MWE-WOAH panel in 2021, extending our scope to MWEs in e-lexicons and WordNets, MWE annotation, as well as grammatical constructions. Correspondingly, we will call for papers on research related (but not limited) to MWEs and constructions in:

Computationally-applicable theoretical work in psycholinguistics and corpus linguistics
Annotation and representation in resources such as corpora, treebanks, e-lexicons, and WordNets
Processing in syntactic and semantic frameworks (e.g. CCG, CxG, HPSG, LFG, TAG, UD, etc.)
Discovery and identification methods
Interpretation of MWEs and understanding of text containing them
Language acquisition, language learning, and non-standard language (e.g. tweets, speech)
Evaluation of annotation and processing techniques
Retrospective comparative analyses from the PARSEME shared tasks
Processing for end-user applications (e.g. MT, NLU, summarisation, language learning, etc.)
Implicit and explicit representation in pre-trained language models and end-user applications
Evaluation and probing of pre-trained language models and end-user applications
Resources and tools (e.g. lexicons, identifiers) and their integration into end-user applications
Theoretical and computational linguistic description and modelling in low-resource languages
Annotation guidelines and methods in low-resource languages (expert, crowdsourcing, automatic)
Adaptation and transfer of annotations and related resources to low-resource languages
Processing in low-resource languages (supervised, semi-supervised, and unsupervised methods for identification, discovery, and interpretation)
Evaluation of annotations and processing techniques for low-resource languages
Processing for end-user applications in low-resource languages

Joint session with SIGUL 2022 Workshop

Pursuing its efforts in building bridges with other communities, the MWE Section organises a joint session with the workshop of the Special Interest Group on Under-resourced Languages (SIGUL 2022). The goal is to foster future synergies that could address scientific challenges in the creation of resources, models and applications to deal with multiword expressions and related phenomena in low-resource scenarios, in accordance with one of our special topics in MWE 2022. The session will feature a joint poster session and a joint keynote talk by Steven Bird.

Submission modalities

The workshop invites two types of submissions:

Archival submissions present substantially original research. Submissions will follow the LREC stylesheet. They can be long papers (8 content pages + references) or short papers (4 content pages + references). The decisions as to oral or poster presentations will be taken by the PC chairs, with no distinction in the proceedings. Submission will be double-blind.
Non-archival submissions of abstracts will also be considered for presentation, but not included in the proceedings. They can be abstracts describing preliminary results, work in progress, or ongoing projects or abstracts of papers recently submitted or published at other venues (conferences, journals, book chapters). Abstracts are not anonymous and will go through a light reviewing process.

All papers should be submitted via the workshop’s START submission space. Please choose the appropriate link for standard Archival submission or for the Non-archival submission. Registration for the workshop is required to present both archival and non-archival submissions. Presentation and participation formats (on-line, on-site, both) will depend on LREC 2022 main conference arrangements and will be announced later.

Instructions for authors

The double-blind submissions (Archival submissions) should adhere to the ACL Author Guidelines. There is no limit on the number of reference pages.

The PMWE book series editors have put forward a list of conventions to cite multilingual MWE examples and a checklist for PMWE authors. Parts of the checklist are specific to PMWE authors, but sections like Terms, abbreviations and spelling can be relevant for MWE 2022 submissions. We encourage authors to adopt these conventions whenever relevant, without enforcing them. We hope that, in the long term, these could become widely adopted standards in the community.

All submissions should be made via the workshop’s START space. Please choose the appropriate submission modality as described in the Submission modalities section above.

Important dates

Paper Submission Deadline: April 17, 2022
Notification of Acceptance: May 3, 2022
Camera-ready Papers Deadline: May 23, 2022
Workshop: June 25, 2022

Organizers

Program chairs: Archna Bhatia, Paul Cook, and Shiva Taslimipoor
Publication chair: Marcos Garcia
Communication chair: Carlos Ramisch

Program committee

See the full list

Tim Baldwin, University of Melbourne (Australia)
Verginica Barbu Mititelu, Romanian Academy (Romania)
Francis Bond, Palacký University (Czech Republic)
Claire Bonial, U.S. Army Research Laboratory (USA)
Tiberiu Boroș, Adobe (Romania)
Marie Candito, Université Paris Cité (France)
Anastasia Christofidou, Academy of Athens (Greece)
Ken Church, Baidu (USA)
Matthieu Constant, Université de Lorraine (France)
Monika Czerepowicka, University of Warmia and Mazury (Poland)
Myriam de Lhonneux, University of Copenhagen (Denmark)
Gaël Dias, University of Caen Basse-Normandie (France)
Gülşen Eryiğit, Istanbul Technical University (Turkey)
Meghdad Farahmand, University of Geneva (Switzerland)
Christiane Fellbaum, Princeton University (USA)
Joaquim Ferreira da Silva, New University of Lisbon (Portugal)
Aggeliki Fotopoulou, Institute for Language and Speech Processing/RC “Athena” (Greece)
Voula Giouli, Institute for Language and Speech Processing (Greece)
Stefan Th. Gries, UC Santa Barbara (USA) & JLU Giessen (Germany)
Uxoa Iñurrieta, University of the Basque Country (Spain)
Diptesh Kanojia, Surrey Institute for People-Centred AI, University of Surrey (UK)
Ioannis Korkontzelos, Edge Hill University (UK)
Cvetana Krstev, University of Belgrade (Serbia)
Eric Laporte, Gustave Eiffel University (France)
Timm Lichte, University of Tübingen (Germany)
Irina Lobzhanidze, Ilia State University (Georgia)
Teresa Lynn, ADAPT Centre (Ireland)
Gunn Inger Lyse Samdal, University of Bergen (Norway)
Stella Markantonatou, Institute for Language and Speech Processing (Greece)
Yuji Matsumoto, RIKEN Center for Advanced Intelligence Project (Japan)
Jan Odijk, University of Utrecht (Netherlands)
Haris Papageorgiou, Institute for Language and Speech Processing (Greece)
Yannick Parmentier, Université d’Orléans (France)
Pavel Pecina, Charles University (Czech Republic)
Ted Pedersen, University of Minnesota (USA)
Scott Piao, Lancaster University (UK)
Alain Polguère, Université de Lorraine (France)
Livy Real, americanas s.a. (Brazil)
Fatiha Sadat, Université du Québec à Montréal (Canada)
Magali Sanches Duran, University of São Paulo (Brazil)
Sabine Schulte im Walde, University of Stuttgart (Germany)
Matthew Shardlow, Manchester Metropolitan University (UK)
Ivelina Stoyanova, Bulgarian Academy of Sciences (Bulgaria)
Pavel Straňák, Charles University (Czech Republic)
Stan Szpakowicz, University of Ottawa (Canada)
Carole Tiberius, Dutch Language Institute (Netherlands)
Beata Trawinski, Leibniz Institute for the German Language (Germany)
Zdeňka Urešová, Charles University (Czech Republic)
Ruben Urizar, University of the Basque Country (Spain)
Lonneke van der Plas, University of Malta (Malta)
Veronika Vincze, Hungarian Academy of Sciences (Hungary)
Martin Volk, University of Zürich (Switzerland)
Zeerak Talat, Digital Democracies Institute, Simon Fraser University (Canada)
Marion Weller-Di Marco, Ludwig Maximilian University of Munich (Germany)
Jelena Mitrović, University of Passau (Germany)
Petya Osenova, Bulgarian Academy of Sciences (Bulgaria)
Ashwini Vaidya, Indian Institute of Technology, Delhi (India)

Contact

For any inquiries regarding the workshop please send an email to mweworkshop2022@gmail.com

Please register to SIGLEX and check the “MWE Section” box to be registered to our mailing list.

Anti-harassment policy

The workshop supports the ACL anti-harassment policy.