Colocated with: EACL 2023 (Dubrovnik, Croatia)
Date of the Workshop: 6 May 2023
Organised and sponsored by:
Special Interest Group on the Lexicon (SIGLEX) of the Association for Computational Linguistics (ACL)
The proceedings for MWE2023 can be found here.
Video Recording: TBD
Bio: Leo Wanner is ICREA Research Professor at the Pompeu Fabra University in Barcelona, with 230+ peer reviewed publications and 10 edited volumes. He is Associate Editor of the Computational Intelligence and Frontiers in AI, Language and Computation journals and serves as regular reviewer for a number of high-profile conferences and journals on Computational Linguistics. Throughout his career, Leo worked on a number of topics in the field, including natural language generation and summarization, concept extraction, conversational agents, hate speech recognition, and, in particular, also lexical collocation identification and classification.
Title: Lexical collocations: Explored a lot, still a lot more to explore
Abstract: Lexical collocations, i.e., idiosyncratic binary lexical item combinations, have been an active research topic already for a number of years. State-of-the-art neural network models report to detect and classify specific types of lexical collocations with high accuracy, which might suggest that the problem has been solved. However, a cross-type and cross-language analysis of the results of one of these models raises several relevant research questions. In the first part of my talk, I will present our recent work on the identification and classification of lexical collocations with respect to the fine-grained taxonomy of lexical functions (LFs) in English, French, Spanish and Japanese. Drawing on the outcome of this work, I will focus, in the second part of my talk, on the comparative analysis of the “LF profiles” of English and Japanese material. In particular, I will discuss (i) how the considered LFs are distributed in the given corpora; (ii) how rich the repertoires of the LF instances are in each of them; (iii) whether the contexts of the LF instances overlap; and (iv) to what extent the “profile” of an LF correlates with the accuracy of the recognition of its instances. To conclude, I will formulate the research questions that arise from this analysis.
Bios: Asma Ben Abacha is a Senior Scientist at Microsoft, with over 80 peer reviewed publications. Her research interests include Natural Language Processing, Machine Learning, Artificial Intelligence and their applications in medicine and healthcare.
Goran Nenadic is a Professor in the Department of Computer Science at University of Manchester and a Turing Fellow at the Alan Turing Institute, with more than 250 peer reviewed publications. His research interests include Natural Language Processing, text mining, and health informatics.
Title: MWEs in ClinicalNLP and Healthcare Text Analytics
Multiword expressions (MWEs) are word combinations that exhibit lexical, syntactic, semantic, pragmatic, and/or statistical idiosyncrasies (Baldwin & Kim 2010), such as by and large, hot dog, pay a visit and pull one’s leg. The notion encompasses closely related phenomena: idioms, compounds, light-verb constructions, phrasal verbs, rhetorical figures, collocations, institutionalised phrases, etc. Their behaviour is often unpredictable; for example, their meaning often does not result from the direct combination of the meanings of their parts. Given their irregular nature, MWEs often pose complex problems in linguistic modelling (e.g. annotation), NLP tasks (e.g. parsing), and end-user applications (e.g. natural language understanding and MT), hence still representing an open issue for computational linguistics (Constant et al. 2017).
For almost two decades, modelling and processing MWEs for NLP has been the topic of the MWE workshop organised by the MWE section of SIGLEX in conjunction with major NLP conferences since 2003. Impressive progress has been made in the field, but our understanding of MWEs still requires much research considering their need and usefulness in NLP applications. This is also relevant to domain-specific NLP pipelines that need to tackle terminologies most often realised as MWEs. Following previous years, for this 19th edition of the workshop, we identified the following topics on which contributions are particularly encouraged:
Through this workshop, we would like to bring together and encourage researchers in various NLP subfields to submit MWE-related research, so that approaches that deal with processing of MWEs including processing for low-resource languages and for various applications can benefit from each other. We also intend to consolidate the converging effects of previous joint workshops LAW-MWE-CxG 2018, MWE-WN 2019 and MWE-LEX 2020, the joint MWE-WOAH panel in 2021, and the MWE-SIGUL 2022 joint session, extending our scope to MWEs in e-lexicons and WordNets, MWE annotation, as well as grammatical constructions. Correspondingly, we call for papers on research related (but not limited) to MWEs and constructions in:
We do not have a shared task this year, but a new release of the PARSEME corpus of verbal MWEs is currently underway. We encourage submission of research papers that include analyses of the new edition of the PARSEME data and improvements over the results for PARSEME 2020 shared task as well as SemEval 2022 task 2 on idiomaticity prediction.
Pursuing the MWE Section’s tradition of synergies with other communities, this year, we are organizing a joint session with the Clinical NLP workshop for shared papers/poster presentations. Since clinical texts contain an important amount of multiword expressions (e.g. medical terms or domain-specific collocations), a joint session is deemed beneficial for both communities. The goal is to foster future synergies that could address scientific challenges in the creation of resources, models and applications to deal with multiword expressions and related phenomena in the specialised domain of ClinicalNLP. Submissions describing research on MWEs in the specialized domain of ClinicalNLP, especially introducing new datasets or new tools and resources, are welcome. Papers accepted in this track will have the option to present their work in the Clinical NLP workshop at ACL 2023 as well, after being presented at MWE 2023.
All full papers in the workshop will be considered by the program committee for a best paper award.
The workshop invites two types of submissions:
Papers should be submitted via the workshop’s START submission page. Please choose the appropriate submission format (archival/non-archival). Archival papers with existing reviews will also be accepted through the ACL Rolling Review. Submissions must follow the ACL 2023 stylesheet.
Archival papers with existing reviews from ACL Rolling Review will also be considered. A paper may not be simultaneously under review through ARR and MWE. A paper that has or will receive reviews through ARR may not be submitted for review to MWE.
To commit an ARR submission, email its OpenReview forum URL (https://openreview.net/forum?id=XXXXXXXXXXX) to the organizers.
All deadlines are at 23:59 UTC-12 (Anywhere on Earth).
The workshop follows the ACL anti-harassment
For any inquiries regarding the workshop please send an email to the Organizing
Committee at email@example.com
Please register to SIGLEX and check the “MWE
Section” box to be registered to our mailing list.