Conference, colloquium or symposium

Infoling 1.29 (2023)
Title: 19th Workshop on Multiword Expressions (MWE 2023)
Organizing body: MWE SIGLEX (Special Interest Group on the Lexicon of the Association for Computational Linguistics)
Venue: Dubrovnik, Croatia
Start date: May 2, 2023
End date: May 6, 2023
Circular No.: 1

Multiword expressions (MWEs) are word combinations that exhibit lexical, syntactic, semantic, pragmatic, and/or statistical idiosyncrasies (Baldwin & Kim 2010), such as by and large, hot dog, pay a visit and pull one's leg. The notion encompasses closely related phenomena: idioms, compounds, light-verb constructions, phrasal verbs, rhetorical figures, collocations, institutionalised phrases, etc. Their behaviour is often unpredictable; for example, their meaning often does not result from the direct combination of the meanings of their parts. Given their irregular nature, MWEs often pose complex problems in linguistic modelling (e.g. annotation), NLP tasks (e.g. parsing), and end-user applications (e.g. natural language understanding and MT), hence still representing an open issue for computational linguistics (Constant et al. 2017).


For almost two decades, modelling and processing MWEs for NLP has been the topic of the MWE workshop, organised since 2003 by the MWE section of SIGLEX in conjunction with major NLP conferences. Impressive progress has been made in the field, but our understanding of MWEs still requires much research, given how necessary and useful they are in NLP applications. This is also relevant to domain-specific NLP pipelines that need to tackle terminologies, which are most often realised as MWEs.


Following previous years, for this 19th edition of the workshop, we identified the following topics on which contributions are particularly encouraged:


- MWE processing and identification in specialized languages and domains: Multiword terminology extraction from domain-specific corpora (Bonin et al. 2010) is of particular importance to various applications, such as MT (Semmar & Laib 2017), and for the identification and monitoring of neologisms and technical jargon (Chatzitheodorou et al. 2021). We expect that approaches to the processing of MWEs and approaches to the processing of terminology in specialised domains can benefit from each other.


- MWE processing to enhance end-user applications: MWEs have gained particular attention in end-user applications, including MT (Zaninello & Birch 2020; Han et al. 2021), simplification (Kochmar et al. 2020), language learning and assessment (Paquot et al. 2019; Christiansen & Arnon 2017), social media mining (Maisto et al. 2017), and abusive language detection (Zampieri et al. 2020; Caselli et al. 2020). We believe that it is crucial to extend and deepen these first attempts to integrate and evaluate MWE technology in these and further end-user applications.


- MWE identification and interpretation in pre-trained language models: Most current MWE processing is limited to identification and detection using pre-trained language models, but we still lack understanding of how MWEs are represented and dealt with therein (Nedumpozhimana & Kelleher 2021; Garcia et al. 2021; Fakharian & Cook 2021) and of how to better model the compositionality of MWEs from semantics (Moreau et al. 2018). Now that NLP has shifted towards end-to-end neural models like BERT, capable of solving complex tasks with little or no intermediary linguistic symbols, questions arise about the extent to which MWEs should be implicitly or explicitly modelled (Shwartz & Dagan 2019).


- MWE processing in low-resource languages: The PARSEME shared tasks (Ramisch et al. 2020; 2018; Savary et al. 2017), among others, have fostered significant progress in MWE identification, providing datasets that include low-resource languages, evaluation measures, and tools that now allow MWE identification to be fully integrated into end-user applications. A few efforts have recently explored methods for the automatic interpretation of MWEs (Bhatia et al. 2018; 2017) and their processing in low-resource languages (Liu & Wang 2020; Kumar et al. 2017). Resource creation and sharing should be pursued in parallel with the development of methods able to capitalize on small datasets (Han et al. 2020).


Through this workshop, we would like to bring together researchers in various NLP subfields and encourage them to submit MWE-related research, so that approaches to the processing of MWEs, including processing for low-resource languages and for various applications, can benefit from each other. We also intend to consolidate the converging effects of the previous joint workshops LAW-MWE-CxG 2018, MWE-WN 2019 and MWE-LEX 2020, the joint MWE-WOAH panel in 2021, and the MWE-SIGUL 2022 joint session, extending our scope to MWEs in e-lexicons and WordNets, MWE annotation, and grammatical constructions. Correspondingly, we call for papers on research related (but not limited) to MWEs and constructions in:

- Computationally-applicable theoretical work in psycholinguistics and corpus linguistics; 

- Annotation (expert, crowdsourcing, automatic) and representation in resources such as corpora, treebanks, e-lexicons, and WordNets (also for low-resource languages); 

- Processing in syntactic and semantic frameworks (e.g. CCG, CxG, HPSG, LFG, TAG, UD, etc.); 

- Discovery and identification methods, including for specialized languages and domains such as clinical or biomedical NLP; 

- Interpretation of MWEs and understanding of text containing them; 

- Language acquisition, language learning, and non-standard language (e.g. tweets, speech); 

- Evaluation of annotation and processing techniques; 

- Retrospective comparative analyses from the PARSEME shared tasks;

- Processing for end-user applications (e.g. MT, NLU, summarisation, language learning, etc.);

- Implicit and explicit representation in pre-trained language models and end-user applications;

- Evaluation and probing of pre-trained language models; 

- Resources and tools (e.g. lexicons, identifiers) and their integration into end-user applications;

- Multiword terminology extraction;

- Adaptation and transfer of annotations and related resources to new languages and domains including low-resource ones.


Shared Task:

We do not have a shared task this year, but a new release of the PARSEME corpus of verbal MWEs is currently underway. We encourage the submission of research papers that include analyses of the new edition of the PARSEME data and improvements over the results of the PARSEME 2020 shared task as well as SemEval 2022 Task 2 on idiomaticity prediction.


Submission formats: 

The workshop invites two types of submissions:

- archival submissions that present substantially original research, in either long paper format (8 pages + references) or short paper format (4 pages + references);

- non-archival submissions of abstracts describing relevant research presented/published elsewhere, which will not be included in the MWE proceedings.


Paper submission and templates:

Papers should be submitted via the workshop's START submission page (TBD). Please choose the appropriate submission format (archival/non-archival). Archival papers with existing reviews will also be accepted through the ACL Rolling Review. Submissions must follow the ACL 2023 stylesheet.


Important Dates:

Camera-ready papers due: March 27, 2023

Workshop: May 2 or 6, 2023

Subject area: Lexicography, Lexicology, Computational linguistics, Corpus linguistics
Scientific committee

Margarita Alonso-Ramos, Universidade da Coruña

Verginica Barbu Mititelu, Romanian Academy

Claire Bonial, U.S. Army Research Laboratory

Tiberiu Boroș, Adobe

Miriam Butt, University of Konstanz

Marie Candito, Paris Diderot University

Anastasia Christofidou, Academy of Athens

Ken Church, IBM Research

Monika Czerepowicka, University of Warmia and Mazury

Gaël Dias, University of Caen Basse-Normandie

Rafael Ehren, Heinrich Heine University Düsseldorf

Ismail El Maarouf, Adarga Ltd

Meghdad Farahmand, University of Geneva

Joaquim Ferreira da Silva, New University of Lisbon

Aggeliki Fotopoulou, ATHENA RC

Stefan Th. Gries, University of California

Chikara Hashimoto, Yahoo! Japan

Laura Kallmeyer, Heinrich Heine University Düsseldorf

Elma Kerz, RWTH Aachen

Ioannis Korkontzelos, Edge Hill University

Cvetana Krstev, University of Belgrade

Eric Laporte, University Paris-Est Marne-la-Vallee

Timm Lichte, University of Tübingen

Irina Lobzhanidze, Ilia State University

Teresa Lynn, ADAPT Centre

Stella Markantonatou, ATHENA RC

Yuji Matsumoto, Nara Institute of Science and Technology

Johanna Monti, University of Naples L’Orientale

Joakim Nivre, Uppsala University

Jan Odijk, University of Utrecht

Yannick Parmentier, University of Lorraine

Agnieszka Patejuk, University of Oxford and Polish Academy of Sciences

Pavel Pecina, Charles University

Ted Pedersen, University of Minnesota

Miriam R. L. Petruck, University of Berkeley

Scott Piao, Lancaster University

Alain Polguère, Université de Lorraine

Alexandre Rademaker, IBM Research Brazil and EMAp/FGV

Agata Savary, Université Paris-Saclay

Sabine Schulte im Walde, University of Stuttgart

Matthew Shardlow, Manchester Metropolitan University

Ivelina Stoyanova, Bulgarian Academy of Sciences

Beata Trawinski, Institut für Deutsche Sprache Mannheim

Marion Weller-Di Marco, LMU Munich

Petya Osenova, Bulgarian Academy of Sciences

Prisca Piccirilli, University of Stuttgart

Carlos Ramisch, Aix Marseille University

Yagmur Ozturk, Université de Franche-Comté

Organizing committee

Marcos Garcia, Universidade de Santiago de Compostela

Voula Giouli, Institute for Language and Speech Processing (Athena)

Shiva Taslimipoor, University of Cambridge

Lifeng Han, University of Manchester

Archna Bhatia, Florida Institute for Human & Machine Cognition

Kilian Evang, Universität Düsseldorf

Submission deadline: March 6, 2023
Official language(s) of the event:


Date of publication on Infoling: January 19, 2023
Marcos Garcia
Universidade de Santiago de Compostela