

Faculty of Modern and Medieval Languages and Linguistics


Dr Marieke Meelen

Assistant Professor in Historical Linguistics at the MMLL Faculty
Fellow, Tutor & Director of Studies at Trinity Hall
Theoretical and Applied Linguistics
Faculty of Modern & Medieval Languages & Linguistics
Contact details: 
Telephone number: 
+44 (0)1223 (7)60820

Trinity Hall
Trinity Lane


Marieke Meelen’s research interests include information structure, comparative syntax and historical linguistics. She is currently part of two AHRC-funded projects: the Emergence of Egophoricity (with Prof Hill at SOAS, University of London) and The History of Subject Pronouns (with Prof Willis at Oxford University and Prof Meier in Berlin). She is also the PI of an ELDP-funded research project documenting endangered languages in Nepal.
She is interested in NLP and corpus creation for low-resource languages.

As part of her British Academy postdoctoral fellowship, she worked on the history of V2 word orders across Indo-European languages and developed a historical treebank of Welsh. Her doctoral thesis combined methods from computational and historical linguistics to reconstruct verb-initial and verb-second word order patterns and information structure in Welsh in their Celtic historical context. She is also a computational linguistic consultant for a project on the annotation of Middle Welsh texts at the Philipps-Universität in Marburg.

Marieke was awarded her PhD at Leiden University in 2016, supervised by Prof Lisa Cheng and Prof Alexander Lubotsky.

Teaching interests: 

Historical Linguistics, NLP for low-resource languages

Research interests: 
  • Historical Linguistics (Syntax & Reconstruction)
  • Grammaticalisation & Pragmaticalisation
  • NLP for low-resource languages
  • Celtic & Tibeto-Burman languages
Recent research projects: 
  • AHRC-DFG ‘History of Subject Pronouns in Northern Europe’ (2021-2024)
  • AHRC ‘Emergence of Egophoricity’ (2022-2026)
  • ELDP SG ‘An Audio-Visual Archive of South Mustang Tibetan’
Published works: 

Faggionato, C., Hill, N., & Meelen, M. (2022). NLP Pipeline for Annotating (Endangered) Tibetan and Newar Varieties. LREC-EURALI Proceedings, pp. 1-6. 

Darling, M., Meelen, M., & Willis, D. (2022). Towards coreference resolution for Early Irish. LREC-CLTW Proceedings, pp. 85-93.

Meelen, M., Roux, É., & Hill, N. (2021). Optimisation of the Largest Annotated Tibetan Corpus Combining Rule-based, Memory-based, and Deep-learning Methods. ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), 20(1), 1-11.

Barnett, R., Faggionato, C., Meelen, M., Yunshaab, S., Samdrup, T., Hill, N., & Diemberger, H. (2021). NER for Tibetan and Mongolian Newspapers.

Meelen, M. (2020). Reconstructing the rise of V2 in Welsh. Woods—Wolfe, 2020, 426-454.

Faggionato, C., & Meelen, M. (2019). Developing the Old Tibetan treebank. In Proceedings of RANLP.

Hill, N. W., & Meelen, M. (2017). Segmenting and POS tagging Classical Tibetan using a memory-based tagger. Himalayan Linguistics, 16(2), 64-86.

Meelen, Marieke (to appear) ‘Annotating Middle Welsh: POS tagging and chunk-parsing a partial corpus of native prose’ in Proceedings of the Maynooth Colloquium on Celtic Computational Linguistics.
Meelen, Marieke & Nurmio, Silva (to appear) 'Adjectival agreement in Middle Welsh translated prose' in Journal of Celtic Linguistics.
Meelen, Marieke, Mourigh, Khalid & Cheng, Lisa (to appear) 'V3 word order in Dutch urban varieties', in Clausal Architecture and Its Consequences: Synchronic and Diachronic Perspectives, András Bárány, Theresa Biberauer, Jamie Douglas and Sten Vikner (eds.)
Meelen, Marieke (to appear) 'The emergence of V2 word order in Welsh' in Biberauer, Woods & Wolfe (eds.) Rethinking V2, Oxford: Oxford University Press.
Meelen, Marieke & Hill, Nathan (2017) 'Segmenting and POS tagging Classical Tibetan', Himalayan Linguistics 16 (2), 64-89.
Meelen, Marieke, Hill, Nathan, & Handy, Christopher. (2017a) The Annotated Corpus of Classical Tibetan (ACTib), Part I - Segmented version, based on the BDRC digitised text collection, tagged with the Memory-Based Tagger from TiMBL [Data set]. Zenodo.
Meelen, Marieke, Hill, Nathan, & Handy, Christopher. (2017b) The Annotated Corpus of Classical Tibetan (ACTib), Part II - POS-tagged version, based on the BDRC digitised text collection, tagged with the Memory-Based Tagger from TiMBL [Data set]. Zenodo.
Meelen, Marieke (2017) 'Object-initial word order in Middle Welsh narrative prose' in Widmer & Poppe (eds.) Referential Properties and Their Impact on the Syntax of Insular Celtic Languages. pp. 145-178.
Meelen, Marieke (2016) Why Jesus and Job spoke bad Welsh: the origin and distribution of V2 orders in Middle Welsh, Utrecht: LOT publications.
Van Baren, Eva, Meelen, Marieke & Meijs, Lucas (2015) 'Promoting Youth Development Worldwide: The Duke of Edinburgh’s International Award' in Journal of Youth Development 10 (1).
Meelen, Marieke & Beekhuizen, Barend (2013) 'PoS-tagging and Chunking historical Welsh' in Christopher Yocum (ed.) Proceedings of the Scottish Celtic Colloquium, Edinburgh.
Meelen, Marieke (2010b) 'Gwarchan Maelderw II' in Kelten August issue, Utrecht: Van Hamel.
Meelen, Marieke (2010a) 'Gwarchan Maelderw I' in Kelten May issue, Utrecht: Van Hamel.