ProMeTEXT — the Centre for PROcessing MEdieval TEXTs — develops corpora, methods, and tools for the segmentation and multilingual alignment of medieval texts.
Our work focuses primarily on medieval romance texts from the 13th to 16th centuries, with the aim of supporting research in textual transmission, multilingual corpora, computational humanities, and historical language processing.
ProMeTEXT provides resources for working with medieval texts in multiple languages, including:
- phrase-level segmentation data
- multilingual alignment tools
- evaluation datasets
- notebooks and reproducible workflows
- resources for medieval textual transmission studies
Current resources cover the following medieval languages:
- Latin
- French
- Castilian
- Portuguese
- Catalan
- English
- Italian
We welcome collaboration on additional languages. Please feel free to contact us if you would like to contribute data, methods, or language-specific expertise.
We provide a gold-standard dataset for phrase-level segmentation of medieval corpora.
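As an illustration of how a gold-standard segmentation can be used for evaluation, the sketch below scores a predicted segmentation against a reference by comparing phrase boundaries. It is a minimal example under stated assumptions, not the evaluation protocol of the released dataset: the list-of-token-lists representation, the function names `boundaries` and `boundary_prf`, and the sample sentence are all hypothetical and invented for this example.

```python
from typing import List, Set, Tuple

Segmentation = List[List[str]]  # assumed format: one list of tokens per phrase


def boundaries(phrases: Segmentation) -> Set[int]:
    """Return the token offsets where a new phrase starts (offset 0 excluded)."""
    offsets: Set[int] = set()
    position = 0
    for phrase in phrases[:-1]:  # the end of the last phrase is not a boundary
        position += len(phrase)
        offsets.add(position)
    return offsets


def boundary_prf(gold: Segmentation, predicted: Segmentation) -> Tuple[float, float, float]:
    """Precision, recall and F1 of predicted boundaries against gold boundaries."""
    gold_b, pred_b = boundaries(gold), boundaries(predicted)
    true_positives = len(gold_b & pred_b)
    precision = true_positives / len(pred_b) if pred_b else 0.0
    recall = true_positives / len(gold_b) if gold_b else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented sample, not taken from the dataset.
gold = [["quant", "li", "rois"], ["l", "oï"], ["si", "fu", "molt", "liez"]]
predicted = [["quant", "li", "rois"], ["l", "oï", "si"], ["fu", "molt", "liez"]]

print(boundary_prf(gold, predicted))  # (0.5, 0.5, 0.5)
```

Both segmentations must cover the same token sequence for the offsets to be comparable. Here the gold boundaries fall at offsets 3 and 5 and the predicted ones at 3 and 6, so one of two boundaries matches.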
Please cite the dataset itself as follows when using the released data:
APA
Ing, L., Gille Levenson, M., & Macedo, C. (2025). Multilingual Segmentation Dataset for Historical Prose (13th–16th c.) (Version 1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.16992629
BibTeX
@dataset{ing2025multilingual,
author = {Ing, L. and Gille Levenson, M. and Macedo, C.},
title = {Multilingual Segmentation Dataset for Historical Prose (13th--16th c.)},
year = {2025},
publisher = {Zenodo},
version = {1.0},
doi = {10.5281/zenodo.16992629},
url = {https://doi.org/10.5281/zenodo.16992629},
license = {Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International}
}
For the method, experiments, and scientific context, please cite the related publication:
APA
Ing, L., Gille Levenson, M., & Macedo, C. (2026). Phrase-Level Segmentation on Medieval Corpora for Aligning Multilingual Texts. In Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026) (pp. 936–946). European Language Resources Association (ELRA). https://doi.org/10.63317/32huzuuokpfr
BibTeX
@inproceedings{ing-etal-2026-phrase,
title = {Phrase-Level Segmentation on Medieval Corpora for Aligning Multilingual Texts},
author = {Ing, Lucence and Gille Levenson, Matthias and Macedo, Carolina},
booktitle = {Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)},
month = {May},
year = {2026},
pages = {936--946},
address = {Palma, Mallorca, Spain},
publisher = {European Language Resources Association (ELRA)},
editor = {Piperidis, Stelios and Bel, Núria and van den Heuvel, Henk and Ide, Nancy and Krek, Simon and Toral, Antonio},
doi = {10.63317/32huzuuokpfr}
}
We provide gold-standard alignment data for parallel medieval and historical texts through the following repository:
Other parallel corpora and alignment datasets are currently under construction.
Aquilign is a tool for aligning multilingual texts at the phrase level. It was designed for medieval textual traditions and supports alignment workflows across related witnesses and translations.
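To make the idea of a phrase-level alignment across several witnesses concrete, here is one possible way to represent and display such a result. This is a toy sketch only: it does not use Aquilign's API or its output format, and the names `Witness`, `AlignmentUnit`, and `render_alignment`, as well as the placeholder phrases, are assumptions made for illustration.

```python
from typing import Dict, List

Witness = List[str]                  # a witness as an ordered list of phrases
AlignmentUnit = Dict[str, List[int]]  # witness name -> indices of aligned phrases


def render_alignment(
    witnesses: Dict[str, Witness], units: List[AlignmentUnit]
) -> List[Dict[str, str]]:
    """Turn index-based alignment units into rows of aligned phrase text.

    A witness missing from a unit gets an empty cell, which models an
    omission; several indices in one cell model a one-to-many alignment.
    """
    rows = []
    for unit in units:
        row = {}
        for name, phrases in witnesses.items():
            indices = unit.get(name, [])
            row[name] = " ".join(phrases[i] for i in indices)
        rows.append(row)
    return rows


# Placeholder phrases standing in for real segmented text.
witnesses = {
    "french": ["F1", "F2", "F3"],
    "castilian": ["C1", "C2"],
    "italian": ["I1", "I2", "I3"],
}
units = [
    {"french": [0], "castilian": [0], "italian": [0]},
    {"french": [1, 2], "castilian": [1], "italian": [1]},  # two phrases to one
    {"italian": [2]},  # passage present in a single witness
]

for row in render_alignment(witnesses, units):
    print(row)
```

Storing indices rather than text keeps each unit small and lets the same segmented witnesses be reused across different alignments.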
Please cite Aquilign as follows:
APA
Gille Levenson, M., Ing, L., & Camps, J.-B. (2024). Textual Transmission without Borders: Multiple Multilingual Alignment and Stemmatology of the “Lancelot en prose” (Medieval French, Castilian, Italian). In Proceedings of the Computational Humanities Research Conference 2024, CEUR Workshop Proceedings, 3834, 65–92. https://ceur-ws.org/Vol-3834/paper104.pdf
BibTeX
@inproceedings{gillelevensonTextualTransmissionBorders2024,
title = {Textual Transmission without Borders: Multiple Multilingual Alignment and Stemmatology of the ``Lancelot en prose'' (Medieval French, Castilian, Italian)},
shorttitle = {Textual Transmission without Borders},
booktitle = {Proceedings of the Computational Humanities Research Conference 2024},
author = {Gille Levenson, Matthias and Ing, Lucence and Camps, Jean-Baptiste},
editor = {Haverals, Wouter and Koolen, Marijn and Thompson, Laure},
year = {2024},
series = {CEUR Workshop Proceedings},
volume = {3834},
pages = {65--92},
publisher = {CEUR},
address = {Aarhus},
issn = {1613-0073},
url = {https://ceur-ws.org/Vol-3834/paper104.pdf},
langid = {english}
}
A demo app for Aquilign Explorer is available on Hugging Face Spaces:
It provides an early interface for testing and demonstrating multilingual medieval text alignment workflows.
Example notebooks and reproducible workflows are available here:
multilingual-medieval-aligner-notebooks
These notebooks can be used to test ProMeTEXT tools on sample data or on your own medieval textual corpora.
Please cite the relevant dataset, tool, or paper depending on the resource you use:
- For the released segmentation dataset, cite Ing, Gille Levenson, and Macedo 2025.
- For the phrase-level segmentation method and experiments, cite Ing, Gille Levenson, and Macedo 2026.
- For Aquilign and multilingual alignment workflows, cite Gille Levenson, Ing, and Camps 2024.
We welcome collaboration with researchers, developers, and institutions working on medieval texts, historical languages, digital philology, computational humanities, and multilingual corpora.
You can contribute by:
- suggesting or adding support for additional medieval languages
- contributing annotated corpora
- improving segmentation or alignment workflows
- testing the tools on new textual traditions
- reporting issues in the relevant repositories
For questions, feedback, or collaboration proposals, please open an issue in the relevant repository or contact the ProMeTEXT team.