COMUTE

COMUTE (Collation of Multilingual Text) is a DFG-funded project carried out in collaboration with the Martin-Luther University of Halle/Saale and the Freie Universität Berlin. We aim to semi-automatically align multilingual, semi-parallel versions of (literary) texts. The alignment algorithm will eventually be integrated into the web-based alignment tool “LERA” (developed by MLU Halle).

The goal of this research project is to develop an algorithm for automatically aligning semi-parallel, multilingual text versions. To achieve this, the differences between the text versions must first be identified at the paragraph, sentence, phrase, and word levels. The alignment will then be performed from the macro level to the micro level: the algorithm will first align paragraphs, then sentences within the paragraphs, and finally words within the sentences. The computed alignment data will be made available in standardized formats (JSON, XML).
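The macro-to-micro order described above can be illustrated with a small Python sketch. It is not the project's algorithm: the function names, the token-overlap similarity, the `min_sim` threshold, and the JSON layout are all assumptions made for illustration. The sketch uses a generic monotone dynamic-programming alignment, and its placeholder similarity only works here because the toy sentences share names and numbers; a real cross-lingual system would need a multilingual similarity measure.

```python
import json
import re
from typing import Callable, List, Optional, Sequence, Tuple

Pair = Tuple[Optional[int], Optional[int]]  # None marks an unaligned unit


def tokens(text: str) -> List[str]:
    """Lowercased word tokens (deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def jaccard(a: str, b: str) -> float:
    """Placeholder similarity: token overlap, NOT a cross-lingual measure."""
    ta, tb = set(tokens(a)), set(tokens(b))
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def exact(a: str, b: str) -> float:
    """Placeholder word similarity: identical tokens only."""
    return 1.0 if a == b else 0.0


def align(a: Sequence[str], b: Sequence[str],
          sim: Callable[[str, str], float], min_sim: float = 0.2) -> List[Pair]:
    """Monotone alignment by dynamic programming.

    Pairing two units scores sim - min_sim; leaving a unit unaligned scores 0,
    so only pairs above the (hypothetical) threshold are worth aligning.
    """
    n, m = len(a), len(b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]

    def diag(i: int, j: int) -> float:
        return score[i - 1][j - 1] + (sim(a[i - 1], b[j - 1]) - min_sim)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(diag(i, j), score[i - 1][j], score[i][j - 1])

    # Trace back from the bottom-right corner to recover the alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == diag(i, j):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j]):
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    return pairs[::-1]


def align_documents(doc_a: List[List[str]], doc_b: List[List[str]]) -> list:
    """Align paragraphs first, then sentences, then words.

    A document is a list of paragraphs; a paragraph is a list of sentences.
    """
    paras_a = [" ".join(p) for p in doc_a]
    paras_b = [" ".join(p) for p in doc_b]
    out = []
    for pa, pb in align(paras_a, paras_b, jaccard):
        record = {"paragraphs": [pa, pb], "sentences": []}
        if pa is not None and pb is not None:
            for sa, sb in align(doc_a[pa], doc_b[pb], jaccard):
                entry = {"pair": [sa, sb], "words": []}
                if sa is not None and sb is not None:
                    wa, wb = tokens(doc_a[pa][sa]), tokens(doc_b[pb][sb])
                    entry["words"] = [list(p) for p in align(wa, wb, exact)]
                record["sentences"].append(entry)
        out.append(record)
    return out


# Toy input: the second German paragraph has no English counterpart.
german = [["Hannah Arendt schrieb in New York.",
           "Das Buch erschien 1958 in Chicago."],
          ["Dieser Absatz fehlt im Englischen."]]
english = [["Hannah Arendt wrote in New York.",
            "The book appeared in Chicago in 1958."]]

result = align_documents(german, english)
print(json.dumps(result, indent=2))  # unaligned units appear as null
```

Restricting sentence alignment to already aligned paragraphs, and word alignment to already aligned sentences, keeps each search space small, which is one plausible motivation for working from the macro level down to the micro level.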

The project is based on the assumption that a semi-automatic process can reduce the considerable time and cost of manual alignment and make the results more reliable. Upon completion, the tool's results can enable new forms of exploration and analysis for semi-parallel texts. The project can therefore support various scientific disciplines in their research activities.

In the first phase, we are focusing on the comparison of German and English text versions. We are working with a corpus of texts written by Hannah Arendt. These literary texts are available in German and English and have been translated, and often edited, by Arendt herself. Since these texts are well-researched, they provide an excellent opportunity for evaluating the performance of our alignment models.

The research questions and goals of this project cut across the disciplines of Computer Science, Linguistics, and Literary Studies. The project is therefore a collaboration between the Center for Digital Systems (CeDiS) at FU Berlin (PI Dr. Brigitte Grote), the Dept. of Greek and Latin Philology at FU Berlin (PI Prof. Frank Fischer), the Dept. of Computer Science at MLU Halle/Saale (PI Prof. Paul Molitor), and our group at the University of Potsdam.

Project team

Steffen Frenzel, M.A.
Prof. Dr. Manfred Stede

Duration

COMUTE runs from 2023 to 2026.

Related publications: