The project "Anaphoricity in connectives: From corpus analysis to lexical description and consequences for discourse parsing" deals with non-structural discourse connectives in German and English. Discourse connectives are linguistic elements that connect two propositions (examples are but, because, however, etc.) and are thus essentially two-place predicates. Connectives are typically divided into structural and non-structural connectives: structural connectives take their arguments based on syntactic constraints, while for non-structural connectives one argument can be inferred from the discourse and is hence anaphoric. This project focuses on the latter group, and addresses the following key problems:

The computational-linguistic interest in connectives stems from the task of 'Shallow Discourse Parsing', the automatic detection of coherence relations (such as those signalled by connectives) in text. Our project addresses this task as well, aiming to make the first such discourse parser for German available.
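To make the notion of a connective as a two-place predicate more concrete, the following is a minimal sketch of how connective candidates might be represented and spotted in text. It is not the project's parser: the toy lexicon, the structural/non-structural flags, and the names ConnectiveCandidate and find_candidates are assumptions made purely for illustration, and the two argument slots are left unfilled because argument assignment is the hard part that the project investigates.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Toy lexicon (an illustrative assumption, not the project's lexicon).
# The boolean marks whether the entry is treated here as non-structural;
# these flags are placeholders, not linguistic claims.
TOY_LEXICON = {"but": False, "because": False, "however": True}


@dataclass
class ConnectiveCandidate:
    """A connective viewed as a two-place predicate with two argument slots."""
    token: str
    position: int                 # index of the token in the tokenized text
    non_structural: bool          # flag copied from the toy lexicon
    arg1: Optional[str] = None    # left unfilled in this sketch
    arg2: Optional[str] = None    # left unfilled in this sketch


def find_candidates(text: str) -> List[ConnectiveCandidate]:
    """Return every token of the text that appears in the toy lexicon."""
    tokens = re.findall(r"\w+", text.lower())
    return [
        ConnectiveCandidate(tok, i, TOY_LEXICON[tok])
        for i, tok in enumerate(tokens)
        if tok in TOY_LEXICON
    ]


if __name__ == "__main__":
    for cand in find_candidates("It rained. However, we went out because we had to."):
        print(cand)
```

A plain lexicon lookup like this overgenerates, since many connective word forms also have non-connective readings; a real shallow discourse parser would need a disambiguation step before assigning arguments.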

Project Goals and Interim Results

On the linguistic side, the project seeks solutions to the above-mentioned problems through systematic studies of non-structural connectives in authentic contexts, i.e. based on corpora.

We consider the bilingual approach of our project important: subclassifications of connectives in terms of discourse-structural features, for example, are much more informative when performed in parallel on more than one language. Compared to English, the relatively free word order of German makes many phenomena involving non-structural connectives more challenging. Specifically, the goals of this project are the following. (1) and (2) amount to the core tasks of linguistic investigation, which will address the problems P1-P4 summarized above, and (4) seeks to exploit the results for discourse parsing.

On the computational side, the findings on argument assignment will be translated into consequences for discourse structure annotation and for automatic discourse parsing. We aim to build a parser that, compared to the state of the art, uses more sophisticated ways of associating adverbial connectives with their arguments. Given the bilingual database, our approach will be applicable to both English and German, but the primary goal is to construct the first shallow discourse parser for German. Bourgonje and Stede (2018) and Bourgonje and Stede (2019) describe experiments working toward this goal.

A beta version of the parser is available here.

Furthermore, to support data-oriented exploration and analysis of the phenomena explained above, the Potsdam Commentary Corpus has been made publicly available in the course of this project through the ANNIS3 web corpus browser (Bourgonje and Stede, 2018), and a large Wikipedia dump (January 2019) has been indexed for efficient and convenient querying.

Project team

Peter Bourgonje
Prof. Dr. Manfred Stede
Dr. Yulia Grishina (2018) Yulia Clausen (2020)