The project "Anaphoricity in connectives: From corpus analysis to lexical description and consequences for discourse parsing" deals with non-structural discourse connectives in German and English. Discourse connectives are linguistic elements that connect two propositions (examples are but, because, however, etc.), and are thus essentially two-place predicates. Connectives are typically divided into structural and non-structural ones: structural connectives take their arguments based on syntactic constraints, while for non-structural connectives one argument can be inferred from the discourse and is hence anaphoric. This project focuses on the latter group and addresses the following key problems:
- Problem 1: Lexical ambiguity. Connectives in general are ambiguous with respect to (1a) non-connective readings (e.g., German da, which can also be a locative anaphor ('since'/'there')) and with respect to (1b) connective sense (e.g., nämlich: Reason (difficult to render with an English adverbial; similar to 'after all') versus Elaboration/Specification ('in particular')).
- Problem 2: Non-adjacent external arguments (extargs). Contrary to simplifying assumptions in Rhetorical Structure Theory (RST) and in implemented discourse parsers, the two arguments of a connective need not be adjacent. Corpus evidence shows that non-adjacent arguments are in fact quite frequent.
- Problem 3: Vague boundaries of intargs and extargs. For both argument types of a non-structural connective (but more often for extargs), the precise boundaries are often difficult to agree on.
- Problem 4: Non-explicit extargs. In some cases, the extarg is not given explicitly in the text but must be inferred by the reader. This is an unresolved problem both for manual annotation and for automatic discourse parsing.
The project seeks solutions to the above-mentioned problems through systematic studies of non-structural connectives in authentic contexts, i.e., based on corpora.
We consider the bilingual approach of our project important: subclassifications of connectives in terms of discourse-structural features, for example, are much more informative when carried out in parallel on more than one language. Compared to English, the relatively free word order of German makes many phenomena involving non-structural connectives more challenging. Specifically, the project has the following goals. Goals (1) and (2) are the core tasks of linguistic investigation and address Problems 1-4 summarized above; goal (4) seeks to exploit the results for discourse parsing.
- 1) The first core research task within the project will be a cross-lingual comparison and corpus-based lexical description of the specified target set of non-structural connectives in German and English, including the description of their translation constraints given by morphology, syntax, semantic class and further contextual features.
- 2) The second core task will be a detailed study of the argument assignment problems by means of providing corpus evidence and linguistic explanation. This may include a comparison of the mechanisms at work for anaphoric connectives and for abstract anaphors, i.e., demonstrative and personal pronouns with propositional antecedents.
- 3) A concept for a bilingual connective database for the targeted set of German and English connectives will be designed on the basis of the preceding analyses, and a first implementation will be produced.
- 4) The findings on argument assignment will be translated into consequences for the application of discourse structure annotation and for automatic discourse parsing. We aim at building a parser that, compared to the state of the art, uses more sophisticated ways of associating adverbial connectives with their arguments. Given the bilingual database, our approach will be applicable to both English and German.
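To make the problems and the database goal more concrete, the following is a minimal sketch of how a lexicon entry and an annotated connective instance might be represented. It is not the project's actual database schema: the class names, field names, and the token-offset span convention are all hypothetical, chosen only for illustration. The two entries use the examples nämlich and da given above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumption: an argument is a (start, end) token span, end exclusive.
Span = Tuple[int, int]


@dataclass
class ConnectiveEntry:
    """Hypothetical lexicon entry for one connective."""
    lemma: str
    language: str                     # e.g. "de" or "en"
    senses: List[str]                 # connective senses (Problem 1b)
    non_connective_readings: List[str] = field(default_factory=list)  # Problem 1a

    def is_ambiguous(self) -> bool:
        # Ambiguous if it has several senses or any non-connective reading.
        return len(self.senses) > 1 or bool(self.non_connective_readings)


@dataclass
class ConnectiveInstance:
    """Hypothetical annotation of one connective occurrence in a text."""
    entry: ConnectiveEntry
    intarg: Span
    extarg: Optional[Span]            # None: extarg must be inferred (Problem 4)

    def extarg_is_adjacent(self) -> Optional[bool]:
        # Simplifying assumption: "adjacent" means the two spans touch.
        if self.extarg is None:
            return None
        return self.extarg[1] == self.intarg[0] or self.intarg[1] == self.extarg[0]


naemlich = ConnectiveEntry("nämlich", "de", ["Reason", "Elaboration/Specification"])
da = ConnectiveEntry("da", "de", ["'since'"], ["locative anaphor ('there')"])

# An extarg several tokens before the intarg (Problem 2), and an implicit one.
distant = ConnectiveInstance(naemlich, intarg=(10, 18), extarg=(0, 5))
implicit = ConnectiveInstance(naemlich, intarg=(10, 18), extarg=None)
```

Representing the extarg as an optional span keeps the non-explicit case distinct from the non-adjacent one; vague boundaries (Problem 3) would need a richer representation than a single span and are not covered by this sketch.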
More information on the project can be found in the grant proposal.
Peter Bourgonje, Yulia Grishina, Prof. Dr. Manfred Stede