Discourse Structure

In Computational Linguistics, the term discourse refers to a communicative event such as a text or a dialog. In recent years, the work in our lab has focused on monologue text. We study text structure both from a theoretical perspective and with the goal of automatic analysis. Only a very brief summary can be given here. One of our favorite phenomena is connectives, which are explained in a separate section below.

The Structure of Discourse

In a text, structure arises on multiple levels of description, as explained in the German monograph (Stede 2007).

A fair amount of our work has dealt with Rhetorical Structure Theory (RST; Mann/Thompson 1988), for both text analysis and text generation. In (Stede 2008), we suggested modifications, especially concerning nuclearity. The Potsdam Commentary Corpus (PCC) has RST trees as one of its annotation layers. But other levels are of equal importance to us; see, e.g., our separate page on Coreference.

Discourse Connectives

Discourse connectives are lexical items that encode semantic or pragmatic relations between adjacent spans of text, such as causality or contrast. We have developed analyses of particular connectives and of groups of them (especially causal, contrastive and concessive ones), as well as comparative analyses across several languages. One result of our work is DiMLex, a computer-readable lexicon of German discourse connectives (Stede 2002). We recently contributed to building an Italian version (Feltracco et al. 2016), and versions for English, French and Portuguese are currently being built in collaboration with partners from the TextLink network.

In addition, one of the annotation layers in our Potsdam Commentary Corpus (PCC) covers connectives and their arguments.
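To make the idea of a machine-readable connective lexicon a little more concrete, here is a minimal Python sketch of lexicon-based candidate detection. The toy lexicon, its coarse relation labels, and the function name `find_connective_candidates` are hypothetical illustrations; they do not reflect the actual format of DiMLex, whose entries are considerably richer.

```python
import re

# Hypothetical toy lexicon (not DiMLex's format): a few German
# connectives mapped to a coarse relation label.
TOY_LEXICON = {
    "weil": "causal",
    "da": "causal",
    "aber": "contrast",
    "jedoch": "contrast",
    "obwohl": "concessive",
    "trotzdem": "concessive",
}


def find_connective_candidates(text, lexicon=TOY_LEXICON):
    """Return (token_index, token, relation) for each token found in the lexicon.

    This only yields candidates: whether a matched word is actually used
    as a discourse connective in context is not decided here.
    """
    # Lowercase and split into word tokens (punctuation is dropped).
    tokens = re.findall(r"\w+", text.lower())
    return [(i, tok, lexicon[tok]) for i, tok in enumerate(tokens) if tok in lexicon]


print(find_connective_candidates(
    "Er blieb zu Hause, weil es regnete, aber sie ging trotzdem."
))
```

A plain lexicon match is only a first step: German "da", for instance, can mean 'since' as well as 'there', so deciding whether a candidate is actually used as a connective requires a separate disambiguation step.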

Discourse Parsing

The range of subtopics of discourse processing is explained in the monograph Discourse Processing (Stede 2011).

Our work on automatic analysis started with the first SVM-based RST parser (for German), developed by Reitter (2003). More recently, our focus has shifted to shallow discourse parsing in the style of the Penn Discourse Treebank (PDTB). With colleagues in Oslo and Teesside, we built the best-performing English discourse parser in the CoNLL 2016 Shared Task (Oepen et al. 2016).

Other topics we have addressed include the analysis of genre-specific zones (e.g., Bieler et al. 2007) and the disambiguation of German connectives (Dipper/Stede 2006, Schneider/Stede 2012).

Related Resources

Related publications: