In Computational Linguistics, the term discourse refers to a communicative event such as a text or a dialog. In recent years, the work in our lab has focused on monologue text. We study text structure both from a theoretical perspective and with the goal of automatic analysis; only a very brief summary can be given here. One phenomenon of particular interest to us is discourse connectives, which are discussed in a separate section below.
The Structure of Discourse
In a text, structure arises on multiple levels of description (see the German-language monograph Stede 2007), for example:
- coreference: pronouns and definite noun phrases refer back to entities in the text
- genre-specific zones: the genre of a text determines which functional units are part of the text
- topic changes: in running text, stretches (often paragraphs) deal with different sub-topics of the text
- rhetorical structure: according to several theories, coherence relations hold between (mostly) adjacent spans of text, yielding a tree or graph structure for the complete text
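To make the idea of a tree of coherence relations more concrete, here is a minimal Python sketch of an RST-style tree with nuclearity. The class names, relation labels and example sentences are illustrative assumptions, not the format used in the PCC or by any RST tool. The function nuclear_edus follows the nuclei down the tree to find the most central segments of the text.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Segment:
    """An elementary discourse unit (EDU): a minimal span of text."""
    text: str


@dataclass
class RelationNode:
    """A coherence relation over adjacent child spans (hypothetical class).

    `nuclei` lists the indices of the children that are nuclei: one for a
    mononuclear relation (nucleus plus satellite), several for a
    multinuclear one such as a list or a contrast.
    """
    relation: str
    children: List["Node"]
    nuclei: List[int]


Node = Union[Segment, RelationNode]


def leaves(node: Node) -> List[str]:
    """Return the EDUs covered by a node, in text order."""
    if isinstance(node, Segment):
        return [node.text]
    return [edu for child in node.children for edu in leaves(child)]


def nuclear_edus(node: Node) -> List[str]:
    """Follow nuclei down the tree to collect the most central EDUs."""
    if isinstance(node, Segment):
        return [node.text]
    return [edu for i in node.nuclei for edu in nuclear_edus(node.children[i])]


# Invented three-EDU example; relation labels are illustrative only.
tree = RelationNode(
    "concession",
    [
        Segment("Although the plan is expensive,"),
        RelationNode(
            "reason",
            [Segment("the council should adopt it,"),
             Segment("because it reduces traffic.")],
            nuclei=[0],
        ),
    ],
    nuclei=[1],
)

print(leaves(tree))
print(nuclear_edus(tree))
```

Here the concessive satellite and the reason satellite are both peeled away, leaving "the council should adopt it," as the central claim. Questions about what exactly nuclearity should mean in such trees are the kind of issue addressed in our work on RST.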
A fair amount of our work has dealt with Rhetorical Structure Theory (Mann/Thompson 1988), both for text analysis and for text generation. We have suggested modifications to the theory, especially concerning nuclearity (Stede 2008). The Potsdam Commentary Corpus (PCC) has RST trees as one of its annotation layers. But other levels are of equal importance to us; see, e.g., our separate page on Coreference.
Discourse connectives are lexical items that encode semantic or pragmatic relations between adjacent spans of text, such as causality or contrast. We have developed analyses of particular connectives and groups of them (especially causal, contrastive and concessive ones), as well as comparative analyses across several languages. One result of our work is DiMLex, a computer-readable discourse connective lexicon for German (Stede 2002). We recently contributed to building an Italian version (Feltracco et al. 2016), and versions for English, French and Portuguese are currently being built in collaboration with partners from the TextLink network.
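One basic use of a connective lexicon during analysis is to locate candidate connectives in a tokenized text. The sketch below assumes a tiny hypothetical lexicon keyed by token tuples; its entries and sense labels are invented for illustration and do not reproduce the actual format or content of DiMLex. Longest-match lookup lets multiword connectives such as "so dass" be found as one unit.

```python
from typing import Dict, List, Tuple

# Toy lexicon (hypothetical): token tuple -> possible relation senses.
LEXICON: Dict[Tuple[str, ...], List[str]] = {
    ("weil",): ["causal"],
    ("obwohl",): ["concessive"],
    ("aber",): ["contrast", "concessive"],
    ("auch", "wenn"): ["concessive"],
    ("so", "dass"): ["consequence"],
}

MAX_LEN = max(len(key) for key in LEXICON)


def find_connectives(tokens: List[str]) -> List[Tuple[int, int, List[str]]]:
    """Return (start, end, senses) for each lexicon match.

    At each position the longest matching entry wins. Matches are only
    candidates: many entries also have non-connective readings.
    """
    lowered = [t.lower() for t in tokens]
    matches = []
    i = 0
    while i < len(lowered):
        # Try the longest possible span first, then shorter ones.
        for n in range(min(MAX_LEN, len(lowered) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in LEXICON:
                matches.append((i, i + n, LEXICON[key]))
                i += n
                break
        else:
            i += 1  # no entry starts here
    return matches


print(find_connectives("Sie blieb zu Hause , weil es regnete , aber sie war froh".split()))
print(find_connectives("Es regnete , so dass wir blieben".split()))
```

Because an entry like "aber" lists more than one sense, the lookup alone cannot decide which relation holds; that is left to later disambiguation.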
In addition, one of the annotation layers of our Potsdam Commentary Corpus (PCC) marks connectives and their arguments.
An overview of the subtopics of discourse processing is given in the monograph Discourse Processing (Stede 2011).
Our work on automatic analysis started with the first SVM-based RST parser (for German), developed by Reitter (2003). More recently, our focus has shifted to shallow discourse parsing in the style of the Penn Discourse Treebank (PDTB). With colleagues in Oslo and Teesside, we built the best-performing English discourse parser in the CoNLL 2016 Shared Task (Oepen et al. 2016).
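As a rough illustration of the output of shallow, PDTB-style parsing for explicit connectives (a connective, two arguments and a sense), the sketch below uses a small hypothetical connective list and a common naive heuristic for argument spans: for a sentence-initial connective, Arg1 is the previous sentence; otherwise the sentence is split at the connective. This is a simplified baseline under stated assumptions, not the OPT system, which pipelines rules, rankers and classifier ensembles.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Relation:
    """A PDTB-style explicit relation: connective, two arguments, a sense."""
    connective: str
    arg1: List[str]
    arg2: List[str]
    sense: str


# Hypothetical, heavily simplified connective-to-sense mapping.
SENSES = {
    "because": "Contingency.Cause",
    "but": "Comparison.Contrast",
    "however": "Comparison.Contrast",
}


def naive_explicit_relations(sentences: List[List[str]]) -> List[Relation]:
    """Find explicit connectives and assign arguments heuristically."""
    relations = []
    for s_idx, sent in enumerate(sentences):
        for t_idx, token in enumerate(sent):
            sense = SENSES.get(token.lower())
            if sense is None:
                continue
            if t_idx == 0:
                if s_idx == 0:
                    continue  # no preceding sentence to serve as Arg1
                # Sentence-initial: take the previous sentence as Arg1.
                arg1 = sentences[s_idx - 1]
                arg2 = sent[1:]
            else:
                # Sentence-medial: split the sentence at the connective.
                arg1 = sent[:t_idx]
                arg2 = sent[t_idx + 1:]
            relations.append(Relation(token, arg1, arg2, sense))
    return relations


sentences = [
    ["It", "rained", "all", "day", "."],
    ["However", ",", "we", "went", "out", "because", "we", "were", "bored", "."],
]
for rel in naive_explicit_relations(sentences):
    print(rel.connective, rel.sense, rel.arg1, rel.arg2)
```

The heuristic is easy to break (for instance, a clause-initial "because" usually introduces Arg2 before Arg1), which is why actual shallow discourse parsers learn argument extraction from annotated data.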
Other topics we have addressed include the analysis of genre-specific zones (e.g., Bieler et al. 2007) and the disambiguation of German connectives (Dipper/Stede 2006, Schneider/Stede 2012).
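A common starting point for disambiguating potential connectives (deciding, e.g., whether German "da" is used as a connective, "since", or as an adverb, "there") is a most-frequent-reading baseline. The sketch below assumes annotated (word, reading) pairs as input; the toy data is made up for illustration and is not drawn from any of our corpora or studies.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def train_majority(annotations: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each (lowercased) word to its most frequent annotated reading."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for word, reading in annotations:
        counts[word.lower()][reading] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def predict(model: Dict[str, str], word: str, default: str = "non-connective") -> str:
    """Predict the reading of a word; unseen words get the default."""
    return model.get(word.lower(), default)


# Invented toy annotations, for illustration only.
TOY_ANNOTATIONS = [
    ("da", "connective"), ("da", "non-connective"), ("Da", "non-connective"),
    ("also", "connective"), ("also", "connective"), ("also", "non-connective"),
]

model = train_majority(TOY_ANNOTATIONS)
print(model)
print(predict(model, "Da"), predict(model, "also"))
```

Such a baseline ignores context entirely; corpus studies of connective ambiguity are needed precisely because the same word can take different readings depending on its syntactic environment.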
- DiMLex: A lexicon of German discourse markers
- ConAno: A Java tool for semi-manually annotating connectives and their arguments
- Andreas Peldszus and Manfred Stede. Rhetorical structure and argumentation structure in monologue text. In Proceedings of the 3rd Workshop on Argumentation Mining. Berlin, September 2016. Association for Computational Linguistics.
- Tatjana Scheffler and Manfred Stede. Mapping PDTB-style connective annotation to RST-style discourse annotation. In Proceedings of KONVENS. Bochum, Germany, 2016.
- Anna Feltracco, Elisabetta Jezek, Bernardo Magnini, and Manfred Stede. LICO: A lexicon of Italian connectives. In Proceedings of the 3rd Italian Conference on Computational Linguistics (CLiC-it). Napoli, Italy, 2016.
- Tatjana Scheffler and Manfred Stede. Realizing argumentative coherence relations in German: A contrastive study of newspaper editorials and Twitter posts. In Proceedings of the COMMA Workshop "Foundations of the Language of Argumentation". Potsdam, Germany, 2016.
- S. Oepen, J. Read, T. Scheffler, U. Sidarenka, M. Stede, E. Velldal, and L. Øvrelid. OPT: Oslo–Potsdam–Teesside. Pipelining Rules, Rankers, and Classifier Ensembles for Shallow Discourse Parsing. In Proceedings of the CoNLL 2016 Shared Task. Berlin, 2016.
- Tatjana Scheffler and Manfred Stede. Adding Semantic Relations to a Large-Coverage Connective Lexicon of German. In Nicoletta Calzolari et al., editors, Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Portorož, Slovenia, May 2016. European Language Resources Association (ELRA).
- Arne Neumann. Discoursegraphs: A graph-based merging tool and converter for multilayer annotated corpora. In Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015), 309–312, 2015.
- Uladzimir Sidarenka, Matthias Bisping, and Manfred Stede. Applying Rhetorical Structure Theory to Twitter Conversations. In Proceedings of DiSpol 2015. Saarbrücken, Germany, October 2015.
- Anastasia Linnik, Roelien Bastiaanse, and Barbara Hoehle. Discourse production in aphasia: A current review of theoretical and methodological challenges. Aphasiology, 2015. URL: http://dx.doi.org/10.1080/02687038.2015.1113489.
- Uladzimir Sidarenka, Andreas Peldszus, and Manfred Stede. Discourse Segmentation of German Texts. Journal for Language Technology and Computational Linguistics, 30(1):71–98, 2015.
- Tatjana Scheffler. Two-Dimensional Semantics: Clausal Adjuncts and Complements. Volume 549 of Linguistische Arbeiten. De Gruyter, Berlin/Boston, 2013.
- Manfred Stede and Andreas Peldszus. The role of illocutionary status in the usage conditions of causal connectives and in coherence relations. Journal of Pragmatics, 44(2):214–229, 2012.
- A. Schneider and M. Stede. Ambiguity in German connectives: A corpus study. In Proceedings of the KONVENS Conference. Vienna, 2012.
- Manfred Stede. Discourse Processing. Volume 15 of Synthesis Lectures on Human Language Technologies. Morgan & Claypool, 2011.
- Manfred Stede. RST revisited: Disentangling nuclearity. In Cathrine Fabricius-Hansen and Wiebke Ramm, editors, 'Subordination' versus 'coordination' in sentence and text. John Benjamins, Amsterdam, 2008.
- Manfred Stede. Disambiguating rhetorical structure. Research on Language and Computation, 6(3):311–332, 2008.
- Heike Bieler, Stefanie Dipper, and Manfred Stede. Identifying formal and functional zones in film reviews. In Proceedings of the 8th SIGDIAL Workshop. Antwerp, 2007.
- Stefanie Dipper and Manfred Stede. Disambiguating potential connectives. In Proceedings of the KONVENS Conference. Konstanz, 2006.
- Michael Grabski and Manfred Stede. 'bei': Intra-clausal coherence relations illustrated with a German preposition. Discourse Processes, 41(2):195–219, 2006.
- David Reitter. Simple signals for complex rhetorics: On rhetorical analysis with rich-feature support-vector models. Journal for Language Technology and Computational Linguistics (LDV Forum), 18(2):38–52, 2003.
- Manfred Stede. DiMLex: A lexical approach to discourse markers. In Exploring the Lexicon: Theory and Computation. Edizioni dell'Orso, Alessandria, 2002.
- Manfred Stede and Carla Umbach. DiMLex: A lexicon of discourse markers for text generation and understanding. In Proceedings of the 17th International Conference on Computational Linguistics (COLING), Volume 2, 1238–1242. Association for Computational Linguistics, 1998.