In computational linguistics, the term discourse refers to a communicative event such as a text or a dialog. In recent years, the work in our lab has focused on monologue text. We study text structure both from a theoretical perspective and with the goal of automatic analysis. We can provide only a very brief summary here. One of our favorite phenomena is connectives, which are discussed in a separate section below.
The Structure of Discourse
In a text, structure arises on multiple levels of description, as explained in the German monograph by Stede (2007). For example:
- coreference: pronouns and definite noun phrases refer back to entities in the text
- genre-specific zones: the genre of a text determines which functional units are part of the text
- topic changes: in running text, stretches (often paragraphs) deal with different sub-topics of the text
- rhetorical structure: according to several theories, coherence relations hold between (mostly) adjacent spans of text, yielding a tree or graph structure for the complete text
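To make the last of these levels more concrete, here is a minimal Python sketch of how a rhetorical structure tree might be represented. It is an illustration only: the class names (`Segment`, `Span`), the helper functions, the example sentence and the relation labels are assumptions made for this page, not the format of any particular corpus, theory or tool.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Segment:
    """A minimal unit of text (a leaf of the tree)."""
    text: str


@dataclass
class Span:
    """An inner node: one coherence relation holding between adjacent children."""
    relation: str
    # (role, subtree) pairs in text order; role is "nucleus" or "satellite".
    children: List[Tuple[str, "Node"]]


Node = Union[Segment, Span]


def leaves(node: Node) -> List[str]:
    """Return the segment texts covered by a node, in text order."""
    if isinstance(node, Segment):
        return [node.text]
    result: List[str] = []
    for _role, child in node.children:
        result.extend(leaves(child))
    return result


def central_units(node: Node) -> List[str]:
    """Return the segments reached by following only nucleus links."""
    if isinstance(node, Segment):
        return [node.text]
    result: List[str] = []
    for role, child in node.children:
        if role == "nucleus":
            result.extend(central_units(child))
    return result


# Hypothetical three-segment example with illustrative relation labels.
tree = Span("concession", [
    ("satellite", Segment("Although it rained,")),
    ("nucleus", Span("cause", [
        ("nucleus", Segment("we went for a walk")),
        ("satellite", Segment("because we needed fresh air.")),
    ])),
])

print(leaves(tree))
print(central_units(tree))
```

Storing the children in text order with an explicit role keeps the linear order of the text recoverable from the tree. A graph-structured analysis would need a different representation.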
A fair amount of our work has dealt with Rhetorical Structure Theory (Mann/Thompson 1988), both for text analysis and generation. We provide annotation guidelines for RST, which originated in collaboration with Maite Taboada (SFU, Vancouver) (Stede et al. 2017). We have also suggested certain modifications to the theory, especially concerning nuclearity (Stede 2008). The Potsdam Commentary Corpus (PCC) has RST trees as one annotation layer. Other levels are equally important to us, however; see e.g. our separate page on coreference.
Connectives
Discourse connectives are lexical items that encode semantic or pragmatic relations between adjacent spans of text, such as causality or contrast. We have developed analyses of particular connectives and groups of them (especially causal, contrastive and concessive ones), as well as comparative analyses across several languages. One result of our work is DiMLex, a computer-readable discourse connective lexicon for German (Stede 2002). We recently contributed to building an Italian version (Feltracco et al. 2016), and have now released a multilingual connective database (see "Resources" below) that was built in collaboration with partners from the TextLink network.
Also, one of the annotation layers in our Potsdam Commentary Corpus (PCC) is connectives and their arguments.
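As a rough illustration of what a connective lexicon makes possible, the following Python sketch looks up connective candidates in a sentence. The lexicon contents, the sense labels and the function name `find_candidates` are assumptions made for this example; they do not reflect the actual format or content of DiMLex or of any other resource mentioned here.

```python
import re
from typing import Dict, List, Tuple

# Toy lexicon (hypothetical): connective -> relations it may signal.
TOY_LEXICON: Dict[str, List[str]] = {
    "because": ["cause"],
    "although": ["concession"],
    "but": ["contrast", "concession"],
    "since": ["cause", "temporal"],
}


def find_candidates(
    text: str, lexicon: Dict[str, List[str]]
) -> List[Tuple[int, str, List[str]]]:
    """Return (character offset, word, possible relations) for each lexicon hit."""
    hits = []
    for match in re.finditer(r"\w+", text.lower()):
        word = match.group()
        if word in lexicon:
            hits.append((match.start(), word, lexicon[word]))
    return hits


hits = find_candidates(
    "Since it rained, we stayed in, but we were not bored.", TOY_LEXICON
)
for offset, word, relations in hits:
    print(offset, word, relations)
```

A lookup like this yields only candidates. Deciding whether a word actually functions as a connective in its context, and which relation it signals, is a separate disambiguation step.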
The range of subtopics of discourse processing is explained in the monograph Discourse Processing (Stede 2011).
Our work on automatic analysis started with the first SVM-based RST parser (for German) by Reitter (2003). More recently, our focus has shifted to shallow discourse parsing in the style of the Penn Discourse Treebank (PDTB). With colleagues in Oslo and Teesside, we built the best-performing English discourse parser for the CoNLL 2016 Shared Task (Oepen et al. 2016).
Other topics that we have addressed include the analysis of genre-specific zones (e.g., Bieler et al. 2007) and the disambiguation of German connectives (Dipper/Stede 2006, Schneider/Stede 2012).
Resources
- DiMLex: A lexicon of German discourse markers
- ConnectiveLex: A web-based connective database covering English, French, German, Italian, and Portuguese
- ConAno: A Java tool for semi-manually annotating connectives and their arguments
Publications
- Debopam Das and Manfred Stede. Developing the Bangla RST Discourse Treebank. In N. Calzolari et al., editor, Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC'18). Miyazaki, Japan, 2018. European Language Resources Association (ELRA). (to appear). [Bibtex]
- Amália Mendes, Iria del Rio, Manfred Stede, and Felix Dombek. A Lexicon of Discourse Markers for Portuguese – LDM-PTs. In N. Calzolari et al., editor, Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC'18). Miyazaki, Japan, 2018. European Language Resources Association (ELRA). (to appear). [Bibtex]
- Debopam Das and Maite Taboada. RST Signalling Corpus: a corpus of signals of coherence relations. Language Resources and Evaluation, 52:149–184, 2018. [Bibtex]
- Elena Musi, Tariq Alhindi, Manfred Stede, Leonard Kriese, Smaranda Muresan, and Andrea Rocci. A multi-layer annotated corpus of argumentative text: from argument schemes to discourse relations. In N. Calzolari et al., editor, Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC'18). Miyazaki, Japan, 2018. European Language Resources Association (ELRA). (to appear). [Bibtex]
- M. Stede, M. Taboada, and D. Das. Annotation Guidelines for Rhetorical Structure. Unpublished manuscript, 2017. [Bibtex] [PDF]
- Peter Bourgonje, Yulia Grishina, and Manfred Stede. Toward a bilingual lexical database on connectives: Exploiting a German/Italian parallel corpus. In Proceedings of the Fourth Italian Conference on Computational Linguistics. Rome, Italy, December 2017. [Bibtex] [PDF]
- Debopam Das, Maite Taboada, and Manfred Stede. The good, the bad, and the disagreement: complex ground truth in rhetorical structure analysis. In Workshop on Recent Advances in RST and Related Formalisms. Santiago de Compostela, Spain, September 2017. [Bibtex] [PDF]
- Tatjana Scheffler and Manfred Stede. Mapping PDTB-style connective annotation to RST-style discourse annotation. In Proceedings of KONVENS. Bochum, Germany, 2016. [Bibtex] [PDF]
- Andreas Peldszus and Manfred Stede. Rhetorical structure and argumentation structure in monologue text. In Proceedings of the 3rd Workshop on Argumentation Mining. Berlin, September 2016. Association for Computational Linguistics. [Bibtex] [PDF]
- Tatjana Scheffler and Manfred Stede. Realizing argumentative coherence relations in German: a contrastive study of newspaper editorials and Twitter posts. In Proceedings of the COMMA Workshop "Foundations of the Language of Argumentation". Potsdam, Germany, 2016. [Bibtex] [PDF]
- S. Oepen, J. Read, T. Scheffler, U. Sidarenka, M. Stede, E. Velldal, and L. Øvrelid. OPT: Oslo–Potsdam–Teesside. Pipelining Rules, Rankers, and Classifier Ensembles for Shallow Discourse Parsing. In Proceedings of the CoNLL 2016 Shared Task. Berlin, 2016. [Bibtex] [PDF]
- Tatjana Scheffler and Manfred Stede. Adding Semantic Relations to a Large-Coverage Connective Lexicon of German. In Nicoletta Calzolari et al., editor, Proc. of the Ninth International Conference on Language Resources and Evaluation (LREC 2016). Portorož, Slovenia, May 2016. European Language Resources Association (ELRA). [Bibtex] [PDF]
- Anna Feltracco, Elisabetta Jezek, Bernardo Magnini, and Manfred Stede. LICO: a lexicon of Italian connectives. In Proceedings of the 3rd Italian Conference on Computational Linguistics (CLiC-it). Napoli, Italy, 2016. [Bibtex] [PDF]
- Anastasia Linnik, Roelien Bastiaanse, and Barbara Hoehle. Discourse production in aphasia: a current review of theoretical and methodological challenges. Aphasiology, 2015. URL: http://dx.doi.org/10.1080/02687038.2015.1113489. [Bibtex]
- Arne Neumann. Discoursegraphs: a graph-based merging tool and converter for multilayer annotated corpora. In Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015), 309–312. 2015. [Bibtex]
- Uladzimir Sidarenka, Andreas Peldszus, and Manfred Stede. Discourse Segmentation of German Texts. Journal for Language Technology and Computational Linguistics, 30(1):71–98, 2015. [Bibtex] [PDF]
- Uladzimir Sidarenka, Matthias Bisping, and Manfred Stede. Applying Rhetorical Structure Theory to Twitter Conversations. In Proceedings of DiSpol 2015. Saarbrücken, Germany, October 2015. [Bibtex] [PDF]
- Tatjana Scheffler. Two-Dimensional Semantics: Clausal Adjuncts and Complements, volume 549 of Linguistische Arbeiten. De Gruyter, Berlin/Boston, 2013. [Bibtex]
- A. Schneider and M. Stede. Ambiguity in German connectives: a corpus study. In Proceedings of the KONVENS Conference. Vienna, 2012. [Bibtex]
- Manfred Stede and Andreas Peldszus. The role of illocutionary status in the usage conditions of causal connectives and in coherence relations. Journal of Pragmatics, 44(2):214–229, 2012. [Bibtex] [DOI]
- Manfred Stede. Discourse Processing, volume 15 of Synthesis Lectures in Human Language Technology. Morgan & Claypool, 2011. [Bibtex]
- Manfred Stede. Disambiguating rhetorical structure. Research on Language and Computation, 6(3):311–332, 2008. [Bibtex]
- Manfred Stede. RST revisited: Disentangling nuclearity. In Cathrine Fabricius-Hansen and Wiebke Ramm, editors, `Subordination' versus `coordination' in sentence and text. John Benjamins, Amsterdam, 2008. [Bibtex]
- Heike Bieler, Stefanie Dipper, and Manfred Stede. Identifying formal and functional zones in film reviews. In Proc. of the 8th SIGDIAL Workshop. Antwerp, 2007. [Bibtex]
- S. Dipper and M. Stede. Disambiguating potential connectives. In Proceedings of the KONVENS Conference. Konstanz, 2006. [Bibtex]
- Michael Grabski and Manfred Stede. 'bei': intra-clausal coherence relations illustrated with a German preposition. Discourse Processes, 41(2):195–219, 2006. [Bibtex]
- David Reitter. Simple signals for complex rhetorics: on rhetorical analysis with rich-feature support-vector models. Journal for Language Technology and Computational Linguistics (LDV Forum), 18(2):38–52, 2003. [Bibtex]
- Manfred Stede. DiMLex: A Lexical Approach to Discourse Markers. In Exploring the Lexicon - Theory and Computation. Edizioni dell'Orso, Alessandria, 2002. [Bibtex]
- Manfred Stede and Carla Umbach. DiMLex: a lexicon of discourse markers for text generation and understanding. In Proceedings of the 17th International Conference on Computational Linguistics, Volume 2, 1238–1242. Association for Computational Linguistics, 1998. [Bibtex]