Discourse Structure
In computational linguistics, the term discourse refers to a communicative event such as a text or a dialog. In recent years, the work in our lab has focused on monologue text. We study text structure both from a theoretical perspective and with the goal of automatic analysis. We can provide only a very brief summary here. One of our favorite phenomena is connectives, which are covered in a separate section below.
The Structure of Discourse
In a text, structure arises on multiple levels of description (cf. the German monograph (Stede 2018) or (Stede 2008)), for example:
- coreference: pronouns and definite noun phrases refer back to entities in the text
- genre-specific zones: the genre of a text determines which functional units are part of the text
- topic changes: in running text, stretches (often paragraphs) deal with different sub-topics of the text
- rhetorical structure: according to several theories, coherence relations hold between (mostly) adjacent spans of text, yielding a tree or graph structure for the complete text
Much of our work has dealt with Rhetorical Structure Theory (Mann/Thompson 1988), both for text analysis and generation. We provide annotation guidelines for RST, which originated in collaboration with Maite Taboada (SFU, Vancouver) (Stede et al. 2017). We have also suggested certain modifications to the theory, especially concerning nuclearity (Stede 2008). Our Potsdam Commentary Corpus (PCC) has RST trees as one annotation layer.
We also provide guidelines for RST annotation on Twitter (Sidorenko/Stede 2017).
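To make the notion of an RST tree with nuclearity more concrete, here is a minimal Python sketch. The class `RSTNode`, the helper names, the relation labels and the example sentence are illustrative assumptions; they do not reflect the PCC file format or the annotation guidelines.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RSTNode:
    """A node in a simplified RST tree (hypothetical structure for illustration)."""
    nuclearity: str                   # "root", "N" (nucleus) or "S" (satellite)
    relation: Optional[str] = None    # relation holding between the children (inner nodes)
    text: Optional[str] = None        # text of an elementary discourse unit (leaves)
    children: List["RSTNode"] = field(default_factory=list)


def edus(node: RSTNode) -> List[str]:
    """Return the elementary discourse units (leaf texts) in left-to-right order."""
    if not node.children:
        return [node.text]
    result = []
    for child in node.children:
        result.extend(edus(child))
    return result


def most_nuclear_edus(node: RSTNode) -> List[str]:
    """Follow nucleus links downward and collect the leaves reached that way."""
    if not node.children:
        return [node.text]
    result = []
    for child in node.children:
        if child.nuclearity == "N":
            result.extend(most_nuclear_edus(child))
    return result


# Invented example; the relation labels are assumptions chosen for illustration.
tree = RSTNode(
    nuclearity="root",
    relation="concession",
    children=[
        RSTNode(nuclearity="S", text="Although it rained,"),
        RSTNode(
            nuclearity="N",
            relation="cause",
            children=[
                RSTNode(nuclearity="N", text="we went for a walk"),
                RSTNode(nuclearity="S", text="because we needed fresh air."),
            ],
        ),
    ],
)

print(edus(tree))
print(most_nuclear_edus(tree))
```

Following only nucleus links yields the unit that is most central to the text under this analysis, which is one reason why decisions about nuclearity matter for applications such as summarization.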
A view of discourse analysis that makes fewer commitments on an overarching text structure is embodied in the Penn Discourse Treebank (PDTB), and it led to the computational task of shallow discourse parsing (SDP). To help bootstrap SDP work on German, we created a machine-translated German version of the PDTB texts, and automatically projected the discourse annotations (Sluyter-Gäthje et al. 2020).
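The general idea of annotation projection can be sketched as follows. This is a generic illustration that assumes a token-level word alignment is already available; it is not the procedure of the cited paper. The function `project_span`, the alignment dictionary and the example sentence pair are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def project_span(source_span: Span, alignment: Dict[int, List[int]]) -> Optional[Span]:
    """Map a source-side token span to the smallest target span that covers
    all target tokens aligned to it; return None if nothing is aligned."""
    start, end = source_span
    targets = [t for s in range(start, end + 1) for t in alignment.get(s, [])]
    if not targets:
        return None
    return (min(targets), max(targets))


# Invented sentence pair with a hand-written alignment (source index -> target indices).
english = ["We", "stayed", "home", "because", "it", "rained"]
german = ["Wir", "blieben", "zu", "Hause", ",", "weil", "es", "regnete"]
alignment = {0: [0], 1: [1], 2: [2, 3], 3: [5], 4: [6], 5: [7]}

# PDTB-style annotation on the English side, given as token spans.
source_annotation = {"arg1": (0, 2), "connective": (3, 3), "arg2": (4, 5)}

projected = {
    role: project_span(span, alignment)
    for role, span in source_annotation.items()
}

for role, span in projected.items():
    if span is not None:
        print(role, german[span[0]: span[1] + 1])
```

Taking the minimum and maximum aligned index keeps projected spans contiguous, at the cost of occasionally including unaligned target tokens; a real system would need to handle alignment errors and discontinuous arguments.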
Other levels of analysis are equally important to us; see, e.g., our separate page on Coreference.
Discourse Connectives
Discourse connectives are lexical items that encode semantic or pragmatic relations between adjacent spans of text, such as causality or contrast. We have developed analyses of particular connectives and groups of them (especially causal, contrastive and concessive ones), and also comparative analyses across several languages. One result of our work is DiMLex, a computer-readable discourse connective lexicon for German (Stede 2002). We recently contributed to building an Italian version (Feltracco et al. 2016), a Dutch version (Bourgonje et al. 2018), an English version (Das et al. 2018) and a Bangla version (Das et al. 2020), and released a multilingual connective database (see "Resources" below) that was built in collaboration with partners from the TextLink network.
Also, one of the annotation layers in our Potsdam Commentary Corpus (PCC) follows the Penn Discourse Treebank framework, including annotations for connectives, their arguments and relation sense.
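As a rough illustration of how a connective lexicon and a PDTB-style annotation fit together, the following Python sketch looks up connective candidates in a toy lexicon and builds a record with connective, two arguments and relation sense. The lexicon entries, the class `ExplicitRelation` and the splitting heuristic are assumptions made for this example; they reflect neither the DiMLex format nor the PCC annotation scheme.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Toy lexicon (hypothetical): connective -> possible senses, in PDTB-like notation.
TOY_LEXICON: Dict[str, List[str]] = {
    "weil": ["Contingency.Cause"],
    "aber": ["Comparison.Contrast"],
    "obwohl": ["Comparison.Concession"],
}


@dataclass
class ExplicitRelation:
    """A simplified PDTB-style explicit relation (illustrative only)."""
    connective: str
    arg1: str          # the other argument
    arg2: str          # the argument syntactically attached to the connective
    senses: List[str]  # candidate senses from the lexicon


def find_candidates(tokens: List[str]) -> List[Tuple[int, str]]:
    """Return (index, lowercased token) for every token listed in the lexicon."""
    return [(i, tok.lower()) for i, tok in enumerate(tokens) if tok.lower() in TOY_LEXICON]


def naive_relation(sentence: str) -> Optional[ExplicitRelation]:
    """Very naive heuristic: split the sentence at the first connective candidate.
    Only handles single-token connectives in sentence-medial position."""
    tokens = sentence.rstrip(".").split()
    candidates = find_candidates(tokens)
    if not candidates:
        return None
    index, connective = candidates[0]
    arg1 = " ".join(tokens[:index]).rstrip(",")
    arg2 = " ".join(tokens[index + 1:])
    return ExplicitRelation(connective, arg1, arg2, TOY_LEXICON[connective])


relation = naive_relation("Wir blieben zu Hause, weil es regnete.")
print(relation)
```

A lexicon lookup of this kind only yields candidates: many word forms also have non-connective readings, and connectives can be ambiguous between senses, so a disambiguation step is needed in practice.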
Discourse Parsing
Our work on automatic analysis started with the first SVM-based RST parser (for German) by Reitter (2003). More recently, our focus has shifted to shallow discourse parsing in the style of the Penn Discourse Treebank (PDTB). With colleagues in Oslo and Teesside, we built the best-performing English discourse parser for the CoNLL 2016 Shared Task (Oepen et al. 2016). Furthermore, we built the first shallow discourse parser for German (Bourgonje/Stede 2019/20) and made it available (beta version) for research purposes (see below).
Other topics that we have addressed include the analysis of genre-specific zones (e.g., Bieler et al. 2007) and the disambiguation of German connectives (Dipper/Stede 2006, Schneider/Stede 2012, Bourgonje/Stede 2018).
For a general overview of the subtopics of discourse processing, see the monograph Discourse Processing (Stede 2011).
Related Projects
- AnaKonn: Anaphoricity in Connectives (DFG)
- SFB1087-A03: Discourse Strategies in Social Media
Related Resources
- DiMLex: A lexicon of German discourse markers
- ConnectiveLex: A web-based connective database covering English, French, German, Italian, Dutch, Bangla, Czech, Arabic and Portuguese
- ConnAnno: A Java tool for semi-manually annotating connectives and their arguments
- GerSDP: A shallow discourse parser for German (beta)
Related Publications
- Hannah J. Seemann, Sara Shahmohammadi, Manfred Stede, and Tatjana Scheffler. Spoken vs. Written Computer-Mediated Communication. In Proceedings of the 11th Conference on Computer-Mediated Communication (CMC) and Social Media Corpora. Nice, France, 2024. (to appear). [Bibtex]
- Hannah J. Seemann, Sara Shahmohammadi, Manfred Stede, and Tatjana Scheffler. Discourse-Level Features in Spoken and Written Communication. In Proceedings of the 20th Conference on Natural Language Processing (KONVENS 2024). Vienna, 2024. [Bibtex] [PDF]
- Berfin Aktaş and Burak Özmen. Shallow Discourse Parsing on Twitter Conversations. In Proceedings of the 20th Joint ACL-ISO Workshop on Interoperable Semantic Annotation (LREC - ISA-20). Turin, Italy, May 2024. [Bibtex] [PDF]
- Sara Shahmohammadi, Hannah Seemann, Manfred Stede, and Tatjana Scheffler. Encoding discourse structure: comparison of RST and QUD. In Proceedings of the 4th Workshop on Computational Approaches to Discourse (CODI 2023), 89–98. Toronto, Canada, July 2023. Association for Computational Linguistics. [Bibtex] [PDF]
- Rene Knaebel and Manfred Stede. Discourse sense flows: Modelling the rhetorical style of documents across various domains. In Houda Bouamor, Juan Pino, and Kalika Bali, editors, Findings of the Association for Computational Linguistics: EMNLP 2023, 14462–14482. Singapore, December 2023. Association for Computational Linguistics. [Bibtex] [DOI] [PDF]
- Sophia Rauh, Karolina Zaczynska, and Peter Bourgonje. Toward a Multilingual Connective Database: Aligning German/French Concessive Connectives. In Proceedings of the 19th Conference on Natural Language Processing (KONVENS 2023), 77–84. Ingolstadt, Germany, 2023. Association for Computational Linguistics. [Bibtex] [PDF]
- Yulia Clausen and Manfred Stede. Discourse connectives and their arguments: An experiment on anaphoricity in German. Linguistics Vanguard, 8(1):95–111, 2022. URL: https://doi.org/10.1515/lingvan-2021-0102. [Bibtex]
- Berfin Aktaş and Manfred Stede. Anaphoric distance in oral and written language: Experimental evidence. Discours, 2022. URL: https://journals.openedition.org/discours/12383. [Bibtex]
- Katarzyna Budzynska, Chris Reed, Manfred Stede, Benno Stein, and Zhang He. Framing in Communication: From Theories to Computation (Dagstuhl Seminar 22131). Dagstuhl Reports, 12(3):117–140, 2022. URL: https://drops.dagstuhl.de/opus/volltexte/2022/17271. [Bibtex] [DOI]
- Freya Hewett and Manfred Stede. Extractive summarisation for German-language data: a text-level approach with discourse features. In Proceedings of the 29th International Conference on Computational Linguistics (COLING), 756–765. Gyeongju, Republic of Korea, 2022. International Committee on Computational Linguistics. [Bibtex] [PDF]
- Anastasia Linnik, Roelien Bastiaanse, Manfred Stede, and Mariya Khudyakova. Linguistic mechanisms of coherence in aphasic and non-aphasic discourse. Aphasiology, 2021. [Bibtex] [DOI]
- F. Hewett and M. Stede. Automatically evaluating the conceptual complexity of German texts. In Proc. of KONVENS 2021. Düsseldorf, 2021. [Bibtex] [PDF]
- S. Just, E. Haegert, N. Koranova, A. Bröcker, I. Nenchev, J. Funcke, A. Heinz, F. Bempohl, M. Stede, and C. Montag. Modeling Incoherent Discourse in Non-Affective Psychosis. Frontiers in Psychiatry / Schizophrenia, 2020. URL: https://www.frontiersin.org/articles/10.3389/fpsyt.2020.00846/full. [Bibtex]
- Peter Bourgonje and Manfred Stede. Topics and Subjects in German Newspaper Editorials: A Corpus Study. In Anke Holler, Katja Suckow, and Israel de la Fuente, editors, Information Structuring in Discourse, volume 40 of Current Research in the Semantics / Pragmatics Interface. Brill, Leiden, The Netherlands, 2020. [Bibtex] [DOI]
- Deniz Zeyrek, Amália Mendes, Yulia Grishina, Murathan Kurfalı, Samuel Gibbon, and Maciej Ogrodniczuk. TED Multilingual Discourse Bank (TED-MDB): a parallel corpus annotated in the PDTB style. Language Resources and Evaluation, 54:587–613, 06 2020. URL: https://link.springer.com/article/10.1007/s10579-019-09445-9. [Bibtex] [DOI]
- René Knaebel and Manfred Stede. Semi-Supervised Tri-Training for Explicit Discourse Argument Expansion. In Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020). Marseille, France, May 2020. European Language Resources Association (ELRA). [Bibtex] [PDF]
- M. Stede. From Connectives to Coherence Relations: A Case Study of German Contrastive Connectives. Revue roumaine de linguistique, LXV(3):213–233, 2020. URL: https://www.lingv.ro/images/RRL%203%202020%2003-STEDE.pdf. [Bibtex]
- Peter Bourgonje and Manfred Stede. The Potsdam Commentary Corpus 2.2: Extending Annotations for Shallow Discourse Parsing. In Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020). Marseille, France, May 2020. European Language Resources Association (ELRA). [Bibtex] [PDF]
- Yulia Clausen and Tatjana Scheffler. A corpus-based analysis of meaning variations in German tag questions. Evidence from spoken and written conversational corpora. Corpus Linguistics and Linguistic Theory, 2020. URL: https://www.degruyter.com/view/j/cllt.ahead-of-print/cllt-2019-0060/cllt-2019-0060.xml. [Bibtex]
- Peter Bourgonje and Manfred Stede. Exploiting a lexical resource for discourse connective disambiguation in German. In Proceedings of the 28th International Conference on Computational Linguistics, 5737–5748. Barcelona, Spain (Online), December 2020. International Committee on Computational Linguistics. [Bibtex] [PDF]
- Berfin Aktaş, Veronika Solopova, Annalena Kohnert, and Manfred Stede. Adapting Coreference Resolution to Twitter Conversations. In Findings of the Association for Computational Linguistics: EMNLP 2020. Online, 2020. Association for Computational Linguistics. [Bibtex] [DOI] [PDF]
- Berfin Aktaş and Manfred Stede. Variation in Coreference Strategies across Genres and Production Media. In Proceedings of the 28th International Conference on Computational Linguistics. Barcelona, Spain (Online), December 2020. International Committee on Computational Linguistics. [Bibtex] [PDF]
- Berfin Aktaş and Annalena Kohnert. TwiConv: A Coreference-annotated Corpus of Twitter Conversations. In Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference (CRAC@COLING), 47–54. Barcelona, Spain, December 2020. Association for Computational Linguistics. [Bibtex] [PDF]
- Henny Sluyter-Gäthje, Peter Bourgonje, and Manfred Stede. Shallow Discourse Parsing for Under-Resourced Languages: Combining Machine Translation and Annotation Projection. In Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020). Marseille, France, May 2020. European Language Resources Association (ELRA). [Bibtex] [PDF]
- Debopam Das, Manfred Stede, Soumya Sankar Ghosh, and Lahari Chatterjee. DiMLex-Bangla: A Lexicon of Bangla Discourse Connectives. In Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020). Marseille, France, May 2020. European Language Resources Association (ELRA). [Bibtex] [PDF]
- René Knaebel and Manfred Stede. Contextualized Embeddings for Connective Disambiguation in Shallow Discourse Parsing. In Proceedings of the First Workshop on Computational Approaches to Discourse, 65–75. Online, November 2020. Association for Computational Linguistics. [Bibtex] [DOI] [PDF]
- Shujun Wan, Tino Kutschbach, Anke Lüdeling, and Manfred Stede. RST-Tace: A tool for automatic comparison and evaluation of RST trees. In Proc. of the Workshop on Discourse Relation Parsing and Treebanking at NAACL. Minneapolis, MN, 2019. [Bibtex] [PDF]
- René Knaebel, Manfred Stede, and Sebastian Stober. Window-Based Neural Tagging for Shallow Discourse Argument Labeling. In Proc. of the 23rd Conference on Computational Natural Language Learning (CoNLL 2019). 2019. [Bibtex] [PDF]
- Manfred Stede, Tatjana Scheffler, and Amália Mendes. Connective-lex: A web-based multilingual lexical resource for connectives. Discours. Revue de linguistique, psycholinguistique et informatique, 2019. [Bibtex] [PDF]
- Nina Hosseini-Kivanani, Juan Camilo Vásquez-Correa, Manfred Stede, and Elmar Nöth. Automated Cross-language Intelligibility Analysis of Parkinson's Disease Patients Using Speech Recognition Technologies (Research Proposal). In Proc. of the ACL Student Research Workshop. Florence, Italy, 2019. [Bibtex] [PDF]
- Peter Bourgonje and Robin Schäfer. Multi-lingual and Cross-genre Discourse Unit Segmentation. In Proc. of the Workshop on Discourse Relation Parsing and Treebanking at NAACL. Minneapolis, MN, 2019. [Bibtex] [PDF]
- Freya Hewett, Roshan Prakash Rane, Nina Harlacher, and Manfred Stede. The utility of discourse parsing features for predicting argumentation structure. In Proc. of the 6th Workshop on Argument Mining at ACL. Florence, Italy, 2019. [Bibtex] [PDF]
- Peter Bourgonje and Olha Zolotarenko. Toward Cross-theory Discourse Relation Annotation. In Proc. of the Workshop on Discourse Relation Parsing and Treebanking at NAACL. Minneapolis, MN, 2019. [Bibtex] [PDF]
- Tatjana Scheffler, Berfin Aktaş, Debopam Das, and Manfred Stede. Annotating Shallow Discourse Relations in Twitter Conversations. In Proc. of the Workshop on Discourse Relation Parsing and Treebanking at NAACL. Minneapolis, MN, 2019. [Bibtex] [PDF]
- Peter Bourgonje and Manfred Stede. Explicit Discourse Argument Extraction for German. In Proceedings of the 21st International Conference on Text, Speech and Dialogue. Ljubljana, Slovenia, 2019. URL: https://link.springer.com/chapter/10.1007/978-3-030-27947-9_3. [Bibtex]
- S. Just, E. Haegert, N. Koránová, A. Bröcker, I. Nenchev, J. Funcke, C. Montag, and M. Stede. Coherence models in schizophrenia. In Proc. of CLPSYCH, Computational Linguistics and Clinical Psychology Workshop at NAACL. Minneapolis, MN, 2019. [Bibtex] [PDF]
- Debopam Das. Discourse segmentation in Bangla. In Girish Nath Jha, Kalika Bali, Sobha L, and Atul Kr. Ojha, editors, Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018). Paris, France, May 2018. European Language Resources Association (ELRA). [Bibtex] [PDF]
- Debopam Das and Maite Taboada. RST Signalling Corpus: a corpus of signals of coherence relations. Language Resources and Evaluation, 52(1):149–184, 2018. [Bibtex] [PDF]
- Amália Mendes, Iria del Rio, Manfred Stede, and Felix Dombek. A Lexicon of Discourse Markers for Portuguese – LDM-PTs. In N. Calzolari et al., editor, Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC'18). Miyazaki, Japan, 2018. European Language Resources Association (ELRA). [Bibtex] [PDF]
- Elena Musi, Tariq Alhindi, Manfred Stede, Leonard Kriese, Smaranda Muresan, and Andrea Rocci. A multi-layer annotated corpus of argumentative text: from argument schemes to discourse relations. In N. Calzolari et al., editor, Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC'18). Miyazaki, Japan, 2018. European Language Resources Association (ELRA). [Bibtex] [PDF]
- Debopam Das and Manfred Stede. Developing the Bangla RST Discourse Treebank. In Nicoletta Calzolari (Conference chair), Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, and Takenobu Tokunaga, editors, Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC'18). Miyazaki, Japan, 2018. European Language Resources Association (ELRA). [Bibtex] [PDF]
- Peter Bourgonje and Manfred Stede. The Potsdam Commentary Corpus 2.1 in ANNIS3. In Proceedings of the 17th International Workshop on Treebanks and Linguistic Theory. Oslo, Norway, 2018. [Bibtex]
- Laurence Danlos, Katerina Rysova, Magdalena Rysova, and Manfred Stede. Primary and secondary discourse connectives: definitions and lexicons. Dialogue and Discourse, 9(1):50–78, 2018. URL: http://dad.uni-bielefeld.de/index.php/dad/article/view/3734. [Bibtex]
- Tatjana Scheffler, Manfred Stede, Peter Bourgonje, and Felix Dombek. A multilingual database of connectives: connective-lex.info. In Ho-Dac and Muller, editors, Cross-Linguistic Discourse Annotation: applications and perspectives, pages 144–150. 2018. [Bibtex]
- Peter Bourgonje and Manfred Stede. Identifying explicit discourse connectives in German. In Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, 327–331. Melbourne, Australia, 2018. Association for Computational Linguistics. [Bibtex] [PDF]
- Debopam Das, Tatjana Scheffler, Peter Bourgonje, and Manfred Stede. Constructing a lexicon of English discourse connectives. In Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, 360–365. Melbourne, Australia, 2018. Association for Computational Linguistics. [Bibtex] [PDF]
- Debopam Das, Maite Taboada, and Manfred Stede. The good, the bad, and the disagreement: complex ground truth in rhetorical structure analysis. In Workshop on Recent Advances in RST and Related Formalisms. Santiago de Compostela, Spain, September 2017. [Bibtex] [PDF]
- W. Sidorenko and M. Stede. Potsdam Tweet Annotation Guidelines: Rhetorical Structure. Unpublished manuscript, 2017. [Bibtex] [PDF]
- M. Stede, M. Taboada, and D. Das. Annotation Guidelines for Rhetorical Structure. Unpublished manuscript, 2017. [Bibtex] [PDF]
- Peter Bourgonje, Yulia Grishina, and Manfred Stede. Toward a bilingual lexical database on connectives: Exploiting a German/Italian parallel corpus. In Proceedings of the Fourth Italian Conference on Computational Linguistics. Rome, Italy, December 2017. [Bibtex] [PDF]
- Tatjana Scheffler and Manfred Stede. Realizing argumentative coherence relations in German: a contrastive study of newspaper editorials and Twitter posts. In Proceedings of the COMMA Workshop "Foundations of the Language of Argumentation". Potsdam, Germany, 2016. [Bibtex] [PDF]
- S. Oepen, J. Read, T. Scheffler, U. Sidarenka, M. Stede, E. Velldal, and L. Øvrelid. OPT: Oslo–Potsdam–Teesside—Pipelining Rules, Rankers, and Classifier Ensembles for Shallow Discourse Parsing. In Proceedings of the CONLL 2016 Shared Task. Berlin, 2016. [Bibtex] [PDF]
- Anna Feltracco, Elisabetta Jezek, Bernardo Magnini, and Manfred Stede. Lico: a lexicon of Italian connectives. In Proceedings of the 3rd Italian Conference on Computational Linguistics (CLiC-it). Napoli, Italy, 2016. [Bibtex] [PDF]
- Tatjana Scheffler and Manfred Stede. Mapping PDTB-style connective annotation to RST-style discourse annotation. In Proceedings of KONVENS. Bochum, Germany, 2016. [Bibtex] [PDF]
- Andreas Peldszus and Manfred Stede. Rhetorical structure and argumentation structure in monologue text. In Proceedings of the 3rd Workshop on Argumentation Mining. Berlin, September 2016. Association for Computational Linguistics. [Bibtex] [PDF]
- Tatjana Scheffler and Manfred Stede. Adding Semantic Relations to a Large-Coverage Connective Lexicon of German. In Nicoletta Calzolari et al., editor, Proc. of the Ninth International Conference on Language Resources and Evaluation (LREC 2016). Portorož, Slovenia, May 2016. European Language Resources Association (ELRA). [Bibtex] [PDF]
- Uladzimir Sidarenka, Andreas Peldszus, and Manfred Stede. Discourse Segmentation of German Texts. Journal for Language Technology and Computational Linguistics, 30(1):71–98, 2015. [Bibtex] [PDF]
- Uladzimir Sidarenka, Matthias Bisping, and Manfred Stede. Applying Rhetorical Structure Theory to Twitter Conversations. In Proceedings of DiSpol 2015. Saarbrücken, Germany, October 2015. [Bibtex] [PDF]
- Anastasia Linnik, Roelien Bastiaanse, and Barbara Hoehle. Discourse production in aphasia: a current review of theoretical and methodological challenges. Aphasiology, 2015. URL: http://dx.doi.org/10.1080/02687038.2015.1113489. [Bibtex]
- Arne Neumann. Discoursegraphs: a graph-based merging tool and converter for multilayer annotated corpora. In Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015), 309–312. 2015. [Bibtex]
- Tatjana Scheffler. Two-Dimensional Semantics: Clausal Adjuncts and Complements. Volume 549 of Linguistische Arbeiten. De Gruyter, Berlin/Boston, 2013. ISBN 9783110302141. [Bibtex]
- A. Schneider and M. Stede. Ambiguity in German connectives: a corpus study. In Proceedings of the KONVENS Conference. Vienna, 2012. [Bibtex]
- Manfred Stede and Andreas Peldszus. The role of illocutionary status in the usage conditions of causal connectives and in coherence relations. Journal of Pragmatics, 44(2):214–229, 2012. [Bibtex] [DOI]
- Manfred Stede. Discourse Processing. Volume 15 of Synthesis Lectures in Human Language Technology. Morgan & Claypool, 2011. [Bibtex]
- Manfred Stede. Disambiguating rhetorical structure. Research on Language and Computation, 6(3):311–332, 2008. [Bibtex]
- Manfred Stede. RST revisited: Disentangling nuclearity. In Cathrine Fabricius-Hansen and Wiebke Ramm, editors, 'Subordination' versus 'coordination' in sentence and text. John Benjamins, Amsterdam, 2008. [Bibtex]
- Heike Bieler, Stefanie Dipper, and Manfred Stede. Identifying formal and functional zones in film reviews. In Proc. of the 8th SIGDIAL Workshop. Antwerp, 2007. [Bibtex]
- Michael Grabski and Manfred Stede. 'bei': intra-clausal coherence relations illustrated with a German preposition. Discourse Processes, 41(2):195–219, 2006. [Bibtex]
- S. Dipper and M. Stede. Disambiguating potential connectives. In Proceedings of the KONVENS Conference. Konstanz, 2006. [Bibtex]
- David Reitter. Simple signals for complex rhetorics: on rhetorical analysis with rich-feature support-vector models. Journal for Language Technology and Computational Linguistics (LDV Forum), 18(2):38–52, 2003. [Bibtex]
- Manfred Stede. DiMLex: A Lexical Approach to Discourse Markers. In Exploring the Lexicon - Theory and Computation. Edizioni dell'Orso, Alessandria, 2002. [Bibtex]
- Manfred Stede and Carla Umbach. DiMLex: a lexicon of discourse markers for text generation and understanding. In Proceedings of the 17th International Conference on Computational Linguistics, Volume 2, 1238–1242. Association for Computational Linguistics, 1998. [Bibtex]