Argumentative Microtext Corpus

The argumentative microtext corpus consists of short texts that respond to a trigger question such as "Should everybody be obliged to pay fees for public radio/TV?" All texts have been annotated with a tree representation of the underlying argumentation. The data is divided into two parts:

Part 1 was produced by controlled text elicitation experiments. The 112 texts were originally written in German and have been professionally translated into English. Both languages are available. The English texts have also been annotated with various other annotation layers (see below).
Part 2 was produced by a crowdsourcing experiment. The 178 texts are (only) in English.

For illustration, here is an example from part 1 of the corpus (micro_b003):

Should health insurance cover alternative medical treatments?

EN: Health insurance companies should not cover treatment in complementary medicine unless the promised effect and its medical benefit have been concretely proven. Yet this very proof is lacking in most cases. Patients do often report relief of their complaints after such treatments. But as long as it is unclear as to how this works, the funds should rather be spent on therapies where one knows with certainty.

DE: Die Krankenkassen sollten Behandlungen beim Natur- oder Heilpraktiker nicht zahlen, es sei denn der versprochene Effekt und dessen medizinischer Nutzen sind handfest nachgewiesen. Genau dieser Nachweis fehlt jedoch in den meisten Fällen. Zwar verweisen die Patienten oft auf eine Linderung ihrer Beschwerden nach derartigen Behandlungen. Solange aber nicht klar ist, wieso es dazu kommt, sollte das Geld besser für Behandlungen ausgegeben werden, bei denen man es mit Sicherheit weiss.

A sample of our argumentation structure analysis (for a different text) is shown on this page.
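To give a rough idea of what a tree representation of an argument can look like in code, here is a minimal Python sketch. It is not the corpus file format and not our annotation scheme: the class `Node`, the helper functions, the segment ids, the toy texts, and the relation labels "support" and "attack" are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One segment of a toy argumentation tree (hypothetical schema)."""
    seg_id: str
    text: str
    # Relation to the parent segment; None marks the root (central claim).
    relation: Optional[str] = None
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child segment and return it, so subtrees can be extended."""
        self.children.append(child)
        return child


def render(node: Node, depth: int = 0) -> List[str]:
    """Return one indented line per segment, in depth-first order."""
    label = node.relation or "central claim"
    lines = [f"{'  ' * depth}[{node.seg_id}] ({label}) {node.text}"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


def count_relations(node: Node) -> Dict[str, int]:
    """Count how often each relation label occurs below the given node."""
    counts: Dict[str, int] = {}
    stack = list(node.children)
    while stack:
        current = stack.pop()
        key = current.relation or "unlabeled"
        counts[key] = counts.get(key, 0) + 1
        stack.extend(current.children)
    return counts


# Toy example: invented segments, not taken from the corpus.
root = Node("s1", "Toy central claim.")
root.add(Node("s2", "Toy reason for the claim.", relation="support"))
objection = root.add(Node("s3", "Toy objection to the claim.", relation="attack"))
objection.add(Node("s4", "Toy reply to the objection.", relation="attack"))

print("\n".join(render(root)))
```

Each non-root segment stores its relation to its parent, so a counter-argument to an objection is simply a node nested one level deeper. The actual annotation scheme is defined in the annotation guidelines linked below.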

Download links

Part 1 of the corpus. Please cite this corpus with [1] below.
Part 2 of the corpus. Please cite this part with [15] below.
Annotation guidelines for argumentation structure (in English). Please cite this document as stated therein.
Annotation guidelines for argumentation structure (in German). Please cite this document with [2] below.
Part 1 of the corpus has also been integrated into AIFDB by colleagues in Dundee.
Annotation tool: For building the graph structures over segmented texts, we developed a dedicated annotation tool: GraPAT

Additional annotations and data

For Part 1 of the English corpus, various other annotation layers have become available:

In collaboration with colleagues in Toulouse, we annotated discourse structure according to both Rhetorical Structure Theory and Segmented Discourse Representation Theory. See [3]. For a comparative analysis of argumentation structure and RST, see [4]. The data is here.
In collaboration with colleagues at Columbia University (NYC) and USI (Lugano), annotations of argumentation schemes (according to the Argumentum Model of Topics, Rigotti/Greco 2010) were built; see [5].
In collaboration with colleagues in Weimar and Paderborn, we conducted an experiment on (human) synthesis of microtexts from components as found in microtexts (see [6]). The data is available here.
Colleagues in Heidelberg built annotations of Situation Entity types (see [7]) and of implicit assumptions underlying the arguments (see [8]). (These annotations are on the German part of the corpus. The data is here.)
Colleagues in Tokyo built annotations of deep argumentative relations among segments (for 90 texts of the corpus); see [9].
Some of the texts have been annotated with Questions Under Discussion by Tatjana Scheffler and students. Information on that effort is here and the data is here.

Additional languages

An Italian version of both part 1 of the corpus and our argumentation mining code has been produced; see [16].
A Russian version of the corpus along with cross-lingual mining experiments is described in [17], and the data is available here.
A Persian version of part 1 of the corpus has been produced; see [18].

Using the corpus: Examples

Besides our own work (see [10], [11], and this page), the corpus has been used by several researchers for purposes of argumentation mining. For instance, Stab and Gurevych [12] measured the performance of their argumentation structure analysis module. Wachsmuth et al. [13] ran experiments with tree kernels. Morio et al. [14] compared neural approaches on a variety of corpora including the microtexts.

References

[1] Andreas Peldszus and Manfred Stede. An annotated corpus of argumentative microtexts. In D. Mohammed and M. Lewinski, editors, Argumentation and Reasoned Action - Proc. of the 1st European Conference on Argumentation, Lisbon, 2015. College Publications, London, 2016
[2] Andreas Peldszus, Saskia Warzecha, Manfred Stede: Argumentationsstruktur. In: M. Stede (ed.): Handbuch Textannotation - Potsdamer Kommentarkorpus 2.0, S. 185-208. Universitätsverlag Potsdam, 2016
[3] Manfred Stede, Stergos Afantenos, Andreas Peldszus, Nicholas Asher, and Jérémy Perret. Parallel Discourse Annotations on a Corpus of Short Texts. In Nicoletta Calzolari et al., editors, Proc. of the Ninth International Conference on Language Resources and Evaluation (LREC 2016), Portoroz
[4] Andreas Peldszus and Manfred Stede. Rhetorical structure and argumentation structure in monologue text. In Proceedings of the 3rd Workshop on Argumentation Mining (at ACL). Berlin, 2016
[5] Elena Musi, Tariq Alhindi, Manfred Stede, Leonard Kriese, Smaranda Muresan, and Andrea Rocci. A multi-layer annotated corpus of argumentative text: from argument schemes to discourse relations. In N. Calzolari et al., editors, Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC'18). Miyazaki, Japan, 2018
[6] Henning Wachsmuth, Manfred Stede, Roxanne El Baff, Khalid Al Khatib, Maria Skeppstedt, and Benno Stein. Argumentation synthesis following rhetorical strategies. In Proceedings of COLING, Santa Fe, NM, USA, 2018. To appear
[7] Becker, M., Palmer, A., and Frank, A.: Semantic Clause Types and Modality as Features for Argument Analysis. Argument & Computation 8(2), 2017
[8] Becker, M., Staniek, M., Nastase, V., and Frank, A.: Enriching Argumentative Texts with Implicit Knowledge. In: Frasincar, F., Ittoo, A., Nguyen, L., and Metais, E. (eds.), Applications of Natural Language to Data Bases (NLDB) - Natural Language Processing and Information Systems, Springer, 2017
[9] Paul Reisert, Naoya Inoue, Naoaki Okazaki, Kentaro Inui. A Corpus of Deep Argumentative Structures as an Explanation to Argumentative Relations. arXiv:1712.02480
[10] Andreas Peldszus and Manfred Stede. Joint prediction in MST-style discourse parsing for argumentation mining. Proc. of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), 938–948. Lisbon, Portugal, 2015
[11] Stergos Afantenos, Andreas Peldszus, and Manfred Stede. Comparing decoding mechanisms for parsing argumentative structures. Argument and Computation, 2018
[12] Christian Stab and Iryna Gurevych. Parsing argumentation structures in persuasive essays. Computational Linguistics, 43(3):619–660, 2017
[13] Henning Wachsmuth, Giovanni Da San Martino, Dora Kiesel, and Benno Stein. The Impact of Modeling Overall Argumentation with Tree Kernels. In Proc. Empirical Methods in Natural Language Processing (EMNLP), 2017
[14] Gaku Morio et al. End-to-end Argument Mining with Cross-corpora Multi-task Learning. In: Transactions of the Association for Computational Linguistics 10:639-658, 2022
[15] Maria Skeppstedt, Andreas Peldszus and Manfred Stede. More or less controlled elicitation of argumentative text: Enlarging a microtext corpus via crowdsourcing. In Proc. 5th Workshop in Argumentation Mining (at EMNLP), Brussels, 2018
[16] Ivan Namor, Pietro Totis, Samuele Garda, Manfred Stede: Mining Italian Short Argumentative Texts. In Proc. of the Italian Conference on Computational Linguistics (CLiC-it), Bari, 2019
[17] Irina Fishcheva, Evgeny Kotelnikov: Cross-Lingual Argumentation Mining for Russian Texts. In: van der Aalst W. et al. (eds) Analysis of Images, Social Networks and Texts. AIST 2019. Lecture Notes in Computer Science, vol 11832. Springer, Cham, 2019
[18] Mohammad Yeghaneh Abkenar and Manfred Stede: Neural mining of Persian short argumentative texts. In Proc. of the 2nd Workshop on Resources and Technologies for Indigenous, Endangered and Lesser-resourced Languages in Eurasia (LREC - EURALI). Turin, 2024