Probing the discourses of climate change (CC)
What can automatic text mining reveal about CC communication?
In a series of interrelated student projects spanning several seminars and research modules since 2020, we apply text mining methods to a variety of text corpora, some of which we built ourselves. The overarching goal is to identify the patterns, opinions, and arguments put forward in different CC discourses.
A video for the general public that introduces five of our recent student projects (largely in German) is available from the online "Potsdamer Tag der Wissenschaften" 2021.
Corpora:
- GerCCT: 12,000 pairs of German Climate Change Tweets, collected at DRL (see Schäfer/Stede 2020 below).
- NatSciCC: 490 editorials from Nature and Science (1966-2015), manually annotated by Hulme et al. 2018 for thematic framing categories. We built a digital version of the corpus.
- NYTAC: New York Times Annotated Corpus, in which we identified 10,000 articles related to CC.
- CMV-CC: A CC subset of the "Change My View" subreddit corpus compiled by Webis.
Projects:
- Glossary: We built a linguistically oriented online glossary of 250 German climate compound nouns used in political discourse.
- Framing in NatSciCC: Following up on the work of Hulme et al. 2018, we analyze how framing is realized linguistically in the editorials.
- Classifying the NatSciCC texts: Focusing on the problem of imbalanced data, we aim to automatically reconstruct the topic frame annotations of Hulme et al. 2018 (see Bracke 2020 below).
- Tracking CC in NYTAC: We use unsupervised methods to detect patterns in CC reporting in The New York Times (1987-2007).
- Argumentation in Twitter exchanges: We study various subtasks of argumentation mining on the GerCCT corpus.
- News headlines: We searched the archives of German newspapers for climate change articles, collected their headlines, and study trends in term usage.
BSc student 2022 (Computational Linguistics): Noël Simmel
MSc students 2022 (Cognitive Systems): Raunak Agarwal, Luka Borec, Anna Goecke, Juliane Hanel, Neele Charlotte Kinkel, Nailia Mirzakhmedova
PhD student: Robin Schäfer
National and international collaborators: Nic Badullovich (ANU Climate Change Institute, Canberra), Ronny Patz (Hertie School, Berlin), Patrick Saint-Dizier (Univ. Paul Sabatier, Toulouse), Maria Skeppstedt (Inst. for Language and Folklore, Sweden)
Contact: Manfred Stede (email@example.com)
Publications:
- Robin Schaefer and Manfred Stede. GerCCT: An Annotated Corpus for Mining Arguments in German Tweets on Climate Change. In Proceedings of the 13th Language Resources and Evaluation Conference (LREC), 2022. To appear.
- Manfred Stede and Ronny Patz. The climate change debate and natural language processing. In Proceedings of the 1st Workshop on NLP for Positive Impact (ACL), Online, 2021.
- Robin Schäfer and Manfred Stede. Annotation and detection of arguments in tweets. In Proceedings of the 7th Workshop on Argument Mining, pages 53–58, Online, December 2020. Association for Computational Linguistics.
- Yannic Bracke. Automatic text classification with imbalanced data: Building a frame classifier from a corpus of editorials. Unpublished B.Sc. thesis, 2020.