Our research projects and some students' term projects have led to several NLP resources that we are making available.
- The Potsdam Commentary Corpus (PCC): A corpus of multi-level annotated German newspaper commentaries
- arg-microtexts: A German-English parallel corpus of 112 short argumentative texts annotated with argumentation structures
- The Potsdam Twitter Sentiment Corpus (PotTS): A collection of 8,000 German tweets manually annotated with fine-grained sentiment relations
Grammars and Lexica
- ANNIS3: An open-source linguistic database and query tool for multi-layer-annotated corpora (developed in the SFB D1 project with Anke Lüdeling's group at HU Berlin)
- ConAno: A Java tool for semi-automatically annotating connectives and their arguments
- GraPAT: A graph-based, web-based annotation tool suited for sentiment and argumentation structure annotation
- discoursegraphs: A converter and merging library for syntactic and discourse-related annotation formats (Tiger, PTB, RSTTool, MMAX, ConAno, EXMARaLDA) with output support for generic graph formats (neo4j, dot, GEXF, GML, GraphML)
- CRFSuite-0.13: An updated version of Naoaki Okazaki's CRFSuite, extended with tree-structured CRFs as well as higher-order linear-chain and semi-Markov CRF models
- DiscourseSegmenter: A Python package providing rule-based and machine-learning discourse segmenters
- DiscourseSenser: A Python package for sense disambiguation of discourse relations in PDTB-style discourse parsing
- OsloPots: A Docker image of the shallow discourse parser created for the CoNLL 2016 Shared Task
- SentiLex: A collection of tools for generating sentiment lexicons from neural word embeddings, corpora, and lexical taxonomies