discoursegraphs (Neumann 2015) is a graph-based converter and merging tool for multi-level annotated corpora. The library lets you process linguistic corpora that carry multiple levels of annotation by:
- converting the different annotation formats into separate graphs
- merging these graphs into a single multidigraph (based on the common tokenization of the annotation layers)
- exporting your (merged) graphs into several output formats
- visualizing linguistic graphs directly in an IPython notebook (using GraphViz)
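The merging step can be illustrated with a minimal sketch. It does not use the discoursegraphs API: the `MultiDiGraph` class, `token_sequence`, `merge`, `tokenized_graph`, and all node IDs and attribute names below are hypothetical and exist only for this example. The assumption is that two annotation layers can be combined when they agree on the token sequence, and that shared token nodes carry the same IDs in both layers.

```python
class MultiDiGraph:
    """Tiny stand-in for a multidigraph: directed, parallel edges allowed."""

    def __init__(self):
        self.nodes = {}  # node id -> attribute dict
        self.edges = []  # list of (source, target, attribute dict)

    def add_node(self, node_id, **attrs):
        # Adding an existing node only updates its attributes.
        self.nodes.setdefault(node_id, {}).update(attrs)

    def add_edge(self, source, target, **attrs):
        self.add_node(source)
        self.add_node(target)
        self.edges.append((source, target, attrs))


def token_sequence(graph):
    """Return the token strings of a graph in text order."""
    tokens = [attrs for attrs in graph.nodes.values() if "token" in attrs]
    return [attrs["token"] for attrs in sorted(tokens, key=lambda a: a["index"])]


def merge(first, second):
    """Merge two annotation graphs that share the same tokenization."""
    if token_sequence(first) != token_sequence(second):
        raise ValueError("annotation layers are tokenized differently")
    merged = MultiDiGraph()
    for graph in (first, second):
        for node_id, attrs in graph.nodes.items():
            merged.add_node(node_id, **attrs)
        for source, target, attrs in graph.edges:
            merged.add_edge(source, target, **attrs)
    return merged


def tokenized_graph(words):
    """Create a graph that contains only token nodes (hypothetical helper)."""
    graph = MultiDiGraph()
    for index, word in enumerate(words, start=1):
        graph.add_node(f"tok_{index}", token=word, index=index)
    return graph


words = ["Der", "Hund", "bellt"]

# Layer 1: a noun phrase dominating the first two tokens.
syntax = tokenized_graph(words)
syntax.add_node("np_1", cat="NP")
for token_id in ("tok_1", "tok_2"):
    syntax.add_edge("np_1", token_id, layer="syntax")

# Layer 2: a coreference markable spanning the same two tokens.
coref = tokenized_graph(words)
coref.add_node("markable_1", kind="markable")
for token_id in ("tok_1", "tok_2"):
    coref.add_edge("markable_1", token_id, layer="coref")

merged = merge(syntax, coref)
```

Because both layers use the same token node IDs, the token nodes are shared in the merged graph while each layer keeps its own nodes and edges.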
discoursegraphs includes importers for the following tools and formats:
- constituent and dependency structures: Tiger-XML, Penn Treebank and CoNLL 2009/2010
- rhetorical structure: RSTTool's rs3 and rst/dis formats
- pointing relations (e.g. coreference, connectives): MMAX2 and ConAno
- annotations of spans of text: EXMARaLDA
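To give a rough idea of what an importer for one of these formats has to do, here is a minimal sketch that reads a single sentence in CoNLL 2009 format and turns it into dependency edges. It is not the discoursegraphs importer: `read_conll2009` is a hypothetical name, the sample sentence is made up, and the sketch assumes tab-separated columns with HEAD in column 9 and DEPREL in column 11.

```python
def read_conll2009(text):
    """Return (tokens, edges) for one sentence in CoNLL 2009 format.

    tokens maps a token ID to its word form (ID 0 is an artificial root);
    edges is a list of (head ID, dependent ID, dependency relation).
    """
    tokens = {0: "ROOT"}
    edges = []
    for line in text.strip().splitlines():
        columns = line.split("\t")
        token_id, form = int(columns[0]), columns[1]
        head, deprel = int(columns[8]), columns[10]
        tokens[token_id] = form
        edges.append((head, token_id, deprel))
    return tokens, edges


# Columns: ID FORM LEMMA PLEMMA POS PPOS FEAT PFEAT HEAD PHEAD
#          DEPREL PDEPREL FILLPRED PRED
rows = [
    ["1", "She", "she", "she", "PRP", "PRP", "_", "_", "2", "2", "SBJ", "SBJ", "_", "_"],
    ["2", "sleeps", "sleep", "sleep", "VBZ", "VBZ", "_", "_", "0", "0", "ROOT", "ROOT", "_", "_"],
]
sample = "\n".join("\t".join(row) for row in rows)

tokens, edges = read_conll2009(sample)
```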
The library also provides a number of exporters for
- general purpose graph formats like dot, GEXF, GML and GraphML
- the linguistic interchange formats CoNLL 2009 and PAULA XML 1.1
- the neo4j graph database (both regular export via the geoff format and live upload of annotated graphs to a running neo4j instance)
- EXMARaLDA's exb format
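As an illustration of what exporting to a general purpose graph format involves, the sketch below serializes a small graph with parallel edges to GraphViz's dot language. It is not the discoursegraphs exporter: `quote`, `to_dot`, and the node and edge data are hypothetical, and the graph is represented as plain tuples rather than a graph object.

```python
def quote(value):
    """Wrap a value in double quotes so it is a valid dot ID."""
    # Inside a quoted dot ID, a double quote is written as \".
    return '"' + str(value).replace('"', '\\"') + '"'


def to_dot(nodes, edges, name="merged"):
    """Serialize nodes and labeled edges as a dot digraph.

    nodes: iterable of (node ID, label)
    edges: iterable of (source ID, target ID, label); parallel edges
           are written as separate lines.
    """
    lines = [f"digraph {quote(name)} {{"]
    for node_id, label in nodes:
        lines.append(f"  {quote(node_id)} [label={quote(label)}];")
    for source, target, label in edges:
        lines.append(f"  {quote(source)} -> {quote(target)} [label={quote(label)}];")
    lines.append("}")
    return "\n".join(lines)


nodes = [("tok_1", "Der"), ("np_1", "NP")]
# Two edges between the same pair of nodes, one per annotation layer.
edges = [("np_1", "tok_1", "syntax"), ("np_1", "tok_1", "coref")]

dot_text = to_dot(nodes, edges)
```

The resulting string can be saved to a file and rendered with the GraphViz `dot` command, for example `dot -Tpng graph.dot -o graph.png`.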
Further information and download options can be found here.
Credits: discoursegraphs was implemented by Arne Neumann.