Coreference Resolution
High-quality coreference resolution plays an important role in discourse processing, since text understanding requires (i) identifying referring expressions that point to the same extralinguistic referent in a natural language discourse, and (ii) establishing links between anaphoric expressions and their antecedents. These abilities benefit virtually any NLP task that operates at the discourse level, such as information extraction, machine translation, question answering, or text summarization.
Variation in coreference strategies
Language in social media ranges from a more formal (written-like) to a more informal (spoken-like) style depending on the context, and thus shows high variability. It is known that the expression of coreferential relations differs between spoken and written language, for example in the textual distance between referring expressions and their antecedents, and in the type of expression used (such as a pronoun or a full noun phrase). We extended this research to social media conversations from Twitter and showed that coreferential relations on Twitter resemble either spoken or written language, depending on the feature investigated. Subsequently, we adapted a computational model for coreference resolution to better capture the idiosyncrasies of social media conversations.
We are also interested in the interplay of coherence and coreference relations in Twitter conversations. To support our research, we are developing a coreference- and coherence-annotated corpus of Twitter conversations.
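As an illustration of one such feature, the distance between a referring expression and its antecedent can be measured in tokens. The following is a minimal sketch only: the function name, the representation of mentions as inclusive (start, end) token spans, and the choice of counting intervening tokens are assumptions made for this example, and the studies listed below may define distance differently (for instance in sentences or turns).

```python
def anaphoric_distances(chain):
    """For each non-initial mention in a coreference chain, return the
    number of tokens between it and the closest preceding mention.

    `chain` is a list of (start, end) token spans, both inclusive.
    This span format is an assumption of the sketch.
    """
    ordered = sorted(chain)
    distances = []
    for (_, prev_end), (cur_start, _) in zip(ordered, ordered[1:]):
        # Clamp at 0 so that overlapping or adjacent mentions do not
        # produce negative distances.
        distances.append(max(0, cur_start - prev_end - 1))
    return distances


# Token indices: Anna=0 lost=1 her=2 keys=3 .=4 She=5 found=6 them=7 later=8 .=9
tokens = "Anna lost her keys . She found them later .".split()
anna_chain = [(0, 0), (2, 2), (5, 5)]   # Anna ... her ... She
keys_chain = [(2, 3), (7, 7)]           # her keys ... them

print(anaphoric_distances(anna_chain))  # [1, 2]
print(anaphoric_distances(keys_chain))  # [3]
```

Comparing the distribution of such distances across genres is one way to make differences between spoken-like and written-like text measurable.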
Our language of interest is English.
Coreference annotation projection
Common technologies for automatic coreference resolution require either a language-specific rule set or large collections of manually annotated data, which are typically limited to newswire texts in major languages. This hinders the development of coreference resolvers for the many so-called "low-resource languages", for which no extensive linguistic resources are available.
In order to alleviate the problem of resource scarcity, we investigate methods for multilingual coreference resolution and experiment with annotation projection algorithms on multilingual data. To support our research, we are developing a parallel multi-genre coreference corpus (newswire texts, narratives, medical instructions). Furthermore, we are interested in exploring structural differences of referring expressions and coreference chains across languages.
To date, our languages of interest are English, German and Russian, but our goal is to keep our approach easily generalisable to other languages and datasets.
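The basic idea of annotation projection can be sketched as follows: mentions annotated on the source side of a parallel text are mapped to the target side through word alignments, and the chain structure is carried over. This is a generic, simplified illustration and not the specific algorithms evaluated in the publications below. The function name, the data layout, and the hand-written alignment are assumptions for this example; in practice the alignment would come from an automatic word aligner.

```python
def project_chains(source_chains, alignment):
    """Project coreference chains from source to target tokens.

    source_chains: list of chains, each a list of (start, end) source
                   token spans (inclusive).
    alignment:     dict mapping a source token index to a list of
                   aligned target token indices.
    Returns a list of chains of (start, end) target token spans.
    """
    projected = []
    for chain in source_chains:
        target_chain = []
        for start, end in chain:
            # Collect every target token aligned to a token of the mention.
            targets = sorted(
                t for s in range(start, end + 1) for t in alignment.get(s, [])
            )
            if targets:
                # Take the smallest span covering all aligned target tokens.
                target_chain.append((targets[0], targets[-1]))
        # A chain needs at least two mentions to express coreference.
        if len(target_chain) >= 2:
            projected.append(target_chain)
    return projected


# English: The=0 doctor=1 said=2 she=3 was=4 tired=5
# German:  Die=0 Ärztin=1 sagte=2 ,=3 sie=4 sei=5 müde=6
alignment = {0: [0], 1: [1], 2: [2], 3: [4], 4: [5], 5: [6]}
english_chains = [[(0, 1), (3, 3)]]     # "The doctor" ... "she"

print(project_chains(english_chains, alignment))  # [[(0, 1), (4, 4)]]
```

Mentions without any aligned target token are silently dropped here, which is one source of the recall loss that projection approaches have to deal with.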
Resources
PoCoRes
PoCoRes is a pronoun resolver for German that roughly follows the filters-and-preferences approach of Lappin and Leass (1994) but has been adapted to certain features of the German language. It will be made available later this year.
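A filters-and-preferences resolver works in two stages: hard filters first discard candidate antecedents that cannot corefer with the pronoun (for example because of gender or number disagreement), and preferences then rank the remaining candidates by salience. The following is a minimal sketch of that general scheme only, not the PoCoRes implementation. The candidate format, the function name, the weights, and the halving of salience per sentence are illustrative assumptions.

```python
# Illustrative weights; these are NOT the values used by PoCoRes.
RECENCY_WEIGHT = 100
ROLE_WEIGHT = {"subj": 80, "obj": 50, "other": 40}


def resolve_pronoun(pronoun, candidates):
    """Pick an antecedent for `pronoun` from `candidates`.

    Both are dicts with the keys "gender", "number" and "sentence";
    candidates additionally carry "text" and "role".
    Returns the chosen candidate, or None if every candidate is filtered out.
    """
    # Filters: the antecedent must agree with the pronoun in gender and
    # number and must not occur in a later sentence.
    agreeing = [
        c for c in candidates
        if c["gender"] == pronoun["gender"]
        and c["number"] == pronoun["number"]
        and c["sentence"] <= pronoun["sentence"]
    ]
    if not agreeing:
        return None

    # Preferences: salience from recency and grammatical role, halved
    # for every sentence between candidate and pronoun.
    def salience(c):
        age = pronoun["sentence"] - c["sentence"]
        return (RECENCY_WEIGHT + ROLE_WEIGHT.get(c["role"], 0)) / (2 ** age)

    return max(agreeing, key=salience)


# "Die Frau gab der Katze den Ball. Sie ..."
candidates = [
    {"text": "Frau", "gender": "fem", "number": "sg", "role": "subj", "sentence": 0},
    {"text": "Katze", "gender": "fem", "number": "sg", "role": "other", "sentence": 0},
    {"text": "Ball", "gender": "masc", "number": "sg", "role": "obj", "sentence": 0},
]
pronoun = {"gender": "fem", "number": "sg", "sentence": 1}

print(resolve_pronoun(pronoun, candidates)["text"])  # Frau
```

In this example the gender filter removes "Ball", and the subject preference ranks "Frau" above "Katze". German grammatical gender makes the agreement filter more selective than in English, although the sketch ignores ambiguous forms such as "sie", which can also be plural.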
Related publications:
- Berfin Aktaş and Manfred Stede. Anaphoric distance in oral and written language: Experimental evidence. Discours, 2022. URL: https://journals.openedition.org/discours/12383. [Bibtex]
- Berfin Aktaş and Manfred Stede. Variation in Coreference Strategies across Genres and Production Media. In Proceedings of the 28th International Conference on Computational Linguistics. Barcelona, Spain (Online), December 2020. International Committee on Computational Linguistics. [Bibtex] [PDF]
- Berfin Aktaş and Annalena Kohnert. TwiConv: A Coreference-annotated Corpus of Twitter Conversations. In Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference (CRAC@COLING), 47–54. Barcelona, Spain, December 2020. Association for Computational Linguistics. [Bibtex] [PDF]
- Berfin Aktaş, Veronika Solopova, Annalena Kohnert, and Manfred Stede. Adapting Coreference Resolution to Twitter Conversations. In Findings of the Association for Computational Linguistics: EMNLP 2020. Online, 2020. Association for Computational Linguistics. [Bibtex] [DOI] [PDF]
- Berfin Aktaş, Tatjana Scheffler, and Manfred Stede. Coreference in English OntoNotes: Properties and Genre Differences. In Proceedings of the 22nd International Conference on Text, Speech and Dialogue. Ljubljana, Slovenia, 2019. URL: https://link.springer.com/chapter/10.1007/978-3-030-27947-9_15. [Bibtex]
- Berfin Aktaş, Tatjana Scheffler, and Manfred Stede. Anaphora resolution for twitter conversations: an exploratory study. In Proceedings of the Workshop on Computational Models of Reference, Anaphora, and Coreference, CRAC@HLT-NAACL 2018, 1–10. New Orleans, Louisiana, June 2018. Association for Computational Linguistics. (Some numbers in Table 4 are corrected from the version published in the CRAC 2018 WS Proceedings). [Bibtex] [PDF]
- Massimo Poesio, Yulia Grishina, Varada Kolhatkar, Nafise Moosavi, Ina Roesiger, Adam Roussel, Fabian Simonjetz, Alexandra Uma, Olga Uryupina, Juntao Yu, and Heike Zinsmeister. Anaphora resolution with the ARRAU corpus. In Proceedings of the First Workshop on Computational Models of Reference, Anaphora and Coreference. New Orleans, USA, June 2018. [Bibtex] [PDF]
- Yulia Grishina. CORBON 2017 Shared Task: projection-based coreference resolution. In Proceedings of the 2nd Coreference Resolution Beyond OntoNotes (CORBON) Workshop. Valencia, Spain, April 2017. Association for Computational Linguistics. [Bibtex] [PDF]
- Yulia Grishina. Combining the output of two coreference resolution systems for two source languages to improve annotation projection. In Proceedings of the 3rd Workshop on Discourse in Machine Translation. Copenhagen, Denmark, September 2017. Association for Computational Linguistics. [Bibtex] [PDF]
- Yulia Grishina and Manfred Stede. Multi-source projection of coreference chains: assessing strategies and testing opportunities. In Proceedings of the 2nd Coreference Resolution Beyond OntoNotes (CORBON) Workshop. Valencia, Spain, April 2017. Association for Computational Linguistics. [Bibtex] [PDF]
- Yulia Grishina and Manfred Stede. Referring expressions as cohesive devices in multiple languages. In Proceedings of TextLink–Structuring Discourse in Multilingual Europe Second Action Conference, 55. Karoli Gaspar University of the Reformed Church, Budapest, Hungary, April 2016. [Bibtex] [PDF]
- Yulia Grishina. Experiments on bridging across languages and genres. In Proceedings of the Coreference Resolution Beyond OntoNotes (CORBON) Workshop. San Diego, California, June 2016. Association for Computational Linguistics. [Bibtex] [PDF]
- Yulia Grishina and Manfred Stede. Parallel coreference annotation guidelines. November 2016. [Bibtex] [PDF]
- Manfred Stede and Yulia Grishina. Anaphoricity in connectives: a case study on German. In Proceedings of the Coreference Resolution Beyond OntoNotes (CORBON) Workshop. San Diego, California, June 2016. Association for Computational Linguistics. [Bibtex] [PDF]
- Yulia Grishina and Manfred Stede. Knowledge-lean projection of coreference chains across languages. In Proceedings of the 8th Workshop on Building and Using Comparable Corpora. Beijing, China, July 2015. Association for Computational Linguistics. [Bibtex] [PDF]
- Fatemeh Torabi Asr, Jonathan Sonntag, Yulia Grishina, and Manfred Stede. Conceptual and practical steps in event coreference analysis of large-scale data. In Proceedings of the Second Workshop on EVENTS: Definition, Detection, Coreference, and Representation, 35–44. Baltimore, Maryland, USA, June 2014. Association for Computational Linguistics. [Bibtex] [PDF]
- Christian Chiarcos, Julia Ritz, and Manfred Stede. Querying and visualizing coreference annotation in multi-layer corpora. In Proceedings of the Eighth Discourse Anaphora and Anaphor Resolution Colloquium (DAARC). Faro, Portugal, 2011. [Bibtex]