Coreference Resolution

High-quality coreference resolution plays an important role in discourse processing, since text understanding requires (i) identifying referring expressions that point to the same extralinguistic referent in a natural language discourse, and (ii) establishing links between anaphoric entities and their antecedents. These abilities benefit virtually any NLP task that operates at the discourse level, such as information extraction, machine translation, question answering, or text summarization.

Coreference annotation projection

Common approaches to automatic coreference resolution require either a language-specific rule set or large collections of manually annotated data, which are typically limited to newswire texts in major languages. This hinders the development of coreference resolvers for the many so-called "low-resource languages", for which no extensive linguistic resources are available.

To alleviate the problem of resource scarcity, we investigate methods for multilingual coreference resolution and experiment with annotation projection algorithms on multilingual data. To support our research, we are developing a parallel multi-genre coreference corpus (newswire texts, narratives, medical instructions). Furthermore, we are interested in exploring structural differences in referring expressions and coreference chains across languages.

To date, our languages of interest are English, German and Russian, but our goal is to keep our approach easily generalisable to other languages and datasets.
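
The idea behind annotation projection can be illustrated with a minimal sketch. It assumes a sentence-aligned parallel text with token-level word alignments and maps each annotated source-language mention to the smallest target-language span that covers all of its aligned tokens. The names project_span and project_chains, the alignment format, and the toy English-German example are assumptions made for illustration only; this is not the projection algorithm used in our experiments.

```python
from typing import Dict, List, Optional, Set, Tuple

# A mention is a span of inclusive token indices (start, end).
Span = Tuple[int, int]


def project_span(span: Span, alignment: Dict[int, Set[int]]) -> Optional[Span]:
    """Map a source span to the smallest target span covering its aligned tokens."""
    start, end = span
    targets: Set[int] = set()
    for i in range(start, end + 1):
        targets |= alignment.get(i, set())
    if not targets:
        return None  # no token of the mention is aligned; nothing to project
    return (min(targets), max(targets))


def project_chains(
    chains: List[List[Span]], alignment: Dict[int, Set[int]]
) -> List[List[Span]]:
    """Project every chain; drop chains left with fewer than two mentions."""
    projected = []
    for chain in chains:
        mentions = [
            p for p in (project_span(m, alignment) for m in chain) if p is not None
        ]
        if len(mentions) >= 2:
            projected.append(mentions)
    return projected


# Toy example (hypothetical data):
# EN: Mary(0) saw(1) the(2) dog(3) .(4) She(5) liked(6) it(7) .(8)
# DE: Maria(0) sah(1) den(2) Hund(3) .(4) Sie(5) mochte(6) ihn(7) .(8)
alignment = {0: {0}, 1: {1}, 2: {2}, 3: {3}, 5: {5}, 6: {6}, 7: {7}}
source_chains = [[(0, 0), (5, 5)], [(2, 3), (7, 7)]]
target_chains = project_chains(source_chains, alignment)
```

A real projection pipeline would additionally have to handle noisy or discontinuous alignments and mentions that collapse onto the same target span; the sketch ignores both.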

Resources

PoCoRes

PoCoRes is an implemented pronoun resolver for German that roughly follows the filters-and-preferences approach of Lappin and Leass (1994), adapted to certain features of the German language. It will be made available later this year.
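
As a rough illustration of the general filters-and-preferences idea, the following minimal sketch first filters antecedent candidates by gender and number agreement and then prefers the candidate with the highest salience score. It is not the PoCoRes implementation: the Mention class, the feature set, and the weights are hypothetical and chosen only to make the example run.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    text: str
    gender: str    # "masc", "fem" or "neut" (grammatical gender)
    number: str    # "sg" or "pl"
    sentence: int  # index of the sentence containing the mention
    role: str      # "subj", "obj" or "other"


# Illustrative weights (assumptions, not taken from PoCoRes).
ROLE_WEIGHT = {"subj": 80, "obj": 50, "other": 0}
RECENCY_WEIGHT = 100


def agrees(pronoun: Mention, candidate: Mention) -> bool:
    """Filter: pronoun and candidate must match in gender and number."""
    return pronoun.gender == candidate.gender and pronoun.number == candidate.number


def salience(pronoun: Mention, candidate: Mention) -> float:
    """Preference: recency is halved per sentence of distance, plus a role bonus."""
    distance = pronoun.sentence - candidate.sentence
    return RECENCY_WEIGHT / (2 ** distance) + ROLE_WEIGHT.get(candidate.role, 0)


def resolve(pronoun: Mention, candidates: List[Mention]) -> Optional[Mention]:
    """Return the most salient candidate that passes the filters, if any."""
    viable = [
        c for c in candidates
        if c.sentence <= pronoun.sentence and agrees(pronoun, c)
    ]
    if not viable:
        return None
    return max(viable, key=lambda c: salience(pronoun, c))


# Toy example: "Der Hund sah die Katze. Sie lief weg."
hund = Mention("Hund", "masc", "sg", 0, "subj")
katze = Mention("Katze", "fem", "sg", 0, "obj")
sie = Mention("Sie", "fem", "sg", 1, "subj")
antecedent = resolve(sie, [hund, katze])
```

In the toy example the agreement filter alone excludes "Hund", since German pronouns agree with the grammatical gender of their antecedent; the salience score only matters when several candidates survive the filters.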

Related publications: