Coreference Resolution

High-quality coreference resolution plays an important role in discourse processing, since text understanding requires (i) identifying referring expressions that point to the same extralinguistic referent in a natural language discourse, and (ii) establishing links between anaphoric expressions and their antecedents. These abilities are beneficial for virtually any NLP task operating on the discourse level, such as information extraction, machine translation, question answering, or text summarization.

Variation in coreference strategies

Language in social media varies considerably, ranging from a more formal (written-like) to a more informal (spoken-like) style depending on the context. It is known that the expression of coreferential relations differs between spoken and written language, for example in the textual distance between referential expressions and their antecedents, and in the type of expression (pronoun or full noun phrase, for example) that is used. We extended this research to include social media conversations from Twitter, showing that coreferential relations on Twitter have similarities to both spoken and written language, depending on the feature investigated. Building on these findings, we have adapted a computational model for coreference resolution to better capture the idiosyncrasies of social media conversations.
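
To give a concrete idea of the features we investigate, the sketch below computes antecedent distance and the distribution of mention types from an annotated conversation. The Mention structure and its attributes are illustrative assumptions rather than our actual corpus format.

    from dataclasses import dataclass

    @dataclass
    class Mention:
        chain_id: int      # coreference chain the mention belongs to
        token_index: int   # position of the mention head in the conversation
        form: str          # "pronoun", "proper_name", "full_np", ...

    def antecedent_distances(mentions):
        """Token distance between each anaphoric mention and its closest antecedent."""
        last_seen = {}     # chain_id -> token index of the most recent mention
        distances = []
        for m in sorted(mentions, key=lambda m: m.token_index):
            if m.chain_id in last_seen:
                distances.append(m.token_index - last_seen[m.chain_id])
            last_seen[m.chain_id] = m.token_index
        return distances

    def form_distribution(mentions):
        """Relative frequency of each mention type (pronoun, full NP, ...)."""
        counts = {}
        for m in mentions:
            counts[m.form] = counts.get(m.form, 0) + 1
        total = sum(counts.values())
        return {form: n / total for form, n in counts.items()}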

We are also interested in the interplay of coherence and coreference relations in Twitter conversations. To support our research, we are developing a coreference- and coherence-annotated corpus of Twitter conversations.

Our language of interest is English.

Coreference annotation projection

Common technologies for automatic coreference resolution require either a language-specific rule set or large collections of manually annotated data, which are typically limited to newswire texts in major languages. This hinders the development of coreference resolvers for the large number of so-called "low-resourced languages", for which no extensive linguistic resources are available.

In order to alleviate the problem of resource scarcity, we investigate methods for multilingual coreference resolution and experiment with annotation projection algorithms on multilingual data. To support our research, we are developing a parallel multi-genre coreference corpus (newswire texts, narratives, medical instructions). Furthermore, we are interested in exploring structural differences in referring expressions and coreference chains across languages.
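
The sketch below illustrates the core idea behind annotation projection under strong simplifying assumptions: mentions are contiguous token spans on the source side, and a word alignment is already available. The function names and the alignment format are hypothetical; a full projection algorithm additionally has to handle unaligned tokens, non-contiguous spans, and diverging mention boundaries.

    def project_mention(span, alignment):
        """Map a source-side mention span to a target-side span.

        span:      (start, end) token indices in the source sentence (inclusive)
        alignment: dict mapping source token index -> list of target token indices
        """
        start, end = span
        target_indices = []
        for i in range(start, end + 1):
            target_indices.extend(alignment.get(i, []))
        if not target_indices:
            return None                      # nothing to project onto
        return (min(target_indices), max(target_indices))

    def project_chain(chain, alignment):
        """Project every mention of a coreference chain; drop unalignable ones."""
        projected = [project_mention(span, alignment) for span in chain]
        return [span for span in projected if span is not None]

    # Toy example: a chain with the mentions "the president" (0-1) and "he" (5).
    alignment = {0: [0], 1: [1], 5: [4]}
    print(project_chain([(0, 1), (5, 5)], alignment))   # [(0, 1), (4, 4)]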

Our current languages of interest are English, German and Russian, but we aim to keep our approach easily generalisable to other languages and datasets.

Resources

PoCoRes

PoCoRes is an implemented pronoun resolver for German that roughly follows the filters-and-preferences approach of Lappin and Leass (1994), adapted to specific properties of German. It will be made available later this year.
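
The sketch below illustrates the general filters-and-preferences scheme in the spirit of Lappin and Leass (1994): candidate antecedents are first filtered by hard constraints (here only morphological agreement) and then ranked by salience weights that decay with sentence distance. It is a simplified illustration with an assumed candidate representation, not the PoCoRes implementation itself.

    from dataclasses import dataclass

    @dataclass
    class Candidate:
        text: str
        gender: str          # "masc", "fem", "neut"
        number: str          # "sg", "pl"
        sentence: int        # sentence index of the candidate
        salience: float      # sum of salience factors (subjecthood, recency, ...)

    def resolve_pronoun(pronoun_gender, pronoun_number, pronoun_sentence, candidates):
        """Return the most salient candidate that passes the agreement filter."""
        # Filter: discard candidates that do not agree in gender and number.
        compatible = [c for c in candidates
                      if c.gender == pronoun_gender and c.number == pronoun_number]
        if not compatible:
            return None
        # Preference: halve salience for every intervening sentence, then rank.
        def decayed(c):
            return c.salience * (0.5 ** (pronoun_sentence - c.sentence))
        return max(compatible, key=decayed)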

Related publications: