Social Media Research

The spread of social media services in recent years has also attracted attention in the Computational Linguistics community. User-generated data posted on the Web is taken to represent users' "unfiltered" thoughts and opinions, and a high-quality evaluation of this data can lead to insights into the processes underlying modern society, ranging from economic analyses to the prediction of trends in social development.

Unfortunately, existing automatic natural language processing tools, most of which were created for "standard" (usually newspaper) texts, do not adapt easily to the rather peculiar genre of casual online conversation. We are therefore interested in more robust NLP modules that can process even such "unconventional" linguistic variants as Web language.

In our lab, we address some of these problems by investigating which linguistic phenomena actually account for the notorious noisiness of online text, assessing which strategy (text normalization or domain adaptation) is more appropriate for dealing with this noisiness, and estimating how much the results of automatic text processing on standard-language texts differ from those obtained on social media data. Most of our research centers on Twitter microblogs.

Related Projects

Discourse analysis of social media

Related Resources

A selection of posters related to working with Twitter data

Related Publications