Social Media Research

The spread of various kinds of social media services in recent years has also attracted attention in the Computational Linguistics community. User-generated data posted on the Web is taken to represent users' "unfiltered" thoughts and opinions, and a high-quality analysis of this data can yield insights into the processes underlying modern society, ranging from economic analyses to the prediction of trends in social development.

Unfortunately, existing automatic natural language processing tools, most of which were created for "standard" language texts (usually newspaper text), do not easily adapt to the rather peculiar genre of casual online conversation. We are therefore interested in more robust NLP modules that can also process "unconventional" linguistic varieties such as Web language.

In our lab, we try to solve some of these problems by investigating which linguistic phenomena actually account for the notorious noisiness of online text, assessing which strategy (text normalization or domain adaptation) is more appropriate for dealing with this noisiness, and estimating how much the results of automatic text processing on standard language texts differ from those obtained on social media data. Most of our research centers on Twitter microblogs.

Related Projects

Discourse analysis of social media

Related Resources

A selection of posters related to working with Twitter data

Related publications:

Robin Schäfer and Manfred Stede. Annotation and detection of arguments in tweets. In Proceedings of the 7th Workshop on Argument Mining, 53–58. Online, December 2020. Association for Computational Linguistics. [Bibtex] [PDF]
Tatjana Scheffler, Berfin Aktaş, Debopam Das, and Manfred Stede. Annotating Shallow Discourse Relations in Twitter Conversations. In Proceedings of the Workshop on Discourse Relation Parsing and Treebanking at NAACL. Minneapolis, MN, 2019. [Bibtex] [PDF]
Robin Schäfer and Manfred Stede. Improving implicit stance classification in tweets using word and sentence embeddings. In Christoph Benzmüller and Heiner Stuckenschmidt, editors, KI 2019: Advances in Artificial Intelligence, 299–307. Cham, 2019. Springer International Publishing. URL: https://link.springer.com/chapter/10.1007/978-3-030-30179-8_26. [Bibtex]
Berfin Aktaş, Tatjana Scheffler, and Manfred Stede. Anaphora resolution for Twitter conversations: an exploratory study. In Proceedings of the Workshop on Computational Models of Reference, Anaphora, and Coreference, CRAC@HLT-NAACL 2018, 1–10. New Orleans, Louisiana, June 2018. Association for Computational Linguistics. (Some numbers in Table 4 are corrected from the version published in the CRAC 2018 WS Proceedings.) [Bibtex] [PDF]
W. Sidorenko and M. Stede. Potsdam Tweet Annotation Guidelines: Rhetorical Structure. Unpublished manuscript, 2017. [Bibtex] [PDF]
Tatjana Scheffler and Manfred Stede. Realizing argumentative coherence relations in German: a contrastive study of newspaper editorials and Twitter posts. In Proceedings of the COMMA Workshop "Foundations of the Language of Argumentation". Potsdam, Germany, 2016. [Bibtex] [PDF]
Tatjana Scheffler and Christopher C. M. Kyba. Measuring social jetlag in Twitter data. In Tenth International AAAI Conference on Web and Social Media. 2016. URL: http://www.aaai.org/ocs/index.php/ICWSM/ICWSM16/paper/view/13080. [Bibtex] [PDF]
Tatjana Scheffler and Elina Zarisheva. Dialog act recognition for Twitter conversations. In Proceedings of the Workshop on Normalisation and Analysis of Social Media Texts (NormSoMe), 31–38. Portorož, Slovenia, 2016. [Bibtex] [PDF]
Elina Zarisheva and Tatjana Scheffler. Dialog act annotation for Twitter conversations. In Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 114–123. Portorož, Slovenia, 2015. [Bibtex] [PDF]
Johannes Gontrum and Tatjana Scheffler. Text-based geolocation of German Tweets. In Proceedings of the NLP4CMC 2015 Workshop at GSCL. Duisburg-Essen, Germany, 2015. [Bibtex] [PDF]
Tatjana Scheffler, Johannes Gontrum, Matthias Wegel, and Steve Wendler. Mapping German tweets to geographic regions. In Proceedings of NLP4CMC workshop at the 12th KONVENS. 2014. [Bibtex]
Tatjana Scheffler. A German Twitter snapshot. In N. Calzolari et al., editors, Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). Reykjavik, Iceland, 2014. European Language Resources Association (ELRA). [Bibtex]
Uladzimir Sidarenka, Tatjana Scheffler, and Manfred Stede. Rule-based Normalization of German Twitter Messages. In Proceedings of the Conference of the German Society for Computational Linguistics (GSCL 2013). Darmstadt, Germany, September 2013. European Language Resources Association (ELRA). [Bibtex] [PDF]
W. Sidorenko, J. Sonntag, M. Stede, N. Krüger, and S. Stieglitz. From newspaper to microblogging: what does it take to find opinions? In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA) at NAACL-HLT. Atlanta, GA, 2013. Association for Computational Linguistics. [Bibtex]