Social Media Research
The spread of various kinds of social media services in recent years has also attracted attention in the Computational Linguistics community. User-generated data posted on the Web is taken to represent users' "unfiltered" thoughts and opinions, and a high-quality evaluation of this data can lead to insights into the processes underlying modern society, ranging from economic analyses to the prediction of trends in social development.
Unfortunately, existing automatic natural language processing tools, most of which were created for "standard" (usually newspaper) texts, do not easily adapt to the relatively peculiar genre of casual online conversation. We are therefore interested in more robust NLP modules that can process even such "unconventional" linguistic variants as Web language.
In our lab, we address some of these problems by investigating which linguistic phenomena actually account for the notorious noisiness of online text, assessing which strategy (text normalization or domain adaptation) is more appropriate for dealing with this noisiness, and estimating how much the results of automatic text processing on standard language texts differ from the results obtained on social media data. Most of our research centers on Twitter microblogs.
Discourse analysis of social media
A selection of posters related to working with Twitter data
- Tatjana Scheffler and Christopher CM Kyba. Measuring social jetlag in Twitter data. In Proceedings of the Tenth International AAAI Conference on Web and Social Media (ICWSM). 2016. URL: http://www.aaai.org/ocs/index.php/ICWSM/ICWSM16/paper/view/13080.
- Tatjana Scheffler and Elina Zarisheva. Dialog act recognition for Twitter conversations. In Proceedings of the Workshop on Normalisation and Analysis of Social Media Texts (NormSoMe), 31–38. Portorož, Slovenia, 2016.
- Elina Zarisheva and Tatjana Scheffler. Dialog act annotation for Twitter conversations. In Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), 114–123. Prague, Czech Republic, 2015.
- Johannes Gontrum and Tatjana Scheffler. Text-based geolocation of German tweets. In Proceedings of the NLP4CMC 2015 Workshop at GSCL. Duisburg-Essen, Germany, 2015.
- Tatjana Scheffler. A German Twitter snapshot. In N. Calzolari et al., editors, Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). Reykjavik, Iceland, 2014. European Language Resources Association (ELRA).
- Tatjana Scheffler, Johannes Gontrum, Matthias Wegel, and Steve Wendler. Mapping German tweets to geographic regions. In Proceedings of the NLP4CMC Workshop at the 12th KONVENS. 2014.
- Uladzimir Sidarenka, Tatjana Scheffler, and Manfred Stede. Rule-based normalization of German Twitter messages. In Proceedings of the Conference of the German Society for Computational Linguistics (GSCL 2013). Darmstadt, Germany, September 2013. European Language Resources Association (ELRA).
- W. Sidorenko, J. Sonntag, M. Stede, N. Krüger, and S. Stieglitz. From newspaper to microblogging: what does it take to find opinions? In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA), NAACL-HLT. Atlanta, GA, 2013. Association for Computational Linguistics.