Uladzimir Sidarenka

My official name is Uladzimir Sidarenka but, unofficially, you can also call me Vladimir Sidorenko (or Wladimir Sidorenko if you are speaking German). I graduated from Minsk State Linguistic University in 2006. One year later, I also completed my Master's in Computational Linguistics at the same institution. From 2005 until 2012, I worked as a computational linguist at Invention Machine, a now-defunct private company, where my colleagues and I developed advanced search engines for corporate clients. In 2012, I decided to pursue a PhD before it was too late. To that end, I came to Potsdam, where I am still working towards this goal at the Chair of Applied Computational Linguistics. Since September 2015, I have also been working as a part-time programmer at Retresco GmbH.

Research Interests

My current research focuses mainly on natural language processing (NLP) of social media texts (mostly German Twitter). I investigate the impact of different text normalization techniques on downstream NLP applications, in particular opinion mining and discourse analysis, and I also examine whether opinion mining could benefit from discourse analysis. More generally, I am interested in everything related to programming, mathematics (especially linear algebra and probability theory), machine learning, and automata theory.

Data

The Potsdam Twitter Sentiment Corpus (PotTS): A collection of 8,000 German tweets manually annotated with fine-grained sentiment relations.

Software

CRFSuite-0.13: An updated version of Naoaki Okazaki's CRFSuite, extended with tree-structured, higher-order linear-chain, and semi-Markov CRF models.
DiscourseSegmenter: A Python package providing rule-based and machine-learning discourse segmenters.
DiscourseSenser: A Python package for sense disambiguation of discourse relations in PDTB-style discourse parsing (supplementary data: Google Word Embeddings, Pre-Trained Models, Package 0.0.3-rc.1).
OsloPots: A Docker image of the shallow discourse parser created for the CoNLL 2016 Shared Task competition.
SentiLex: A collection of tools for generating sentiment lexicons from neural word embeddings, corpora, and lexical taxonomies.
Word2Vec: An enhanced version of the original word2vec code with the possibility to train task-specific and hybrid word embeddings.

Publications

Uladzimir Sidarenka. PotTS at GermEval-2017 Task B: Document-Level Polarity Detection Using Hand-Crafted SVM and Deep Bidirectional LSTM Network. In Proceedings of the GSCL GermEval Shared Task on Aspect-based Sentiment in Social Media Customer Feedback. Berlin, Germany, 2017. URL: https://drive.google.com/file/d/0B0IJZ0wwnhHDc1ZpcU05Mnh2N0U/view. [Bibtex]
Uladzimir Sidarenka. PotTS: The Potsdam Twitter Sentiment Corpus. In Nicoletta Calzolari et al., editors, Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2016). Portorož, Slovenia, May 2016. European Language Resources Association (ELRA). [Bibtex] [PDF]
S. Oepen, J. Read, T. Scheffler, U. Sidarenka, M. Stede, E. Velldal, and L. Øvrelid. OPT: Oslo–Potsdam–Teesside. Pipelining Rules, Rankers, and Classifier Ensembles for Shallow Discourse Parsing. In Proceedings of the CoNLL 2016 Shared Task. Berlin, Germany, 2016. [Bibtex] [PDF]
Uladzimir Sidarenka and Manfred Stede. Generating Sentiment Lexicons for German Twitter. In Proceedings of the Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media (PEOPLES 2016). Osaka, Japan, December 2016. [Bibtex] [PDF]
Uladzimir Sidarenka. PotTS at SemEval-2016 Task 4: Sentiment Analysis of Twitter Using Character-level Convolutional Neural Networks. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), 235–242. San Diego, California, June 2016. Association for Computational Linguistics. [Bibtex] [PDF]
Uladzimir Sidarenka, Matthias Bisping, and Manfred Stede. Applying Rhetorical Structure Theory to Twitter Conversations. In Proceedings of DiSpol 2015. Saarbrücken, Germany, October 2015. [Bibtex] [PDF]
Uladzimir Sidarenka, Andreas Peldszus, and Manfred Stede. Discourse Segmentation of German Texts. Journal for Language Technology and Computational Linguistics, 30(1):71–98, 2015. [Bibtex] [PDF]
Uladzimir Sidarenka, Tatjana Scheffler, and Manfred Stede. Rule-based Normalization of German Twitter Messages. In Proceedings of the Conference of the German Society for Computational Linguistics (GSCL 2013). Darmstadt, Germany, September 2013. European Language Resources Association (ELRA). [Bibtex] [PDF]