Proseminar: Natural Language Processing and the Web
Summer Semester 2017
Instructor: Michael Wiegand
Location: U15, Building C7.1
Time: Thursdays, 14-16
Begin: April 20th, 2017
Suitable for: B.Sc.
Please note that from now on, we will meet in U15 again.
In this course, we will examine the impact of the Web on Natural Language Processing (NLP).
The Web is far larger than the text corpora traditionally used in NLP. Treated as a corpus, it can be used to study phenomena that are too sparsely represented in traditional corpora.
Some specific sites, such as Wikipedia, represent useful knowledge bases that can also be harnessed for NLP applications.
However, since the language of the Web, particularly that of social media, differs dramatically from the conventional text corpora used in NLP, existing software tools also have to be adapted to it.
Finally, the Web also gives rise to problematic phenomena, such as hate speech and fake reviews, which NLP methods can help to detect.
Most of the papers presented by students in this proseminar have a linguistic focus. A basic understanding of machine learning (at the level of "Mathematische Grundlagen III") would be helpful.
Schedule
| Date | Presenter | Topic | Paper | Reviewer |
|---|---|---|---|---|
| 27.04.2017 | MW | Recap on machine learning and evaluation | -- | -- |
| 04.05.2017 | MW | How to present a paper | -- | -- |
| 18.05.2017 | Stefan Gruenewald | Crowdsourcing | Amazon Mechanical Turk: Gold Mine or Coal Mine? (presentation) | Valentin Kany |
| 01.06.2017 | David Meier | Distant Supervision | Using Wikipedia for Automatic Word Sense Disambiguation (presentation) | Jana Jungbluth |
| 08.06.2017 | Jana Jungbluth | Deception Detection | Finding Deceptive Opinion Spam by Any Stretch of the Imagination (presentation) | Stefan Gruenewald |
| 22.06.2017 | Valentin Kany | Hate Speech | Abusive Language Detection in Online User Content (presentation) | David Meier |
| 29.06.2017 | MW | How to write a term paper | reference document (presentation) | -- |
Papers to be Discussed
- Jacob Eisenstein: What to do about bad language on the internet, in Proceedings of NAACL, 2013.
- Kevin Gimpel, Nathan Schneider, Brendan O'Connor, Dipanjan Das, Daniel Mills, Jacob Eisenstein, Michael Heilman, Dani Yogatama, Jeffrey Flanigan, and Noah A. Smith: Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments, in Proceedings of ACL, 2011.
- Jennifer Foster: "cba to check the spelling": Investigating Parser Performance on Discussion Forum Posts, in Proceedings of NAACL, 2010.
- Matthew Purver and Stuart Battersby: Experimenting with Distant Supervision for Emotion Classification, in Proceedings of EACL, 2012.
- Rada Mihalcea: Using Wikipedia for Automatic Word Sense Disambiguation, in Proceedings of NAACL, 2007.
- Keiji Shinzato and Kentaro Torisawa: Acquiring Hyponymy Relations from Web Documents, in Proceedings of NAACL, 2004.
- Peter D. Turney: Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews, in Proceedings of ACL, 2002.
- Adam Kilgarriff: Googleology is Bad Science, Computational Linguistics, 2007.
- Carmen Banea, Rada Mihalcea, Janyce Wiebe, and Samer Hassan: Multilingual Subjectivity Analysis Using Machine Translation, in Proceedings of EMNLP, 2008.
- Rion Snow, Brendan O'Connor, Daniel Jurafsky, and Andrew Y. Ng: Cheap and Fast — But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks, in Proceedings of EMNLP, 2008.
- Karën Fort, Gilles Adda, and K. Bretonnel Cohen: Amazon Mechanical Turk: Gold Mine or Coal Mine?, Computational Linguistics, Vol. 37 (2), 2011.
- Myle Ott, Yejin Choi, Claire Cardie, and Jeffrey T. Hancock: Finding Deceptive Opinion Spam by Any Stretch of the Imagination, in Proceedings of ACL, 2011.
- Ellen Spertus: Smokey: Automatic Recognition of Hostile Messages, in Proceedings of IAAI, 1997.
- Chikashi Nobata, Joel R. Tetreault, Achint Thomas, Yashar Mehdad, and Yi Chang: Abusive Language Detection in Online User Content, in Proceedings of WWW, 2016.
Requirements for Attendance
- The language of the course is German.
- Students should have passed the following courses: Einführung in die Computerlinguistik (Introduction to Computational Linguistics) and Mathematische Grundlagen III.
Requirements for Passing the Course
- oral presentation
- term paper (Hausarbeit)
- reviewing a paper presented by another student
Last update: 2017/06/29