Grzegorz Chrupała
I have moved to Tilburg University. This page is no longer maintained. See chrupala.me for up-to-date information.
I am a postdoctoral researcher at Spoken Language Systems at
Saarland University. My research is focused on computational language learning
and its applications in language understanding and text analytics. My interests
include:
- Named Entity and Relation extraction
- Discovery of latent structure in language, especially word class induction
- Incremental (online) learning
I received my PhD from the School of Computing at Dublin City University for work on data-driven morphological and deep syntactic analysis.
I was involved in the NL-Search project: Naturally speaking search engine (German).
I am currently working within the EMERGENT project.
News
Publications
My Google Scholar citation statistics
- Afra Alishahi and Grzegorz Chrupała. 2012. Concurrent Acquisition of Word Meaning and Lexical Categories. EMNLP-CoNLL 2012.
  Paper | Poster
- Grzegorz Chrupała. 2012. Hierarchical clustering of word class distributions. NAACL-HLT 2012 Workshop on the Induction of Linguistic Structure.
  Paper
- Grzegorz Chrupała. 2012. Learning from evolving data streams: online triage of bug reports. EACL 2012.
  Paper | Slides | Data
- Grzegorz Chrupała. 2011. Efficient induction of probabilistic word classes with LDA. IJCNLP 2011.
  Paper | Slides | Code
- Grzegorz Chrupała, Saeedeh Momtazi, Michael Wiegand, Stefan Kazalski, Fang Xu, Benjamin Roth, Alexandra Balahur and Dietrich Klakow. 2010. Saarland University Spoken Language Systems at the Slot Filling Task of TAC KBP 2010. TAC 2010.
  Paper
- Grzegorz Chrupała, Georgiana Dinu and Benjamin Roth. 2010. Enriched syntax-based meaning representation for answer extraction. SIGIR 2010 Workshop: Query Representation and Understanding.
  Paper | Poster
- Grzegorz Chrupała and Afra Alishahi. 2010. Online Entropy-based Model of Lexical Category Acquisition. CoNLL 2010.
  Paper | Slides | Code
- Georgiana Dinu and Grzegorz Chrupała. 2010. Relatedness curves for acquiring paraphrases. ACL workshop GEMS 2010.
  Paper
- Djamé Seddah, Grzegorz Chrupała, Özlem Çetinoğlu, Josef van Genabith and Marie Candito. 2010. Lemmatization and Lexicalized Statistical Parsing of Morphologically Rich Languages: the Case of French. NAACL SPMRL 2010 workshop.
  Paper
- Grzegorz Chrupała and Dietrich Klakow. 2010. A Named Entity Labeler for German: exploiting Wikipedia and distributional clusters. LREC 2010.
  Paper | Code: SemiNER
- Afra Alishahi and Grzegorz Chrupała. 2009. Lexical Category Acquisition as an Incremental Process. PsychoCompLA-2009, Cogsci 2009.
  Paper
- Michael Wiegand, Saeedeh Momtazi, Stefan Kazalski, Fang Xu, Grzegorz Chrupała and Dietrich Klakow. 2008. The Alyssa System at TAC QA 2008. TAC 2008.
  Paper
- Grzegorz Chrupała, Georgiana Dinu and Josef van Genabith. 2008. Learning Morphology with Morfette. LREC 2008.
  Paper | Code: Morfette
- Grzegorz Chrupała and Josef van Genabith. 2007. Using very large corpora to detect raising and control verbs. LFG07.
  Paper
- Grzegorz Chrupała, Nicolas Stroppa, Josef van Genabith and Georgiana Dinu. 2007. Better Training for Function Labeling. RANLP 2007.
  Paper
- Grzegorz Chrupała. 2006. Simple Data-Driven Context-Sensitive Lemmatization. SEPLN 2006.
  Paper
- Grzegorz Chrupała and Josef van Genabith. 2006. Using Machine-Learning to Assign Function Labels to Parser Output for Spanish. COLING/ACL 2006.
  Paper
- Grzegorz Chrupała and Josef van Genabith. 2006. Improving Treebank-Based Automatic LFG Induction for Spanish. LFG06.
  Paper
- Xavier Carreras, Lluís Màrquez and Grzegorz Chrupała. 2004. Hierarchical Recognition of Propositional Arguments with Perceptrons. CoNLL-2004.
  Paper
- Anthony Pym and Grzegorz Chrupała. 2005. The quantitative analysis of translation flows in the age of an international language. In Less Translated Languages, Albert Branchadell and Lovell Margaret West (eds.), 27-38. John Benjamins.
- Grzegorz Chrupała. 2003. Perl Scripting in Translation Project Management. In Across Languages and Cultures, Vol. 4, No. 1. (5 May 2003), pp. 109-132.
  Paper
- Grzegorz Chrupała and Lidia Cámara. 2003. STAR Transit XV. In Entornos Informáticos de la Traducción Profesional, Gloria Corpas Pastor and María-José Varela Salinas (eds.). Atrio, Granada.
Theses
- Grzegorz Chrupała. 2008. Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing. PhD dissertation, Dublin City University.
  PDF | Single-spaced PDF
- Grzegorz Chrupała. 2003. Acquiring Verb Subcategorization from Spanish Corpora. DEA Thesis, University of Barcelona.
  PDF
- Grzegorz Chrupała. 1998. Bibliotheca in Fabula. The library motive in La biblioteca de Babel, The British Museum is Falling Down and Il nome della rosa. MA Thesis, University of Silesia.
  HTML
Software
- Colada: implements online and minibatch word class induction using Latent Dirichlet Allocation (LDA) with an online Gibbs sampler.
- LDA-wordclass: Soft word-class induction with Latent Dirichlet Allocation
- Lingo: Haskell NLP utilities
- Delta-H: Online entropy-based model of lexical category acquisition
- Sequor: a perceptron-based sequence labeler with a flexible feature template language. It is meant mainly for NLP applications such as Part of Speech tagging, syntactic chunking or Named Entity labeling. Includes:
  - SemiNER: a semi-supervised Named Entity labeler (with pre-trained models for German)
- Morfette: a tool for supervised learning of inflectional morphology. Comes with pre-trained models for Spanish and French.
Teaching
- Pattern and Speech Recognition, winter semester 2011/2012
- Preparatory course in Statistics (with Francesca Delogu)
- Statistical Natural Language Processing, summer semester 2011
- Pattern and Speech Recognition, winter semester 2010/2011
- META tutorial on Machine Learning for NLP (with Nicolas Stroppa), Barcelona Media, Oct 19-20 2010
- Statistical Natural Language Processing, summer semester 2010
- Pattern and Speech Recognition, winter semester 2009/2010
- Introduction to classification and sequence labeling at IRTG Annual Meeting 2009. Slides
- Two-day tutorial on Machine Learning at Dublin City University, 18-19 March 2009. Slides
- Statistical Natural Language Processing, summer semester 2009
- Pattern and Speech Recognition, winter semester 2008/2009
- Statistical Natural Language Processing, summer semester 2008
Contact
Saarland University
FR 7.4 Spoken Language Systems
Building C7 1, Room 0.04
66041 Saarbrücken
+49 681 302 58126
gchrupala@lsv.uni-saarland.de