CRC 1102 Information Density and Linguistic Encoding

LSV is involved in three projects within the SFB/CRC 1102:

Language Comprehension and Cognitive Control Demands: Adapting Information Density to Changing Situations and Individual Users

Project A4 investigates the hypothesis that the channel capacity of comprehenders – and thus their behaviour in response to utterances of high information density – may be modulated by other immediate cognitive demands (such as driving), as well as by individual differences. This will be accomplished by contrasting the comprehension of high- and low-density utterances in single- and dual-task settings, as well as across different populations (older and younger adults). Experimental findings will further contribute to the development of a language generation model that adapts linguistic encodings appropriately to both the immediate setting and the cognitive capacity of the listener.

Modeling and Measuring Information Density

Project B4 is concerned with computational modeling of information density in terms of language models. The main goal of the project is to improve current language modeling approaches by developing a more sophisticated notion of context. While existing models stay within the sentence boundary, this project extends the notion of context to much larger stretches of text. The proposed approach further allows the context to build up and be forgotten gradually as the text evolves. As a secondary goal, project B4 will provide a toolbox of standard language modeling techniques. Both the new models and the toolbox will be used in many other CRC projects to obtain measures of information density.
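To make the idea of a context that builds up and is gradually forgotten more concrete, here is a minimal sketch in Python. It is not the project's model: it assumes a simple unigram cache whose counts decay exponentially with every new word, interpolated with a uniform background distribution. The class name DecayingCacheLM and the parameters decay and cache_weight are hypothetical choices made for this illustration only.

```python
import math


class DecayingCacheLM:
    """Toy unigram cache model with exponential forgetting (illustrative only)."""

    def __init__(self, vocab, decay=0.9, cache_weight=0.5):
        self.vocab = set(vocab)
        self.decay = decay                # assumed forgetting rate per word
        self.cache_weight = cache_weight  # assumed interpolation weight
        self.cache = {}                   # decayed counts of recently seen words
        self.total = 0.0                  # sum of all decayed counts

    def prob(self, word):
        """Interpolate the decayed cache with a uniform background model."""
        uniform = 1.0 / len(self.vocab)
        if self.total == 0.0:
            return uniform
        cached = self.cache.get(word, 0.0) / self.total
        return self.cache_weight * cached + (1.0 - self.cache_weight) * uniform

    def surprisal(self, word):
        """Surprisal in bits: -log2 P(word | decayed context)."""
        return -math.log2(self.prob(word))

    def update(self, word):
        """Decay all earlier evidence, then add the new word to the context."""
        for w in self.cache:
            self.cache[w] *= self.decay
        self.total = self.total * self.decay + 1.0
        self.cache[word] = self.cache.get(word, 0.0) + 1.0


def text_surprisals(lm, words):
    """Score each word before it is added to the context."""
    scores = []
    for w in words:
        scores.append(lm.surprisal(w))
        lm.update(w)
    return scores


if __name__ == "__main__":
    lm = DecayingCacheLM(vocab={"a", "b", "c", "d"})
    print(text_surprisals(lm, ["a", "a", "b", "b", "b", "a"]))
```

In this sketch a word that was just seen becomes less surprising, and the effect fades as other words follow, because older counts shrink by the decay factor at every step. The context here is not tied to sentence boundaries; it simply carries over across the whole word sequence.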

Mutual Intelligibility and Surprisal in Slavic Intercomprehension (INCOMSLAV)

Project C4 is concerned with the differential encoding of linguistic categories from a cross-linguistic perspective (here: Slavic languages), with a focus on density. In particular, the project will investigate the relation between grammaticalisation, encoding density and information density. As a relevant application, intercomprehension within the family of Slavic languages will be explored. The project will bring together results from the analysis of parallel corpora and from a variety of experiments with native speakers of Slavic languages, and will compare them with insights from comparative historical linguistics on the relationship between Slavic languages. A statistical language model is used as a measure of surprisal and as a tool to gauge how language users cope with high degrees of surprisal arising from partial incomprehensibility. The key idea here is that comprehension of an unknown but related language should be better when the language model adapted for understanding the unknown language exhibits relatively low average surprisal, or density.
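The notion of average surprisal used above can be illustrated with a small Python sketch. It is only an assumption-laden toy, not the model used in the project: it trains a character bigram model with add-one smoothing on a few words of one language and reports the mean per-character surprisal of a word from another language. The function names train_char_bigrams and mean_surprisal, the boundary symbols "^" and "$", and the tiny word lists are all hypothetical choices for this example.

```python
import math
from collections import Counter


def train_char_bigrams(words):
    """Count character bigrams over words padded with start/end symbols."""
    bigrams = Counter()
    contexts = Counter()
    alphabet = set()
    for word in words:
        padded = "^" + word + "$"
        alphabet.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, alphabet


def mean_surprisal(word, model):
    """Mean surprisal in bits per character transition, add-one smoothed."""
    bigrams, contexts, alphabet = model
    padded = "^" + word + "$"
    # One extra slot reserves probability mass for characters never seen in training.
    vocab_size = len(alphabet) + 1
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        total += -math.log2(p)
    return total / (len(padded) - 1)


if __name__ == "__main__":
    # Toy training data: three Czech words (illustrative, far too small for real use).
    model = train_char_bigrams(["voda", "ruka", "noha"])
    # A Polish cognate versus an arbitrary string.
    for candidate in ["woda", "xyzq"]:
        print(candidate, round(mean_surprisal(candidate, model), 3))
```

Under these assumptions, a word that shares most of its character sequences with the training language receives a lower mean surprisal than an unrelated string, which mirrors the intuition that low average surprisal should go along with better comprehension of a related language.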

Learn more about the CRC Information Density and Linguistic Encoding here.