Multimodal Dialogue Systems (Fall 2020)

Block course

Time & Location: kick-off meeting in MS Teams; presentation meetings tentatively in the last 2-3 weeks of September or the first 2-3 weeks of October

Teacher: Dr Volha Petukhova

*** Announcements ***

Kick-off: 14.05.2020 at 14:00-15:00

Kick-off meeting in MS Teams

Kick-off slides: PDF

Introduction slides: PDF

Suitable for: CoLi, CS and CuK


We plan to hold a first planning meeting early in the semester. For the actual seminar (times and papers will be decided via Doodle), each participant gives a talk of 40-45 minutes, followed by 15 minutes of discussion (participation in discussions is also graded). After the talk, the presenter prepares a short report of about 10 pages and hands it in for grading.

Grading: 40% based on the talk, 40% based on the report, 20% based on participation in discussions.

Term paper:

  • LaTeX template for term papers (zip)
  • 11-point checklist for term papers (pdf)


Situated interaction;
Understanding and generation of multimodal human dialogue behavior;
Social signals/affective computing;
Multimodal dialogue modelling;
Multimodal dialogue systems & applications

  *Each talk will be based on a research paper

Cognition: cognitive states, affective states and cognitive agents

1. Barsalou, L. W. (2015). Situated conceptualization: theory and application. In Perceptual and Emotional Embodiment: Foundations of Embodied Cognition. East Sussex: Psychology Press. PDF

2. D’Mello, S., Jackson, T., Craig, S., Morgan, B., Chipman, P., White, H., … & Graesser, A. (2008, June). AutoTutor detects and responds to learners affective and cognitive states. In Workshop on emotional and cognitive issues at the international conference on intelligent tutoring systems (pp. 306-308).

3. Zhang, T., Hasegawa-Johnson, M., & Levinson, S. E. (2006). Cognitive state classification in a spoken tutorial dialogue system. Speech communication, 48(6), 616-632.

4. Hayakawa, A., Haider, F., Cerrato, L., Campbell, N., & Luz, S. (2015). Detection of cognitive states and their correlation to speech recognition performance in speech-to-speech machine translation systems. In Sixteenth Annual Conference of the International Speech Communication Association.

Multimodality: multimodal expressions, annotations and tools

5. Luz, S. (2012). The nonverbal structure of patient case discussions in multidisciplinary medical team meetings. ACM Transactions on Information Systems (TOIS), 30(3), 1-24.

6. Sriramulu, A., Lin, J., & Oviatt, S. (2019, October). Dynamic Adaptive Gesturing Predicts Domain Expertise in Mathematics. In 2019 International Conference on Multimodal Interaction (pp. 105-113).

7. Del Piccolo, L., De Haes, H., Heaven, C., Jansen, J., Verheul, W., Bensing, J., … & Goss, C. (2011). Development of the Verona coding definitions of emotional sequences to code health providers’ responses (VR-CoDES-P) to patient cues and concerns. Patient education and counseling, 82(2), 149-155.

8. Eyben, F., Wöllmer, M., & Schuller, B. (2010, October). openSMILE: the Munich versatile and fast open-source audio feature extractor. In Proceedings of the 18th ACM international conference on Multimedia (pp. 1459-1462).

9. Baltrušaitis, T., Robinson, P., & Morency, L. P. (2016, March). OpenFace: an open source facial behavior analysis toolkit. In 2016 IEEE Winter Conference on Applications of Computer Vision (WACV) (pp. 1-10). IEEE.

Multimodal fusion, dialogue modelling and management

10. Jaiswal, S., Valstar, M., Kusumam, K., & Greenhalgh, C. (2019, July). Virtual Human Questionnaire for Analysis of Depression, Anxiety and Personality. In Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents (pp. 81-87).

11. Hirano, Y., Okada, S., Nishimoto, H., & Komatani, K. (2019, October). Multitask Prediction of Exchange-level Annotations for Multimodal Dialogue Systems. In 2019 International Conference on Multimodal Interaction (pp. 85-94).

12. Rudovic, O., Zhang, M., Schuller, B., & Picard, R. (2019, October). Multi-modal Active Learning From Human Data: A Deep Reinforcement Learning Approach. In 2019 International Conference on Multimodal Interaction (pp. 6-15).

Multimodal dialogue systems & applications

13. Soleymani, M., Stefanov, K., Kang, S. H., Ondras, J., & Gratch, J. (2019, October). Multimodal Analysis and Estimation of Intimate Self-Disclosure. In 2019 International Conference on Multimodal Interaction (pp. 59-68).

14. Zalake, M., Tavassoli, F., Griffin, L., Krieger, J., & Lok, B. (2019, July). Internet-based Tailored Virtual Human Health Intervention to Promote Colorectal Cancer Screening: Design Guidelines from Two User Studies. In Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents (pp. 73-80).

15. Tavabi, L., Stefanov, K., Nasihati Gilani, S., Traum, D., & Soleymani, M. (2019, October). Multimodal Learning for Identifying Opportunities for Empathetic Responses. In 2019 International Conference on Multimodal Interaction (pp. 95-104).

16. Hoegen, R., Aneja, D., McDuff, D., & Czerwinski, M. (2019, July). An end-to-end conversational style matching agent. In Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents (pp. 111-118).

17. Ahuja, C., Ma, S., Morency, L. P., & Sheikh, Y. (2019, October). To React or not to React: End-to-End Visual Pose Forecasting for Personalized Avatar during Dyadic Conversations. In 2019 International Conference on Multimodal Interaction (pp. 74-84).

For any questions, please send an email to:

Use subject tag: [MDS_2020]