Multimodal Dialogue Systems (Fall 2026)

Block course

Time & Location: kick-off meeting in April-May; presentation meetings indicatively in the last 2-3 weeks of September or the first 2-3 weeks of October

Teacher: Dr Volha Petukhova

*** Announcements ***

Registration in LSF

Join TEAMS

Kick-off: TBA

Kick-off & Introduction slides: see TEAMS Class Material

Suitable for: CoLi, CS and CuK

Organization:

We plan to hold a first planning meeting early in the semester. For the seminar itself (time slots and papers will be decided via a Doodle poll), each participant will give a 30-minute talk followed by a 10-minute discussion (participation in the discussions will also be graded). After the talk, the presenter prepares a short report of about 10 pages and hands it in for grading.

Grading: 40% based on the talk, 40% based on the report, 20% based on participation in discussions.

Term paper:

  • LaTeX template for term papers (zip)
  • 11-point checklist for term papers (pdf)

Topics:

Understanding and generation of multimodal human dialogue behaviour;
Social signals / affective computing;
Multimodal dialogue modelling;
Large language models for dialogue modelling and analysis;
Multimodal dialogue systems & applications.

  * Each talk will be based on a research paper.

Multimodality: multimodal behaviour, tracking devices, annotations and tools

    1. Withana, A., Groeger, D., & Steimle, J. (2018). Tacttoo: A Thin and Feel-Through Tattoo for On-Skin Tactile Output. In Proceedings of the 31st Annual ACM Symposium on User Interface Software and Technology (UIST '18) (pp. 365–378). ACM.

    2. Wang, H., Mendiratta, M., Theobalt, C., & Kortylewski, A. (2024). FaceGPT: Self-supervised learning to chat about 3D human faces. arXiv preprint arXiv:2406.07163.

    3. Jiang, B., Chen, X., Liu, W., Yu, J., Yu, G., & Chen, T. (2023). MotionGPT: Human motion as a foreign language. Advances in Neural Information Processing Systems, 36, 20067-20079.

    4. Oppenlaender, J., Johnston, H., Silvennoinen, J. M., & Barranha, H. (2025). Artworks reimagined: Exploring human-AI co-creation through body prompting. Proceedings of the ACM on Human-Computer Interaction, 9(4), 1-34.

    5. Rekrut, M., Selim, A. M., & Krüger, A. (2022). Improving Silent Speech BCI Training Procedures Through Transfer from Overt to Silent Speech. In 2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC) (pp. 2650–2656). doi: 10.1109/SMC53654.2022.9945447.

    6. Joby, N. E., & Umemuro, H. (2023, September). Emotional mimicry as a proxy measurement for pro-social indicators of trust, empathy, liking and altruism. In Proceedings of the 23rd ACM International Conference on Intelligent Virtual Agents.

    7. Marquez Herbuela, V. R. D., & Nagai, Y. (2025, October). Realtime Multimodal Emotion Estimation using Behavioral and Neurophysiological Data. In Proceedings of the 27th International Conference on Multimodal Interaction (pp. 785-787). 

    8. Welivita, A., Yeh, C. H., & Pu, P. (2023, September). Empathetic response generation for distress support. In Proceedings of the 24th Meeting of the Special Interest Group on Discourse and Dialogue (pp. 632-644).

    9. Li, Z., Kangas, J., Farooq, A., & Raisamo, R. (2025, October). Exploring the effects of force feedback on VR Keyboards with varying visual designs. In Proceedings of the 27th International Conference on Multimodal Interaction (pp. 106-115).

    10. Buker, A., Smith, E., Perepelkina, O., & Vinciarelli, A. (2025, October). Multimodal Analysis of Disagreement in Dyadic Conversations: An Approach Based on Emotion Recognition. In Proceedings of the 27th International Conference on Multimodal Interaction (pp. 228-237).

    11. Santana, R., Irfan, B., Lagerstedt, E., Skantze, G., & Pereira, A. (2025, October). Speech-to-Joy: Self-Supervised Features for Enjoyment Prediction in Human–Robot Conversation. In Proceedings of the 27th International Conference on Multimodal Interaction (pp. 238-248).

Multimodal fusion, dialogue modelling and management

    12. Chen, S. (2025, October). What makes you say yes? An investigation of mental state and personality in persuasion during a dyadic conversation. In Proceedings of the 27th International Conference on Multimodal Interaction (pp. 16-24).

    13. Gryshchuk, V., Maistro, M., Lioma, C., & Ruotsalo, T. (2025, October). Decoding Affective States without Labels: Bimodal Image-brain Supervision. In Proceedings of the 27th International Conference on Multimodal Interaction (pp. 25-34).

    14. Zhang, H., Marquez Herbuela, V. R. D., & Nagai, Y. (2025, October). Foundation Feature-Guided Hierarchical Fusion of EEG-Physiological for Emotion Estimation. In Proceedings of the 27th International Conference on Multimodal Interaction (pp. 44-50).

    15. Coca, A., Tseng, B. H., Chen, J., Lin, W., Zhang, W., Anders, T., & Byrne, B. (2023). Grounding Description-Driven Dialogue State Trackers with Knowledge-Seeking Turns. arXiv preprint arXiv:2309.13448.

    16. Ramirez, A., Agarwal, K., Juraska, J., Garg, U., & Walker, M. A. (2023). Controllable Generation of Dialogue Acts for Dialogue Systems via Few-Shot Response Generation and Ranking. arXiv preprint arXiv:2307.14440.

    17. Finch, S. E., Paek, E. S., & Choi, J. D. (2023). Leveraging large language models for automated dialogue analysis. arXiv preprint arXiv:2309.06490.

    18. Addlesee, A., Sieińska, W., Gunson, N., Garcia, D. H., Dondrup, C., & Lemon, O. (2023). Multi-party goal tracking with LLMs: Comparing pre-training, fine-tuning, and prompt engineering. arXiv preprint arXiv:2308.15231.

    19. Ostyakova, L., Smilga, V., Petukhova, K., Molchanova, M., & Kornev, D. (2023, September). ChatGPT vs. Crowdsourcing vs. Experts: Annotating Open-Domain Conversations with Speech Functions. In Proceedings of the 24th Meeting of the Special Interest Group on Discourse and Dialogue (pp. 242-254).

Multimodal dialogue systems & applications

    20. Alsarrani, R., Esposito, A., & Vinciarelli, A. (2025, October). Punctual or Continuous? Analyzing Depression Traces in Language and Paralanguage with Multiple Instance Learning. In Proceedings of the 27th International Conference on Multimodal Interaction (pp. 614-623).

    21. Marcoux, A., Tessier, M. H., & Jackson, P. L. (2023, September). Nonverbal Markers of Empathy in Virtual Healthcare Professionals. In Proceedings of the 23rd ACM International Conference on Intelligent Virtual Agents (pp. 1-4).

    22. Valerio, R., & Mahmoud, M. (2025, October). A Multimodal Framework for Exploring Behavioural Cues for Automatic Stress Detection. In Proceedings of the 27th International Conference on Multimodal Interaction (pp. 535-539).

    23. Garcia, J. C., Suglia, A., Eshghi, A., & Hastie, H. (2023, July). 'What are you referring to?' Evaluating the ability of multi-modal dialogue models to process clarificational exchanges. In Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue (pp. 1-8). Association for Computational Linguistics.

    24. Shoa, A., Oliva, R., Slater, M., & Friedman, D. (2023, September). Sushi with Einstein: Enhancing Hybrid Live Events with LLM-Based Virtual Humans. In Proceedings of the 23rd ACM International Conference on Intelligent Virtual Agents (pp. 1-6).

For any questions, please send an email to:

v.petukhova@lsv.uni-saarland.de

Use subject tag: [MDS_2026]