Learning human multimodal dialogue strategies

V. RIESER; O. LEMON

doi:10.1017/S1351324909005099

Published online by Cambridge University Press: 22 April 2009

V. RIESER: School of Informatics, University of Edinburgh, Edinburgh, EH9 8AB, GB. E-mail: vrieser@inf.ed.ac.uk
O. LEMON: School of Informatics, University of Edinburgh, Edinburgh, EH9 8AB, GB. E-mail: olemon@inf.ed.ac.uk

Abstract

We investigate the use of different machine learning methods in combination with feature selection techniques to explore human multimodal dialogue strategies and the use of those strategies for automated dialogue systems. We learn policies from data collected in a Wizard-of-Oz study where different human ‘wizards’ decide whether to ask a clarification request in a multimodal manner or else to use speech alone. We first describe the data collection, the coding scheme and annotated corpus, and the validation of the multimodal annotations. We then show that there is a uniform multimodal dialogue strategy across wizards, which is based on multiple features in the dialogue context. These are generic features, available at runtime, which can be implemented in dialogue systems. Our prediction models (for human wizard behaviour) achieve a weighted f-score of 88.6 per cent (which is a 25.6 per cent improvement over the majority baseline). We interpret and discuss the learned strategy. We conclude that human wizard behaviour is not optimal for automatic dialogue systems, and argue for the use of automatic optimization methods, such as Reinforcement Learning. Throughout the investigation we also discuss the issues arising from using small initial Wizard-of-Oz data sets, and we show that feature engineering is an essential step when learning dialogue strategies from such limited data.
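The evaluation measure named in the abstract can be illustrated with a minimal sketch. It assumes the common definition of a weighted F-score (per-class F1 averaged with weights proportional to class frequency) and a majority baseline that always predicts the most frequent class; the abstract does not spell out the exact definition used. The function names and the toy labels below are hypothetical and are not taken from the corpus.

```python
from collections import Counter


def weighted_f_score(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        # True positives: instances of this class that were predicted as such.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / n
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        score += (n / total) * f1
    return score


def majority_baseline(y_true):
    """Predict the most frequent class for every instance."""
    majority_label, _ = Counter(y_true).most_common(1)[0]
    return [majority_label] * len(y_true)


# Hypothetical toy labels (assumption for illustration only).
gold = ["multimodal"] * 6 + ["speech_only"] * 4
model = ["multimodal"] * 5 + ["speech_only"] * 5

baseline_score = weighted_f_score(gold, majority_baseline(gold))
model_score = weighted_f_score(gold, model)
print(f"baseline: {baseline_score:.3f}, model: {model_score:.3f}")
```

Comparing a classifier against the majority baseline in this way shows how much of its score comes from the class distribution alone, which matters for small and skewed data sets such as Wizard-of-Oz corpora.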

Type: Papers
Information: Natural Language Engineering, Volume 16, Issue 1, January 2010, pp. 3–23

DOI: https://doi.org/10.1017/S1351324909005099
Copyright: © Cambridge University Press 2009
