
Hierarchical reinforcement learning for situated natural language generation

Published online by Cambridge University Press:  10 January 2014

NINA DETHLEFS
Affiliation:
Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, UK e-mail: n.s.dethlefs@gmail.com
HERIBERTO CUAYÁHUITL
Affiliation:
Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, UK e-mail: h.cuayahuitl@gmail.com

Abstract

Natural Language Generation systems often face a multitude of choices, since the communicative effect of each utterance they generate depends crucially on the interplay between its physical circumstances, addressee and interaction history. This is particularly true in interactive and situated settings. In this paper we present a novel approach to situated Natural Language Generation in dialogue that is based on hierarchical reinforcement learning and learns the best utterance for a context by optimisation through trial and error. The model is trained from human–human corpus data and learns, in particular, to balance the trade-off between efficiency and detail in giving instructions: the user needs to be given sufficient information to execute their task, but without exceeding their cognitive load. We present results from simulation and from a task-based human evaluation study comparing two versions of hierarchical reinforcement learning: one operates with a hierarchy of policies over a large state space using only local knowledge, and the other additionally shares knowledge across generation subtasks to enhance performance. Results show that sharing knowledge across subtasks achieves better performance than learning in isolation, leading to smoother and more successful interactions that are better perceived by human users.
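
To illustrate the distinction drawn in the abstract between learning each generation subtask in isolation and sharing knowledge across subtasks, the following is a minimal Python sketch of hierarchical Q-learning. It is not the authors' system: the toy environment, the user-confusion feature, the subtasks, the detail-level actions and the reward values are all invented here purely for illustration.

```python
"""Minimal sketch of hierarchical Q-learning for instruction giving.

NOT the paper's implementation: environment, state features, actions
and rewards are assumptions made for illustration only.  A parent
level iterates over generation subtasks; each subtask's child policy
chooses how detailed the next instruction should be.
"""

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.2
DETAIL_ACTIONS = ["low_detail", "high_detail"]   # child-level actions
SUBTASKS = ["navigate", "manipulate"]            # parent-level subtasks


def simulated_user(action, confused):
    """Toy user model: a confused user needs a detailed instruction;
    otherwise brief instructions are preferred (lower cognitive load)."""
    success = (action == "high_detail") if confused else True
    reward = (10 if success else -10) - (1 if action == "high_detail" else 0)
    next_confused = random.random() < (0.2 if success else 0.6)
    return reward, next_confused


def run(shared_knowledge, episodes=5000):
    """Train child policies with epsilon-greedy Q-learning.  If
    `shared_knowledge` is True, the user-confusion feature is visible
    to every subtask; otherwise each subtask sees only its own identity
    (local knowledge)."""
    q = defaultdict(float)
    total = 0.0
    for _ in range(episodes):
        confused = random.random() < 0.5
        for subtask in SUBTASKS:                 # parent policy: fixed subtask order
            state = (subtask, confused) if shared_knowledge else (subtask,)
            if random.random() < EPSILON:        # explore
                action = random.choice(DETAIL_ACTIONS)
            else:                                # exploit current estimates
                action = max(DETAIL_ACTIONS, key=lambda a: q[(state, a)])
            reward, confused = simulated_user(action, confused)
            next_state = (subtask, confused) if shared_knowledge else (subtask,)
            best_next = max(q[(next_state, a)] for a in DETAIL_ACTIONS)
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                           - q[(state, action)])
            total += reward
    return total / episodes


if __name__ == "__main__":
    random.seed(0)
    print("local knowledge only  :", run(shared_knowledge=False))
    print("shared across subtasks:", run(shared_knowledge=True))
```

In this sketch the shared-knowledge variant can learn to give brief instructions to an unconfused user and detailed ones otherwise, whereas the local variant must commit to one level of detail per subtask; this mirrors, in a much simplified form, the efficiency-versus-detail trade-off discussed above.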

Type
Articles
Copyright
Copyright © Cambridge University Press 2014 

