
Hierarchical reinforcement learning for situated natural language generation

Published online by Cambridge University Press:  10 January 2014

NINA DETHLEFS
Affiliation:
Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, UK e-mail: n.s.dethlefs@gmail.com
HERIBERTO CUAYÁHUITL
Affiliation:
Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, UK e-mail: h.cuayahuitl@gmail.com

Abstract

Natural Language Generation systems often face a multitude of choices, since the communicative effect of each utterance they generate depends crucially on the interplay between its physical circumstances, addressee and interaction history. This is particularly true in interactive and situated settings. In this paper we present a novel approach to situated Natural Language Generation in dialogue that is based on hierarchical reinforcement learning and learns the best utterance for a context by optimisation through trial and error. The model is trained from human–human corpus data and learns, in particular, to balance the trade-off between efficiency and detail in giving instructions: the user needs to be given sufficient information to execute their task, but without exceeding their cognitive load. We present results from simulation and from a task-based human evaluation study comparing two versions of hierarchical reinforcement learning: one operates with a hierarchy of policies over a large state space using only local knowledge, and the other additionally shares knowledge across generation subtasks to enhance performance. Results show that sharing knowledge across subtasks achieves better performance than learning in isolation, leading to smoother and more successful interactions that are better perceived by human users.
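
To illustrate the distinction drawn in the abstract between learning each generation subtask in isolation and sharing knowledge across subtasks, the following is a minimal Python sketch of hierarchical Q-learning. It is not the authors' system: the toy environment, the user-confusion feature, the subtasks, the detail-level actions and the reward values are all invented here purely for illustration.

```python
"""Minimal sketch of hierarchical Q-learning for instruction giving.

NOT the paper's implementation: environment, state features, actions
and rewards are assumptions made for illustration only.  A parent
level iterates over generation subtasks; each subtask's child policy
chooses how detailed the next instruction should be.
"""

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.2
DETAIL_ACTIONS = ["low_detail", "high_detail"]   # child-level actions
SUBTASKS = ["navigate", "manipulate"]            # parent-level subtasks


def simulated_user(action, confused):
    """Toy user model: a confused user needs a detailed instruction;
    otherwise brief instructions are preferred (lower cognitive load)."""
    success = (action == "high_detail") if confused else True
    reward = (10 if success else -10) - (1 if action == "high_detail" else 0)
    next_confused = random.random() < (0.2 if success else 0.6)
    return reward, next_confused


def run(shared_knowledge, episodes=5000):
    """Train child policies with epsilon-greedy Q-learning.  If
    `shared_knowledge` is True, the user-confusion feature is visible
    to every subtask; otherwise each subtask sees only its own identity
    (local knowledge)."""
    q = defaultdict(float)
    total = 0.0
    for _ in range(episodes):
        confused = random.random() < 0.5
        for subtask in SUBTASKS:                 # parent policy: fixed subtask order
            state = (subtask, confused) if shared_knowledge else (subtask,)
            if random.random() < EPSILON:        # explore
                action = random.choice(DETAIL_ACTIONS)
            else:                                # exploit current estimates
                action = max(DETAIL_ACTIONS, key=lambda a: q[(state, a)])
            reward, confused = simulated_user(action, confused)
            next_state = (subtask, confused) if shared_knowledge else (subtask,)
            best_next = max(q[(next_state, a)] for a in DETAIL_ACTIONS)
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                           - q[(state, action)])
            total += reward
    return total / episodes


if __name__ == "__main__":
    random.seed(0)
    print("local knowledge only  :", run(shared_knowledge=False))
    print("shared across subtasks:", run(shared_knowledge=True))
```

In this sketch the shared-knowledge variant can learn to give brief instructions to an unconfused user and detailed ones otherwise, whereas the local variant must commit to one level of detail per subtask; this mirrors, in a much simplified form, the efficiency-versus-detail trade-off discussed above.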

Type
Articles
Copyright
Copyright © Cambridge University Press 2014 

