Natural Language Engineering


Generating basic skills reports for low-skilled readers*


a1 Department of Computing Science, The Open University, Milton Keynes MK7 6AA, UK

a2 Department of Computing Science, University of Aberdeen, Aberdeen AB24 3UE, UK


We describe SkillSum, a Natural Language Generation (NLG) system that generates a personalised feedback report for someone who has just completed a screening assessment of their basic literacy and numeracy skills. Because many SkillSum users have limited literacy, the generated reports must be easily comprehended by people with limited reading skills; this is the most novel aspect of SkillSum, and the focus of this paper. We used two approaches to maximise readability. First, for determining content and structure (document planning), we did not explicitly model readability, but rather followed a pragmatic approach of repeatedly revising content and structure following pilot experiments and interviews with domain experts. Second, for choosing linguistic expressions (microplanning), we attempted to formulate explicitly the choices that enhanced readability, using a constraints approach and preference rules; our constraints were based on corpus analysis and our preference rules were based on psycholinguistic findings. Evaluation of the SkillSum system was twofold: it compared the usefulness of NLG technology to that of canned text output, and it assessed the effectiveness of the readability model. Results showed that NLG was more effective than canned text at enhancing users' knowledge of their skills, and also suggested that the empirical ‘revise based on experiments and interviews’ approach made a substantial contribution to readability, as did our explicit psycholinguistically inspired models of readability choices.

(Received March 20 2006)

(Revised December 08 2006)

(Online publication April 24 2008)


* Many thanks to our industrial collaborators at Cambridge Training and Development, to the literacy and numeracy tutors who helped us, to the people who agreed to be subjects in our experiments and to the colleges for allowing us to run experiments with their students. We also thank our colleagues in Aberdeen and Milton Keynes and the anonymous reviewers for their insightful comments and suggestions. This work was funded by PACCIT-LINK grant ESRC RES-328-25-0026.