Natural Language Engineering


Estimating the number of segments for improving dialogue act labelling


Instituto Tecnológico de Informática, Universidad Politécnica de Valencia, Valencia, Spain


In dialogue systems, it is important to label dialogue turns with dialogue-related meaning. Each turn is usually divided into segments, and each segment is labelled with one dialogue act (DA), a representation of its functional role in the ongoing discourse. The dialogue manager uses the sequence of DAs assigned to a turn to understand that turn. Probabilistic models that perform DA labelling can be applied to segmented or unsegmented turns. The latter case is more likely in a practical dialogue system, but it yields poorer results. In that case, a hypothesis for the number of segments can be provided to improve the results. We propose several methods to estimate the probability of the number of segments from the transcription of the turn, and we incorporate this estimate into a new labelling model. We tested this approach on two dialogue corpora, SwitchBoard and Dihana. The results show that including this estimate significantly improves labelling accuracy.
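The idea of weighting labelling hypotheses by the probability of the number of segments can be illustrated with a minimal Python sketch. It is not the estimators or the model of the article, which the abstract does not specify. It assumes, purely for illustration, that the number of segments r is conditioned on the turn length in words, that P(r | length) is estimated from counts with add-alpha smoothing, and that an unsegmented labeller has already produced candidate DA sequences with log-scores. The function names `estimate_segment_count_probs` and `pick_labelling`, and the DA labels used in the example, are hypothetical.

```python
import math
from collections import Counter, defaultdict


def estimate_segment_count_probs(samples, max_segments, alpha=1.0):
    """Estimate P(r | num_words) from (num_words, num_segments) pairs.

    Assumption: the turn length in words is the only feature, and every
    observed segment count is <= max_segments. Add-alpha smoothing keeps
    unseen segment counts from getting zero probability.
    """
    counts = defaultdict(Counter)
    for num_words, num_segments in samples:
        counts[num_words][num_segments] += 1

    probs = {}
    for num_words, seg_counts in counts.items():
        total = sum(seg_counts.values()) + alpha * max_segments
        probs[num_words] = {
            r: (seg_counts[r] + alpha) / total
            for r in range(1, max_segments + 1)
        }
    return probs


def pick_labelling(hypotheses, num_words, probs):
    """Choose the DA sequence maximising log_score + log P(r | num_words).

    hypotheses: list of (da_sequence, log_score) pairs, one DA per segment,
    so len(da_sequence) is the hypothesised number of segments r.
    Raises KeyError for a turn length never seen during estimation.
    """
    table = probs[num_words]

    def combined(hypothesis):
        da_sequence, log_score = hypothesis
        return log_score + math.log(table[len(da_sequence)])

    return max(hypotheses, key=combined)[0]


# Toy data: four-word turns usually contain a single segment.
samples = [(4, 1), (4, 1), (4, 1), (4, 2)]
probs = estimate_segment_count_probs(samples, max_segments=2)

# Two candidate labellings of the same unsegmented four-word turn.
hypotheses = [
    (["Question"], -5.0),
    (["Acknowledge", "Question"], -4.8),
]
best = pick_labelling(hypotheses, num_words=4, probs=probs)
```

In this toy case the two-segment hypothesis has the higher labelling score on its own, but the segment-count term favours a single segment for such a short turn, so the one-segment labelling is selected.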

(Received March 10 2010)

(Revised July 07 2010)

(Accepted October 20 2010)

(Online publication February 14 2011)

Work supported by the EC (FEDER/FSE), the Spanish Government (MEC, MICINN, MITyC, MAEC, “Plan E”, under grants MIPRCV “Consolider Ingenio 2010” CSD2007-00018, MITTRAL TIN2009-14633-C03-01, TSI-020110-2009-439, FPI fellowship BES-2007-16834), and Generalitat Valenciana (grant Prometeo/2009/014 and grant ACOMP/2010/051).