a1 University of Maryland Center for Advanced Study of Language (CASL) and Department of Linguistics, The Ohio State University
a2 Department of Computer Science and Engineering and Department of Linguistics, The Ohio State University
Most computational models of word segmentation are trained and tested on transcripts of speech rather than the speech itself, and so assume that speech has been converted into a sequence of symbols prior to word segmentation. We present a way of representing speech corpora that avoids this assumption and preserves the acoustic variation present in speech. We use this new representation to re-evaluate a key computational model of word segmentation. One finding is that high levels of phonetic variability degrade the model's performance. While robustness to phonetic variability may be intrinsically valuable, this finding needs to be complemented by parallel studies of the actual abilities of children to segment phonetically variable speech.
(Received December 20, 2008)
(Revised October 29, 2009)
(Accepted January 30, 2010)
(Online publication March 22, 2010)
[*] Portions of this research were supported by a National Science Foundation Graduate Research Fellowship awarded to the first author while he was at The Ohio State University, and by NSF-ITR grant #0427413, awarded to Chin-Hui Lee, Mark Clements, Keith Johnson, Lawrence Rabiner and Eric Fosler-Lussier for the multi-university Automatic Speech Attribute Transcription (ASAT) project. Preliminary versions of parts of this work, in particular Simulation 1, appear in the first author's (unpublished) dissertation.