a1 F. C. Donders Centre for Cognitive Neuroimaging; Max Planck Institute for Psycholinguistics
a2 F. C. Donders Centre for Cognitive Neuroimaging; Max Planck Institute for Psycholinguistics
a3 Max Planck Institute for Psycholinguistics
In the present study, we explore whether multiple data sources predict the words that language learners are likely to know more effectively than single sources do. Second language researchers have hypothesized a relationship between word frequency and the likelihood that second language learners will encounter or use a word, but it is not yet clear how this relationship is best measured. An analysis of word frequency measures showed that spoken language frequency alone may predict the occurrence of words in learner textbooks, but that multiple corpora, together with textbook status, can improve predictions of learner usage.
(Received October 16 2006)
(Revised February 18 2007)
(Accepted April 20 2007)
* Arna van Doorn assembled the vocabulary lists from the three Dutch textbooks. The Max Planck Institute for Psycholinguistics provided access to the CELEX, CGN, and ESF corpora. The analysis was conducted using R (R Development Team, 2005) and the stats (R Development Team, 2005) and MASS (Venables & Ripley, 2002) libraries. This research was supported by the Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO). We would also like to thank two anonymous reviewers for useful suggestions, and Jan Hulstijn for providing helpful comments and references on textbook vocabulary selection, including, in addition to those cited in the text, Hazenberg (1994) and Sciarone (1979).