Applied Psycholinguistics



Articles

The use of film subtitles to estimate word frequencies


BORIS NEW a1c1, MARC BRYSBAERT a2, JEAN VERONIS a3 and CHRISTOPHE PALLIER a4
a1 Université Paris Descartes and CNRS
a2 Royal Holloway, University of London
a3 Université de Provence
a4 CNRS, INSERM, and Service Hospitalier Frédéric Joliot

Abstract

We examine the use of film subtitles as an approximation of word frequencies in human interactions. Because subtitle files are widely available on the Internet, they may offer a fast and easy way to obtain word frequency measures in language registers other than written text. We compiled a corpus of 52 million French words drawn from a variety of films. Frequency measures based on this corpus compared well with other spoken and written frequency measures, and explained variance in lexical decision times beyond that accounted for by the available French written frequency measures.
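As an illustration of the kind of measure described above, the following minimal sketch counts word tokens in subtitle lines and normalizes the counts to occurrences per million words. It is not the authors' procedure: the function name `frequency_per_million`, the tokenization rule (runs of letters, lowercased), and the two sample lines are assumptions made for this example.

```python
import re
from collections import Counter

# Assumed tokenization rule: a token is a run of Unicode letters.
# Digits, punctuation, and subtitle timing lines therefore yield no tokens.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def frequency_per_million(lines):
    """Return a dict mapping each lowercased word to its frequency per million tokens."""
    counts = Counter()
    for line in lines:
        counts.update(TOKEN_RE.findall(line.lower()))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count * 1_000_000 / total for word, count in counts.items()}


# Hypothetical sample lines standing in for the text of a subtitle file.
sample = ["Bonjour, ça va ?", "Ça va bien, merci."]
freqs = frequency_per_million(sample)
```

Normalizing to a per-million scale lets counts from corpora of different sizes be compared directly, which is what a comparison between subtitle-based and written frequency measures requires.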

(Received April 3, 2006)
(Accepted January 18, 2007)


Correspondence:
c1 Boris New, 71 Avenue Edouard Vaillant, Boulogne-Billancourt F-92100, France. E-mail: boris.new@univ-paris5.fr

