Donders Institute for Brain, Cognition and Behaviour, Donders Centre for Cognition, Radboud University Nijmegen, The Netherlands
Abstract
Researchers studying bilingual processing can benefit from computational tools developed in artificial intelligence. We show that a normalized Levenshtein distance function can efficiently and reliably simulate bilingual orthographic similarity ratings. Orthographic similarity distributions of cognates and non-cognates were identified across pairs of six European languages: English, German, French, Spanish, Italian, and Dutch. Semantic equivalence was determined using the conceptual structure of a translation database. By applying a similarity threshold, we could select large numbers of cognates, and these included nearly all of the stimulus materials used in experimental studies. The identified numbers of form-similar and identical cognates correlated highly with branch lengths of phylogenetic language family trees, supporting the usefulness of the new measure for cross-language comparison. The normalized Levenshtein distance function can be considered a new formal model of cross-language orthographic similarity.
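To make the measure concrete, the following is a minimal Python sketch of a normalized Levenshtein similarity and of threshold-based cognate selection. It assumes that the raw edit distance is divided by the length of the longer word and subtracted from 1, so that identical words score 1.0. The function names, the 0.5 threshold, and the example word pairs are hypothetical illustrations, not the exact normalization, threshold, or materials used in this study. The sketch compares raw Unicode code points, so accented and unaccented letters count as different characters.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # Dynamic programming with two rows: prev holds distances for a[:i-1].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # delete ca
                            curr[j - 1] + 1,     # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; ASSUMPTION: normalize by the longer word's length."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def select_cognates(pairs, threshold=0.5):
    """Keep translation pairs whose similarity reaches a (hypothetical) threshold."""
    return [(a, b) for a, b in pairs if normalized_similarity(a, b) >= threshold]


if __name__ == "__main__":
    # Hypothetical English-Dutch translation pairs.
    pairs = [("tomato", "tomaat"), ("dog", "hond"), ("hand", "hand")]
    for a, b in pairs:
        print(a, b, levenshtein(a, b), round(normalized_similarity(a, b), 2))
    print(select_cognates(pairs))
```

Under these assumptions, tomato/tomaat (distance 2, similarity about 0.67) and the identical pair hand/hand (similarity 1.0) pass the threshold, while dog/hond (distance 3, similarity 0.25) does not.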
(Received April 27 2010)
(Revised October 22 2010)
(Accepted November 05 2010)
(Online publication August 11 2011)
Address for correspondence: Job Schepens/Ton Dijkstra, Donders Centre for Cognition, Radboud University Nijmegen, P.O. Box 9104, 6500 HE Nijmegen, The Netherlands. Email: j.schepens@let.ru.nl
Footnotes
* In our study, we used the standard input–output functions of the following translation database: Euroglot professional 5.0 (2008), developed by Linguistic Systems B.V. We are grateful to Walter van Heuven, Gerard Kempen, Frank Leoné, Steven Rekké, Bastiaan du Pau, and two anonymous reviewers for their thoughtful comments on an earlier version of this paper.