Automatic bilingual lexicon acquisition using random indexing of parallel corpora

M. Sahlgren; J. Karlgren

doi:10.1017/S1351324905003876

Abstract

This paper presents a simple and effective approach to automatic bilingual lexicon acquisition from parallel corpora. The approach uses the Random Indexing vector space methodology and finds correlations between terms on the basis of their distributional characteristics. It requires minimal preprocessing and linguistic knowledge, and it is efficient, fast and scalable. We explain how our approach differs from traditional co-occurrence-based word alignment algorithms, and we demonstrate how to extract bilingual lexica by applying Random Indexing to aligned parallel data. We evaluate the acquired lexica against manually compiled gold standards and report an overlap of around 60%. We also discuss methodological problems in evaluating lexical resources of this kind.
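The abstract describes the method only at a high level. The following is a minimal Python sketch of the general idea, built on our own assumptions. Each aligned sentence pair is treated as one context region and receives a single sparse random index vector. Every word in either language accumulates the index vectors of the regions it occurs in. Each source word is then paired with the target word whose context vector is most similar by cosine. The region granularity, `dim=1000`, `nonzeros=8`, and all function names (`make_index_vector`, `build_context_vectors`, `extract_lexicon`, `overlap`) are hypothetical choices made for illustration. They are not the paper's settings or code. `overlap` is likewise just one simple way to score a lexicon against a gold standard.

```python
import math
import random
from collections import defaultdict


def make_index_vector(dim, nonzeros, rng):
    """Sparse ternary index vector: `nonzeros` random positions, half set to
    +1 and half to -1 (for even `nonzeros`); all other entries are 0."""
    positions = rng.sample(range(dim), nonzeros)
    return {p: (1 if i % 2 == 0 else -1) for i, p in enumerate(positions)}


def build_context_vectors(aligned_pairs, dim=1000, nonzeros=8, seed=0):
    """Assign one index vector per aligned region (assumed here: a sentence
    pair) and add it to the context vector of every word on both sides."""
    rng = random.Random(seed)
    src_vecs = defaultdict(lambda: [0.0] * dim)
    tgt_vecs = defaultdict(lambda: [0.0] * dim)
    for src_text, tgt_text in aligned_pairs:
        index = make_index_vector(dim, nonzeros, rng)
        for text, vecs in ((src_text, src_vecs), (tgt_text, tgt_vecs)):
            for word in text.lower().split():
                vec = vecs[word]
                for pos, sign in index.items():
                    vec[pos] += sign
    return dict(src_vecs), dict(tgt_vecs)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extract_lexicon(src_vecs, tgt_vecs):
    """Pair each source word with its most cosine-similar target word."""
    return {
        src: max(tgt_vecs, key=lambda tgt: cosine(vec, tgt_vecs[tgt]))
        for src, vec in src_vecs.items()
    }


def overlap(lexicon, gold):
    """Fraction of gold-standard entries that the lexicon reproduces exactly."""
    hits = sum(1 for src, tgt in gold.items() if lexicon.get(src) == tgt)
    return hits / len(gold)


# Toy English-German parallel corpus, one aligned sentence pair per tuple.
pairs = [
    ("the house", "das Haus"),
    ("the cat", "die Katze"),
    ("a house", "ein Haus"),
    ("a cat", "eine Katze"),
]
src_vecs, tgt_vecs = build_context_vectors(pairs)
lexicon = extract_lexicon(src_vecs, tgt_vecs)
print(lexicon["house"])  # → haus
print(overlap(lexicon, {"house": "haus", "cat": "katze"}))  # → 1.0
```

In this toy corpus, `house` and `haus` occur in exactly the same aligned regions, so they receive identical context vectors. By contrast, `the` is roughly equally close to `das` and `die`, so its extracted translation depends on the particular random index vectors.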
