WordICA—emergence of linguistic representations for words by independent component analysis

TIMO HONKELA; AAPO HYVÄRINEN; JAAKKO J. VÄYRYNEN

doi:10.1017/S1351324910000057


Published online by Cambridge University Press: 15 June 2010


TIMO HONKELA: Affiliation:
Adaptive Informatics Research Centre, Aalto University School of Science and Technology, P.O. Box 15400, FI-00076 Aalto, Finland; e-mail: timo.honkela@tkk.fi
AAPO HYVÄRINEN: Affiliation:
Department of Mathematics and Statistics, Department of Computer Science, University of Helsinki, P.O. Box 68, FI-00014 University of Helsinki, Finland; Helsinki Institute for Information Technology, University of Helsinki, P.O. Box 68, FI-00014 University of Helsinki, Finland
JAAKKO J. VÄYRYNEN: Affiliation:
Adaptive Informatics Research Centre, Aalto University School of Science and Technology, P.O. Box 15400, FI-00076 Aalto, Finland


Abstract

We explore the use of independent component analysis (ICA) for automatically extracting linguistic roles or features of words through unsupervised analysis of text corpora. We contrast ICA with singular value decomposition (SVD), which is widely used in statistical text analysis in general and in latent semantic analysis (LSA) in particular. The representations found by SVD, however, are not easily interpreted by humans. In contrast, ICA applied to word context data yields distinct features that reflect linguistic categories. In this paper, we provide a justification for our approach, called WordICA, present the method in detail, compare the results with traditional linguistic categories and with those of an SVD-based method, and discuss the use of the method in practical natural language engineering solutions such as machine translation systems. Because WordICA is based on unsupervised learning and thus provides a general means for efficient knowledge acquisition, we foresee clear potential for practical applications.
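The abstract describes the WordICA pipeline only at a high level. The following is a minimal sketch, not the authors' implementation, built on several assumptions: each analysed word is treated as one observation, described by log-scaled counts of the words appearing immediately to its left and right; SVD (as in LSA) reduces these vectors to k dimensions and whitens them; and a symmetric FastICA fixed-point iteration with a tanh nonlinearity, written directly in NumPy, then rotates the whitened space. The toy corpus and the function names `context_matrix`, `svd_and_whiten` and `fastica` are hypothetical.

```python
import numpy as np


def context_matrix(tokens, vocab, contexts):
    """Rows: analysed words (hypothetical setup). Columns: counts of each
    context word seen immediately to the left (first block) or right (second)."""
    w_idx = {w: i for i, w in enumerate(vocab)}
    c_idx = {c: j for j, c in enumerate(contexts)}
    n_c = len(contexts)
    X = np.zeros((len(vocab), 2 * n_c))
    for pos, tok in enumerate(tokens):
        i = w_idx.get(tok)
        if i is None:
            continue
        if pos > 0 and tokens[pos - 1] in c_idx:
            X[i, c_idx[tokens[pos - 1]]] += 1           # left-neighbour block
        if pos + 1 < len(tokens) and tokens[pos + 1] in c_idx:
            X[i, n_c + c_idx[tokens[pos + 1]]] += 1    # right-neighbour block
    return np.log1p(X)  # assumption: log scaling damps very frequent contexts


def svd_and_whiten(X, k):
    """Centre the columns, keep k singular directions (LSA-style), and return
    both the SVD word features and a whitened copy of them for ICA."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    svd_features = U[:, :k] * s[:k]          # LSA-style word vectors
    Z = U[:, :k] * np.sqrt(X.shape[0])       # zero mean, unit variance, uncorrelated
    return svd_features, Z


def fastica(Z, max_iter=500, tol=1e-8, seed=0):
    """Symmetric FastICA with g = tanh on whitened data Z (one row per
    observation). Returns the sources S = Z @ W.T and the rotation W."""
    n, k = Z.shape
    W, _ = np.linalg.qr(np.random.default_rng(seed).standard_normal((k, k)))
    for _ in range(max_iter):
        G = np.tanh(Z @ W.T)
        G_prime = 1.0 - G ** 2
        # Fixed-point update per row w: E{z g(w^T z)} - E{g'(w^T z)} w
        W_new = G.T @ Z / n - G_prime.mean(axis=0)[:, None] * W
        # Symmetric decorrelation W <- (W W^T)^(-1/2) W keeps W orthogonal
        d, E = np.linalg.eigh(W_new @ W_new.T)
        W_new = E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ W_new
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return Z @ W.T, W


# Hypothetical toy corpus with three word classes.
rng = np.random.default_rng(0)
nouns = ["cat", "dog", "king", "queen", "tree", "house"]
verbs = ["sees", "likes", "finds", "eats"]
adjs = ["big", "small", "old", "red"]
tokens = []
for _ in range(2000):
    tokens += ["the", adjs[rng.integers(4)], nouns[rng.integers(6)],
               verbs[rng.integers(4)], "a", nouns[rng.integers(6)], "."]

vocab = nouns + verbs + adjs
contexts = sorted(set(tokens))
X = context_matrix(tokens, vocab, contexts)
svd_feats, Z = svd_and_whiten(X, k=2)
S, W = fastica(Z)
for j in range(S.shape[1]):
    top = np.argsort(-np.abs(S[:, j]))[:4]
    print(f"component {j}:", [vocab[i] for i in top])
```

In this sketch the SVD features and the ICA features span the same k-dimensional subspace. ICA only chooses a different rotation of it, one whose axes are as non-Gaussian as possible, and that choice of axes is what the abstract credits with producing features that reflect linguistic categories. On a corpus this small the printed components are illustrative only.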

Type: Papers
Information: Natural Language Engineering, Volume 16, Issue 3, July 2010, pp. 277–308

DOI: https://doi.org/10.1017/S1351324910000057
Copyright: © Cambridge University Press 2010


