WordICA—emergence of linguistic representations for words by independent component analysis

Published online by Cambridge University Press:  15 June 2010

TIMO HONKELA
Affiliation:
Adaptive Informatics Research Centre, Aalto University School of Science and Technology, P.O. Box 15400, FI-00076 Aalto, Finland e-mail: timo.honkela@tkk.fi
AAPO HYVÄRINEN
Affiliation:
Department of Mathematics and Statistics, Department of Computer Science, University of Helsinki, P.O. Box 68, FI-00014 University of Helsinki, Finland; Helsinki Institute for Information Technology, University of Helsinki, P.O. Box 68, FI-00014 University of Helsinki, Finland
JAAKKO J. VÄYRYNEN
Affiliation:
Adaptive Informatics Research Centre, Aalto University School of Science and Technology, P.O. Box 15400, FI-00076 Aalto, Finland

Abstract

We explore the use of independent component analysis (ICA) for the automatic extraction of linguistic roles or features of words. The extraction is based on the unsupervised analysis of text corpora. We contrast ICA with singular value decomposition (SVD), which is widely used in statistical text analysis in general and in latent semantic analysis (LSA) in particular. The representations found using SVD, however, cannot easily be interpreted by humans. In contrast, ICA applied to word context data gives distinct features which reflect linguistic categories. In this paper, we provide justification for our approach, called WordICA, present the WordICA method in detail, compare the obtained results with traditional linguistic categories and with the results achieved using an SVD-based method, and discuss the use of the method in practical natural language engineering solutions such as machine translation systems. As the WordICA method is based on unsupervised learning and thus provides a general means for efficient knowledge acquisition, we foresee that the approach has clear potential for practical applications.
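
As an illustration of what "word context data" can look like, the following minimal sketch builds a word-by-context count matrix from a toy corpus. It is not taken from the paper: the function name `context_matrix`, the one-word window, the toy sentence, and the choice of vocabulary and context words are all assumptions made for this example. A decomposition such as SVD or ICA would then be applied to a matrix of this kind using a numerical library, which is left out here.

```python
def context_matrix(tokens, vocab, contexts, window=1):
    """Count how often each context word occurs within `window`
    positions of each vocabulary word (hypothetical helper)."""
    row = {w: i for i, w in enumerate(vocab)}
    col = {c: j for j, c in enumerate(contexts)}
    counts = [[0] * len(contexts) for _ in vocab]
    for i, tok in enumerate(tokens):
        if tok not in row:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            # Skip the target word itself; count only tracked context words.
            if j != i and tokens[j] in col:
                counts[row[tok]][col[tokens[j]]] += 1
    return counts


# Toy corpus and word lists, chosen only for illustration.
corpus = "the cat sat on the mat and the dog sat on the rug".split()
vocab = ["cat", "dog", "sat"]      # rows: words to be represented
contexts = ["the", "on", "sat"]    # columns: context words

X = context_matrix(corpus, vocab, contexts)
for word, counts in zip(vocab, X):
    print(word, counts)
# "cat" and "dog" get identical rows ([1, 0, 1]) because they occur in
# the same contexts; "sat" gets a different row ([0, 2, 0]).
```

In this toy case the two nouns share a context profile that the verb does not, which is the kind of regularity an unsupervised decomposition of the matrix could pick up.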

Type
Papers
Copyright
Copyright © Cambridge University Press 2010
