
Similarity computation using semantic networks created from web-harvested data

Published online by Cambridge University Press:  26 July 2013

ELIAS IOSIF
Affiliation:
Department of Electronic and Computer Engineering, Technical University of Crete, Chania 73100, Greece email: iosife@telecom.tuc.gr, potam@telecom.tuc.gr
ALEXANDROS POTAMIANOS
Affiliation:
Department of Electronic and Computer Engineering, Technical University of Crete, Chania 73100, Greece email: iosife@telecom.tuc.gr, potam@telecom.tuc.gr

Abstract

We investigate language-agnostic algorithms for the construction of unsupervised distributional semantic models using web-harvested corpora. Specifically, a corpus is created from web document snippets, and the relevant semantic similarity statistics are encoded in a semantic network. We propose the notion of semantic neighborhoods, which are defined using co-occurrence or context similarity features. Three neighborhood-based similarity metrics are proposed, motivated by the hypotheses of attributional and maximum sense similarity. The proposed metrics are evaluated against human similarity ratings and achieve state-of-the-art results.
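
The idea of a co-occurrence-based semantic neighborhood can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a specific co-occurrence measure, so the Dice coefficient over snippet-level co-occurrence is an assumption made here for illustration, as are the function names `cooccurrence_counts`, `dice`, and `neighborhood` and the whitespace tokenization.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(snippets):
    """Count, per word and per unordered word pair, the snippets that contain them."""
    word_counts = Counter()
    pair_counts = Counter()
    for snippet in snippets:
        # Assumption: naive lowercase whitespace tokenization, one vote per snippet.
        words = set(snippet.lower().split())
        word_counts.update(words)
        pair_counts.update(frozenset(pair) for pair in combinations(sorted(words), 2))
    return word_counts, pair_counts


def dice(w1, w2, word_counts, pair_counts):
    """Dice coefficient of two words (an assumed measure, not taken from the abstract)."""
    both = pair_counts[frozenset((w1, w2))]
    total = word_counts[w1] + word_counts[w2]
    return 2.0 * both / total if total else 0.0


def neighborhood(word, word_counts, pair_counts, size=3):
    """Return up to `size` words most similar to `word`, skipping zero-similarity words."""
    scored = [
        (dice(word, other, word_counts, pair_counts), other)
        for other in word_counts
        if other != word
    ]
    scored = [(score, other) for score, other in scored if score > 0]
    # Sort by descending similarity, breaking ties alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [other for _, other in scored[:size]]
```

Given a handful of toy snippets, `neighborhood("car", word_counts, pair_counts, size=2)` returns the two words that share the most snippets with "car" relative to their frequencies. A neighborhood-based similarity metric would then compare two words through their neighborhoods rather than through their direct co-occurrence alone.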

Type
Articles
Copyright
Copyright © Cambridge University Press 2013 

