Experiments with three approaches to recognizing lexical entailment

P. D. TURNEY; S. M. MOHAMMAD

doi:10.1017/S1351324913000387

Published online by Cambridge University Press: 28 January 2014

Affiliation (both authors): National Research Council Canada, Ottawa, Ontario K1A 0R6, Canada
e-mail: peter.turney@nrc-cnrc.gc.ca, saif.mohammad@nrc-cnrc.gc.ca

Abstract

Inference in natural language often involves recognizing lexical entailment (RLE), that is, identifying whether one word entails another. For example, buy entails own. Two general strategies for RLE have been proposed: one strategy is to manually construct an asymmetric similarity measure for context vectors (directional similarity) and another is to treat RLE as a problem of learning to recognize semantic relations using supervised machine-learning techniques (relation classification). In this paper, we experiment with two recent state-of-the-art representatives of the two general strategies. The first approach is an asymmetric similarity measure (an instance of the directional similarity strategy), designed to capture the degree to which the contexts of a word, a, form a subset of the contexts of another word, b. The second approach (an instance of the relation classification strategy) represents a word pair, a:b, with a feature vector that is the concatenation of the context vectors of a and b, and then applies supervised learning to a training set of labeled feature vectors. In addition, we introduce a third approach that is a new instance of the relation classification strategy. The third approach represents a word pair, a:b, with a feature vector in which the features are the differences in the similarities of a and b to a set of reference words. All three approaches use vector space models of semantics, based on word–context matrices. We perform an extensive evaluation of the three approaches using three different datasets. The proposed new approach (similarity differences) performs significantly better than the other two approaches on some datasets, and there is no dataset for which it is significantly worse.
Along the way, we address some of the concerns raised in past research regarding the treatment of RLE as a problem of semantic relation classification, and we suggest that it is beneficial to make connections between the research in lexical entailment and the research in semantic relation classification.
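
The three representations described in the abstract can be illustrated with a minimal sketch. It assumes each word is a row of a word–context matrix with non-negative weights. The function names, the toy vectors, and the simple inclusion ratio are illustrative assumptions, not the measures or features evaluated in the paper; the inclusion ratio in particular is only a stand-in for an asymmetric similarity measure.

```python
import numpy as np


def inclusion_score(a, b):
    """Share of a's context weight that lies on contexts also observed with b.

    A simple stand-in for a directional (asymmetric) similarity measure:
    the score is 1.0 when every context of a is also a context of b.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    total = a.sum()
    if total == 0:
        return 0.0
    return float(a[b > 0].sum() / total)


def concat_features(a, b):
    """Represent the pair a:b as the concatenation of the two context vectors."""
    return np.concatenate([np.asarray(a, dtype=float), np.asarray(b, dtype=float)])


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    norm_u, norm_v = np.linalg.norm(u), np.linalg.norm(v)
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return float(u @ v / (norm_u * norm_v))


def similarity_difference_features(a, b, references):
    """Represent a:b by sim(a, r) - sim(b, r) for each reference word vector r."""
    return np.array([cosine(a, r) - cosine(b, r) for r in references])


# Hypothetical toy context vectors (four contexts); the values are made up.
dog = np.array([2.0, 1.0, 0.0, 0.0])
animal = np.array([3.0, 2.0, 4.0, 1.0])
references = [
    np.array([1.0, 0.0, 0.0, 0.0]),
    np.array([0.0, 0.0, 1.0, 1.0]),
]

forward = inclusion_score(dog, animal)    # contexts of "dog" inside "animal"
backward = inclusion_score(animal, dog)   # contexts of "animal" inside "dog"
pair_concat = concat_features(dog, animal)
pair_simdiff = similarity_difference_features(dog, animal, references)

print(forward, backward)
print(pair_concat.shape, pair_simdiff.shape)
```

In this sketch the directional score can be thresholded directly, whereas the two pair representations would be passed, together with entailment labels, to any supervised classifier. The concatenated vector grows with the number of contexts, while the similarity-difference vector has one feature per reference word.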

Type: Articles
Information: Natural Language Engineering, Volume 21, Issue 3, May 2015, pp. 437–476

DOI: https://doi.org/10.1017/S1351324913000387
Copyright: Copyright © Her Majesty the Queen in Right of Canada, as represented by the National Research Council Canada 2014


