Hostname: page-component-8448b6f56d-m8qmq Total loading time: 0 Render date: 2024-04-19T19:55:44.627Z Has data issue: false hasContentIssue false

Document ranking refinement using a Markov random field model*

Published online by Cambridge University Press:  14 March 2012

ESAÚ VILLATORO
Affiliation:
Department of Computer Science, National Institute of Astrophysics, Optics and Electronics, Luis Enrique Erro 1 Tonantzintla, Puebla, CP 72840, México e-mail: villatoroe@inaoep.mx, antjug@inaoep.mx, mmontesg@inaoep.mx, villasen@inaoep.mx, esucar@inaoep.mx
ANTONIO JUÁREZ
Affiliation:
Department of Computer Science, National Institute of Astrophysics, Optics and Electronics, Luis Enrique Erro 1 Tonantzintla, Puebla, CP 72840, México e-mail: villatoroe@inaoep.mx, antjug@inaoep.mx, mmontesg@inaoep.mx, villasen@inaoep.mx, esucar@inaoep.mx
MANUEL MONTES
Affiliation:
Department of Computer Science, National Institute of Astrophysics, Optics and Electronics, Luis Enrique Erro 1 Tonantzintla, Puebla, CP 72840, México e-mail: villatoroe@inaoep.mx, antjug@inaoep.mx, mmontesg@inaoep.mx, villasen@inaoep.mx, esucar@inaoep.mx
LUIS VILLASEÑOR
Affiliation:
Department of Computer Science, National Institute of Astrophysics, Optics and Electronics, Luis Enrique Erro 1 Tonantzintla, Puebla, CP 72840, México e-mail: villatoroe@inaoep.mx, antjug@inaoep.mx, mmontesg@inaoep.mx, villasen@inaoep.mx, esucar@inaoep.mx
L. ENRIQUE SUCAR
Affiliation:
Department of Computer Science, National Institute of Astrophysics, Optics and Electronics, Luis Enrique Erro 1 Tonantzintla, Puebla, CP 72840, México e-mail: villatoroe@inaoep.mx, antjug@inaoep.mx, mmontesg@inaoep.mx, villasen@inaoep.mx, esucar@inaoep.mx

Abstract

This paper introduces a novel ranking refinement approach based on relevance feedback for the task of document retrieval. We focus on the problem of ranking refinement since recent evaluation results from Information Retrieval (IR) systems indicate that current methods are effective retrieving most of the relevant documents for different sets of queries, but they have severe difficulties to generate a pertinent ranking of them. Motivated by these results, we propose a novel method to re-rank the list of documents returned by an IR system. The proposed method is based on a Markov Random Field (MRF) model that classifies the retrieved documents as relevant or irrelevant. The proposed MRF combines: (i) information provided by the base IR system, (ii) similarities among documents in the retrieved list, and (iii) relevance feedback information. Thus, the problem of ranking refinement is reduced to that of minimising an energy function that represents a trade-off between document relevance and inter-document similarity. Experiments were conducted using resources from four different tasks of the Cross Language Evaluation Forum (CLEF) forum as well as from one task of the Text Retrieval Conference (TREC) forum. The obtained results show the feasibility of the method for re-ranking documents in IR and also depict an improvement in mean average precision compared to a state of the art retrieval machine.

Type
Articles
Copyright
Copyright © Cambridge University Press 2012

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

Baeza-Yates, R., and Ribeiro-Neto, B. 1999. Modern Information Retrival. Addison Wesley, Wokingham, UK.Google Scholar
Balinski, J., and Danilowicz, C. 2005. Re-ranking methos based on inter-document distance. Information Processing and Management 41: 759–75.CrossRefGoogle Scholar
Bear, J., Israel, D., Petit, J., and Martin, D. 1997. Using information extraction to improve document retrieval. In Proceedings of the 6th Text Retrieval Conference.Google Scholar
Bendersky, M., and Kurland, O. 2008. Re-ranking search results using document-passage graphs. In Proceedings of the 31st annual international ACM SIGIR Conference on Research and Development in information Retrieval (SIGIR'08), pp. 853–4, ACM Press. Singapore, Singapore.Google Scholar
Besag, J. 1986. On the statistical analysis of dirty pictures (with discussion). Journal of the Royal Statistical Society, Series B 48: 259302.Google Scholar
Carbonetto, P., De Freitas, N., and Barnard, K. 2004. A statistical model for general context object recognition. In Proceedings of the 8th European Conference on Computer Vision, vol. 3021, pp. 350–62. Lecture Notes in Computer Science. Berlin: Springer-Verlag.Google Scholar
Chávez, O., Sucar, L. E., and Montes, M. 2010. Image re-ranking based on relevance feedback combining internal and external similarities. In Proceedings of The FLAIRS Conference, Daytona Beach, Florida, USA.Google Scholar
Crouch, C., Crouch, D., Chen, Q., and Holtz, S. 2002. Improving the Retrieval Effectiveness of Very Short Queries. Information Processing and Management 38.CrossRefGoogle Scholar
Deng, H., Lyu, M. R., and King, I. 2009. Effective latent space graph-based re-ranking model with global consistency. In Proceedings of the 2nd ACM International Conference on Web Search and Data Mining (WSDM'09), pp. 212–21, ACM Press. Barcelona, Spain.CrossRefGoogle Scholar
Di Nunzio, G. M., Ferro, N., Mandl, T., and Peters, C. 2008. CLEF 2007: Ad Hoc Track overview. In Post-proceedings of the 8th Workshop of the Cross Language Evaluation Forum CLEF 2007, vol. 5152, pp. 1332. Lecture Notes in Computer Science. Berlin: Springer-Verlag.Google Scholar
Diaz, F. 2005. Regularising Ad Hoc retrieval scores. In Proceedings of the 14th ACM International Conference on Information and Knowledge Management (CIKM'05), pp. 672–79, ACM Press. Bremen, Germany.Google Scholar
Escalante, H. J., Montes, M., and Sucar, L. E. 2007. Word Co-occurrence and Markov random fields for improving automatic image annotation. In Proceedings of the 18th British Machine Vision Conference, vol. 2, pp. 600–9. Warwick, UK.Google Scholar
Garafolo, J. S., Auzanne, C. G. P., and Voorhees, E. M. 2000. The TREC spoken document retrieval track: a success story. In Proceedings of the RIAO 2000 Conference: Content-Based Multimedia Information Access, pp. 120, Paris.Google Scholar
Geman, S., and Geman, D. 1984. Stochastic relaxation, Gibbs distribution, and the Bayesian restoration of images. In IEEE Transactions on: Pattern Analysis and Machine Intelligence, vol. 6, pp. 721–41.Google Scholar
Grossman, D. A., and Frieder, O. 2004. Information Retrieval, Algorithms and Heuristics, 2nd ed. Springer. Dordrecht, The Netherlands.CrossRefGoogle Scholar
Grubinger, M. 2007. Analysis and Evaluation of Visual Information Systems Performance. PhD thesis. School of Computer Science and Mathematics, Faculty of Health, Engineering and Science, Victoria University. Melbourne, Australia.Google Scholar
Held, K., Kops, E., Krause, B., Wells, W. III, Kikinis, R., and Müeller, H. 1997. Markov random field segmentation of brain MR images. IEEE Transactions on Medical Imaging 16: 878.CrossRefGoogle ScholarPubMed
Hernández, C., and Sucar, L. E. 2007. Markov random fields and spatial information to improve automatic image annotation. In Proceedings of the 2007 Pacific-Rim Symposium on Image and Video Technology, vol. 4872, pp. 879–92. Lecture Notes in Computer Science. Berlin: Springer-Verlag.Google Scholar
Kamps, J. 2004. Improving retrieval effectiveness by reranking documents based on controlled vocabulary. In Proceedings of the 21th European Conference on Information Retrieval, vol. 2997, pp. 283–95. Lecture Notes in Computer Science. Springer.Google Scholar
Kemeny, J., Snell, J. L., and Kanpp, A. W. 1976. Denumerable Markov Chains. New York/Heidelberg/Berlin: Springer Verlag.CrossRefGoogle Scholar
Kurland, O., and Lee, L. 2005. PageRank without hyper-links: structural re-ranking using links induced by language models. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'05), pp. 306–13. ACM Press. Salvador, Brazil.Google Scholar
Lauritzen, S. L. 1996. Graphical Models. New York, NY: Oxford University Press.CrossRefGoogle Scholar
Lease, M. 2009. An improved Markov random field model for supporting verbose queries. In Proceedings of the 32th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'09), pp. 476–83. ACM Press. Boston, MA, USA.Google Scholar
Lee, K., Park, Y., and Choi, K. S. 2001. Document re-ranking model using clusters. Information Processing and Management 37 (1): 114.Google Scholar
Li, S. Z. 1994. Markov random field models in computer vision. In Proceedings of the European Conference on Computer Vision, vol. 18, pp. 361–70. Lecture Notes in Computer Science. Berlin: Springer-Verlag.Google Scholar
Li, S. Z. 2001. Markov Random Field Modeling in Image Analysis, 2nd. ed. Springer.Google Scholar
Luk, R. W. P., and Wong, K. F. 2004. Pseudo-relevance feedback and title re-ranking for Chinese IR. In Proceedings of the 4th NTCIR Workshop meeting, Cross-lingual Information Retrierval Task. Tokyo, Japan.Google Scholar
Mallows, C. 1975. Non-null ranking models. Biomedika 44: 114–30.Google Scholar
Metzler, D., and Croft, B. 2005. A Markov random field model for term dependencies. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'09), pp. 472–9. ACM Press. Salvador, Brazil.Google Scholar
Metzler, D., and Croft, B. 2007. Latent concept expansion using Markov random fields. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'07), pp. 311–8. ACM Press. Amsterdam, The Netherlands.Google Scholar
Mitra, M., Singhal, A., and Buckley, C. 1998. Improving automatic query expansion. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'08), pp. 206–14. ACM Press. Melbourne, Australia.Google Scholar
Pearl, J. 1988. Probabilistic Reasoning in Intelligent Systems. San Mateo CA: Morgan Kaufman.Google Scholar
Porter, M. F. 1997. An Algorithm for Suffix Stripping, pp. 313–6. Morgan Kaufman Publishers Inc. San Francisco, CA, USA.Google Scholar
Qu, Y., Xu, G., and Wang, J. 2000. Rerank method based on individual thesaurus. In Proceedings of the 2nd NTCIR Workshop on reserach in Chinese and Japanese Text Retrieval and Text Summarization. Tokyo, Japan.Google Scholar
Salton, G., and Buckley, C. 1990. Improving retrieval performance by relevance feedback. Journal of the American Society for Information Science 41 (4): 288–97.3.0.CO;2-H>CrossRefGoogle Scholar
Salton, G., Yang, C. S., and Wong, A. 1975. A vector space model for automatic indexing. Communications of the ACM 18 (11): 613–20.CrossRefGoogle Scholar
Sarkar, P., and Moore, A. W. 2009. Fast dynamic reranking in large graphs. In Proceedings of the 18th International conference on World Wide Web (WWW'09), pp. 3140, ACM Press. Madrid, Spain.Google Scholar
Smucker, M. D., Allan, J., and Carterette, B. 2007. A comparison of statistical significance tests for information retrieval evaluation. In Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM'07), pp. 623–32, ACM Press. Lisbon, Portugal.Google Scholar
Villatoro-Tello, E., Montes-y-Gómez, M., and Villaseñor-Pineda, L. 2009a. A ranking approach based on example texts for geographic information retrieval. In Post-Proceedings of the 9th Workshop of the Cross Language Evaluation Forum CLEF 2008, vol. 5822, pp. 239–50. Lecture Notes in Computer Science. Berlin: Springer-Verlag.Google Scholar
Villatoro-Tello, E., Villaseñor-Pineda, L., and Montes-y-Gómez, M. 2009b. Ranking refinement via relevance feedback in geographic information retrieval. In Proceeding of the Mexican International Conference on Artificial Intelligence MICAI 2009, vol. 5845, pp. 165–76. Lecture Notes in Computer Science. Berlin: Springer-Verlag.Google Scholar
Winkler, G. 2006. Image analysis, random fields and Markov chain monte carlo methods. Springer Series on Applications of Mathematics, Rozovskii, B. and Yor, M. eds. Vol. 27, pp. 179–96, 2nd ed. Springer, Germany.Google Scholar
Xu, J., and Croft, W. B. 1996. Query expansion using local and global document analysis. In Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 411. ACM Press. Zurich, Switzerland.Google Scholar
Xu, J. and Croft, W. B. 2000. Improving the effectiveness of information retrieval with local context analysis. ACM Transactions on Information Systems 18 (1): 79112.CrossRefGoogle Scholar
Yang, L. P., and Ji, D. H. 2005a. Chinese information retrieval based on terms and relevant terms. ACM Transactions on Asian Language Information Processing 4 (3): 357–74.Google Scholar
Yang, L. P., and Ji, D. H. 2005b. Chinese document re-ranking based on term distribution and maximal marginal relevance. In Proceedings of the 2nd Asian Information Retrieval Symposium AIRS, vol. 3689, pp. 299311. Lecture Notes in Computer Science. Berlin: Springer-Verlag.Google Scholar
Yang, L., Ji, D., and Tang, L. 2004. Document re-ranking based on automatically acquired key terms in Chinese information retrieval. In COLING '04 Proceedings of the 20th International Conference on Computational Linguistics. pp. 480–6. Association for Computational Linguistics. Geneva, Switzerland.Google Scholar
Yang, L., Ji, D., Zhou, G., Nie, Y., and Xiao, G. 2006. Document re-ranking using cluster validation and label propagation. In Proceedings of the ACM CIKM 2006 International Conference on Information and Knowledge Management, pp. 690–7. ACM Press. Arlington, Virginia, USA.Google Scholar
Zhang, B., Hua, L., Yi, L., Lei, J., Wensi, X., Weiguo, F., Zheng, C., and Wei-Ying, M. 2005. Improving web search results using affinity graph. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 504–11. ACM Press. Salvador, Brazil.CrossRefGoogle Scholar
Zhou, D., Lawless, S., Min, J., and Wade, V. 2010a. A late fusion approach to cross-lingual document re-ranking. In Proceedings of the ACM CIKM 2010 International Conference on Information and Knowledge Management, pp. 1433–6. ACM Press. Toronto, ON, Canada.Google Scholar
Zhou, D., Lawless, S., Min, J., and Wade, V. 2010b. Dual-space re-ranking model for document retrieval. In COLING '10 Proceedings of the 23rd international conference on Computational Linguistics, pp. 1524–32. Association for Computational Linguistics. Beijing, China.Google Scholar