
Survey about citation context analysis: Tasks, techniques, and resources

Published online by Cambridge University Press: 05 November 2015

MYRIAM HERNÁNDEZ-ALVAREZ
Affiliation:
Escuela Politécnica Nacional, Facultad de Ingeniería de Sistemas, Quito, Ecuador e-mail: myriam.hernandez@epn.edu.ec
JOSÉ M. GOMEZ
Affiliation:
Dpto. de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Alicante, España e-mail: jmgomez@ua.es

Abstract

Bibliometric calculations currently used to assess the quality of researchers, articles, and scientific journals have serious structural problems. Many authors have noted the weakness of citation counts: they are purely quantitative and do not differentiate between high- and low-citing papers. If a paper’s reputation is evaluated simply by the number of its citations, then incomplete, incorrect, or controversial articles may be promoted regardless of their relevance. This creates perverse incentives for researchers, who may publish many incorrect or incomplete papers to achieve high impact indexes. It is therefore essential to improve the objective criteria for automatic article-quality assessment. To obtain these new criteria, however, it is necessary to advance the automated detection of the context, polarity, and function of bibliographic references.

We present an overview of general concepts and review contributions toward solving these problems, with the aim of identifying trends and suggesting possible directions for future research.

Type
Survey Paper
Copyright
Copyright © Cambridge University Press 2015 

