
NLP-driven citation analysis for scientometrics

Published online by Cambridge University Press:  25 January 2016

RAHUL JHA
Affiliation:
Microsoft Corp., Redmond, WA, USA e-mails: rajh@microsoft.com, amjada@microsoft.com
AMJAD ABU-JBARA
Affiliation:
Microsoft Corp., Redmond, WA, USA e-mails: rajh@microsoft.com, amjada@microsoft.com
VAHED QAZVINIAN
Affiliation:
University of Michigan, Ann Arbor, MI, USA e-mail: vahed@umich.edu
DRAGOMIR R. RADEV
Affiliation:
EECS and SI, University of Michigan, Ann Arbor, MI, USA e-mail: radev@umich.edu

Abstract

This paper summarizes ongoing research in Natural-Language-Processing-driven citation analysis and describes experiments and motivating examples of how this work can be used to enhance traditional scientometric analysis, which simply treats each citation as a ‘vote’ from the citing paper to the cited paper. In particular, we describe our dataset for citation polarity and citation purpose, present experimental results on the automatic detection of these indicators, and demonstrate the use of such annotations for studying research dynamics and scientific summarization. We also look at two complementary problems that arise in Natural-Language-Processing-driven citation analysis for a specific target paper. The first problem is extracting citation context: the implicit citation sentences that do not contain explicit anchors to the target paper. The second problem is extracting reference scope: the target-relevant segment of a complicated citing sentence that cites multiple papers. We show how these tasks can help improve sentiment analysis and citation-based summarization.

Type
Articles
Copyright
Copyright © Cambridge University Press 2016 

