NLP-driven citation analysis for scientometrics

RAHUL JHA; AMJAD-ABU JBARA; VAHED QAZVINIAN; DRAGOMIR R. RADEV

doi:10.1017/S1351324915000443

Published online by Cambridge University Press: 25 January 2016

RAHUL JHA and AMJAD-ABU JBARA: Microsoft Corp., Redmond, WA, USA; e-mails: rajh@microsoft.com, amjada@microsoft.com
VAHED QAZVINIAN: University of Michigan, Ann Arbor, MI, USA; e-mail: vahed@umich.edu
DRAGOMIR R. RADEV: EECS and SI, University of Michigan, Ann Arbor, MI, USA; e-mail: radev@umich.edu

Abstract

This paper summarizes ongoing research in Natural-Language-Processing-driven citation analysis and describes experiments and motivating examples showing how this work can enhance traditional scientometric analysis, which simply treats each citation as a ‘vote’ from the citing paper to the cited paper. In particular, we describe our dataset for citation polarity and citation purpose, present experimental results on the automatic detection of these indicators, and demonstrate the use of such annotations for studying research dynamics and scientific summarization. We also look at two complementary problems that arise in Natural-Language-Processing-driven citation analysis for a specific target paper. The first problem is extracting citation context: the implicit citation sentences that do not contain explicit anchors to the target paper. The second problem is extracting reference scope: the target-relevant segment of a complicated citing sentence that cites multiple papers. We show how these tasks can help improve sentiment analysis and citation-based summarization.
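
To make the contrast between vote-style counting and polarity-aware analysis concrete, the following minimal sketch tallies the same citations both ways. It is only an illustration under stated assumptions, not the authors' dataset or method: the paper IDs, the three polarity labels, and the helper names `vote_counts` and `polarity_profile` are all hypothetical.

```python
from collections import Counter

# Hypothetical annotated citations: (citing_paper, cited_paper, polarity).
# The paper IDs and the three polarity labels are illustrative assumptions,
# not the annotation scheme or data described in the paper.
CITATIONS = [
    ("P2", "P1", "positive"),
    ("P3", "P1", "neutral"),
    ("P4", "P1", "negative"),
    ("P5", "P1", "positive"),
    ("P4", "P2", "neutral"),
]


def vote_counts(citations):
    """Traditional view: every citation is one undifferentiated vote."""
    return Counter(cited for _, cited, _ in citations)


def polarity_profile(citations):
    """Polarity-aware view: tally citations per cited paper and per label."""
    profile = {}
    for _, cited, polarity in citations:
        profile.setdefault(cited, Counter())[polarity] += 1
    return profile


votes = vote_counts(CITATIONS)
profile = polarity_profile(CITATIONS)

print("votes for P1:", votes["P1"])
print("polarity profile for P1:", dict(profile["P1"]))
```

In this toy input, paper P1 receives four votes under plain counting, while the polarity profile shows that one of those four citations is negative, a distinction that the raw count cannot express.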

Type: Articles
Information: Natural Language Engineering, Volume 23, Issue 1, January 2017, pp. 93-130

DOI: https://doi.org/10.1017/S1351324915000443
Copyright: © Cambridge University Press 2016
