Hostname: page-component-7c8c6479df-27gpq Total loading time: 0 Render date: 2024-03-28T21:30:05.042Z Has data issue: false hasContentIssue false

Uncovering the structure and temporal dynamics of information propagation

Published online by Cambridge University Press:  03 April 2014

MANUEL GOMEZ RODRIGUEZ
Affiliation:
Department of Empirical Inference, MPI for Intelligent Systems, Tübingen, Baden-Wurttemberg, Germany (e-mail: manuelgr@tuebingen.mpg.de)
JURE LESKOVEC
Affiliation:
Department of Computer Science, Stanford University, Stanford, CA, USA (e-mail: jure@cs.stanford.edu)
DAVID BALDUZZI
Affiliation:
Machine Learning Laboratory, ETH Zürich, Zürich, Switzerland (e-mail: david.balduzzi@inf.ethz.ch)
BERNHARD SCHÖLKOPF
Affiliation:
Department of Empirical Inference, MPI for Intelligent Systems, Tübingen, Baden-Wurttemberg, Germany (e-mail: bs@tuebingen.mpg.de)

Abstract

Time plays an essential role in the diffusion of information, influence, and disease over networks. In many cases we can only observe when a node is activated by a contagion—when a node learns about a piece of information, makes a decision, adopts a new behavior, or becomes infected with a disease. However, the underlying network connectivity and transmission rates between nodes are unknown. Inferring the underlying diffusion dynamics is important because it leads to new insights and enables forecasting, as well as influencing or containing information propagation. In this paper we model diffusion as a continuous temporal process occurring at different rates over a latent, unobserved network that may change over time. Given information diffusion data, we infer the edges and dynamics of the underlying network. Our model naturally imposes sparse solutions and requires no parameter tuning. We develop an efficient inference algorithm that uses stochastic convex optimization to compute online estimates of the edges and transmission rates. We evaluate our method by tracking information diffusion among 3.3 million mainstream media sites and blogs, and experiment with more than 179 million different instances of information spreading over the network in a one-year period. We apply our network inference algorithm to the top 5,000 media sites and blogs and report several interesting observations. First, information pathways for general recurrent topics are more stable across time than for on-going news events. Second, clusters of news media sites and blogs often emerge and vanish in a matter of days for on-going news events. Finally, major events, for example, large scale civil unrest as in the Libyan civil war or Syrian uprising, increase the number of information pathways among blogs, and also increase the network centrality of blogs and social media sites.

Type
Research Article
Copyright
Copyright © Cambridge University Press 2014 

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

Aalen, O. O., Borgan, Ø., & Gjessing, H. K. (2008). Survival and event history analysis: A process point of view. New York, NY: Springer-Verlag.CrossRefGoogle Scholar
Adar, E., & Adamic, L. A. (2005). Tracking information epidemics in blogspace. In Proceedings of The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (pp. 207214). Washington, DC: IEEE.Google Scholar
Agarwal, A., & Duchi, J. C. (2011). Distributed delayed stochastic optimization. In Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., & Weinberger, K. (Eds.), Advances in Neural Information Processing Systems 24 (NIPS-24) (pp. 451459). NIPS Foundation.Google Scholar
Aral, S., & Walker, D. (2012). Identifying influential and susceptible members of social networks. Science, 337 (6092), 337341.CrossRefGoogle ScholarPubMed
Bach, F., & Moulines, E. (2011). Non-asymptotic analysis of stochastic approximation algorithms for machine learning. In Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., & Weinberger, K. (Eds.), Advances in Neural Information Processing Systems 24 (NIPS-24) (pp. 451459). NIPS Foundation.Google Scholar
Barabási, A.-L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509512.CrossRefGoogle ScholarPubMed
Blatt, D., Hero, A. O., & Gauchman, H. (2008). A convergent incremental gradient method with a constant step size. SIAM Journal on Optimization, 18 (1), 2951.CrossRefGoogle Scholar
Boyd, S. P., & Vandenberghe, L. (2004). Convex optimization. Cambridge, UK: Cambridge University Press.CrossRefGoogle Scholar
Brockmann, D., Hufnagel, L., & Geisel, T. (2006). The scaling laws of human travel. Nature, 439 (7075), 462465.CrossRefGoogle ScholarPubMed
Chierichetti, F., Kleinberg, J., & Liben-Nowell, D. (2011). Reconstructing patterns of information diffusion from incomplete observations. In Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., & Weinberger, K. (Eds.), Advances in Neural Information Processing Systems 24 (NIPS-24) (pp. 792800). NIPS Foundation.Google Scholar
Clauset, A., Moore, C., & Newman, M. E. J. (2008). Hierarchical structure and the prediction of missing links in networks. Nature, 453 (7191), 98101.CrossRefGoogle ScholarPubMed
Du, N., Song, L., Gomez-Rodriguez, M., & Zha, H. (2013). Scalable influence estimation in continuous-time diffusion networks. In Burges, C. J. C., Bottou, L., Welling, M., Ghahramani, Z., & Weinberger, K. Q. (Eds.), Advances in Neural Information Processing Systems 25 (NIPS-25) (pp. 27892797). NIPS Foundation.Google Scholar
Du, N., Song, L., Smola, A., & Yuan, M. (2012). Learning networks of heterogeneous influence. In Barlett, P., Pereira, F. C. N., Burges, C. J. C., Bottou, L., & Weinberger, K. Q. (Eds.), Advances in Neural Information Processing Systems 25 (NIPS-25) (pp. 27892797). NIPS Foundation.Google Scholar
Duchi, J. C., Agarwal, A., Johansson, M., & Jordan, M. I. (2011). Ergodic subgradient descent. Proceedings of the 40th Annual Allerton Conference on Communication, Control, and Computing (pp. 701706). IEEE.Google Scholar
Erdős, P., & Rényi, A. (1960). On the evolution of random graphs. Publication of the Mathematical Institute of the Hungarian Academy of Science Series B, 5, 1767.Google Scholar
Gomez-Rodriguez, M., Leskovec, J., & Krause, A. (2010). Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (pp. 10191028). ACM.Google Scholar
Gomez-Rodriguez, M., Leskovec, J., & Krause, A. (2012). Inferring networks of diffusion and influence. ACM Transactions on Knowledge Discovery from Data, 5 (4), 21:137, ACM.CrossRefGoogle Scholar
Gomez-Rodriguez, M., & Schölkopf, B. (2012a). Influence maximization in continuous time diffusion networks. In Langford, John & Pineau, Joelle (Eds.), Proceedings of the 29th International Conference on Machine Learning (pp. 313320). Omnipress.Google Scholar
Gomez-Rodriguez, M., & Schölkopf, B. (2012b). Modeling information propagation with survival theory. In Bartlett, P., Pereira, F. C. N., Burges, C. J. C., Bottou, L., & Weinberger, K. Q. (Eds.), Advances in Neural Information Processing Systems 25 (NIPS-25): Workshop in Algorithmic and Statistical Approaches for Large Social Networks. NIPS Foundation.Google Scholar
Gomez-Rodriguez, M., & Schölkopf, B. (2012c). Submodular inference of diffusion networks from multiple trees. In Langford, John & Pineau, Joelle (Eds.), Proceedings of the 29th International Conference on Machine Learning (pp. 489496). Omnipress.Google Scholar
Grant, M., & Boyd, S. (2010). CVX: Matlab software for disciplined convex programming, version 1.21. Retrieved from http://cvxr.com/cvx (2013).Google Scholar
Hufnagel, L., Brockmann, D., & Geisel, T. (2004). Forecast and control of epidemics in a globalized world. Proceedings of the National Academy of Sciences of the United States of America, 101 (42), 15124.CrossRefGoogle Scholar
InfoPath. (2013). InfoPath. Retrieved from http://snap.stanford.edu/infopath/ (2013).Google Scholar
Kaplan, E. H. (1989). What are the risks of risky sex? Modeling the AIDS epidemic. Operations Research, 37 (2), 198209.CrossRefGoogle Scholar
Kempe, D., Kleinberg, J. M. & Tardos, É. (2003). Maximizing the spread of influence through a social network. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 137146). ACM.Google Scholar
Lappas, T., Terzi, E., Gunopulos, D., & Mannila, H. (2010). Finding effectors in social networks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 10591068). ACM.CrossRefGoogle Scholar
Lawless, J. F. (1982). Statistical models and methods for lifetime data. New York, NY: Wiley.Google Scholar
Leskovec, J., Adamic, L. A., & Huberman, B. A. (2006). The dynamics of viral marketing. In Proceedings of the 7th ACM Conference on Electronic Commerce (pp. 228237). ACM.CrossRefGoogle Scholar
Leskovec, J., Backstrom, L., & Kleinberg, J. (2009). Meme-tracking and the dynamics of the news cycle. In Proceedings of the 15th ACM Sigkdd International Conference on Knowledge Discovery and Data Mining (pp. 497506). New York, NY: ACM.CrossRefGoogle Scholar
Leskovec, J., Chakrabarti, D., Kleinberg, J., Faloutsos, C., & Ghahramani, Z. (2010). Kronecker graphs: An approach to modeling networks. Journal of Machine Learning Research, 11, 9851042.Google Scholar
Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C., VanBriesen, J., & Glance, N. (2007b). Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM Sigkdd International Conference on Knowledge Discovery and Data Mining (pp. 420–429).CrossRefGoogle Scholar
Leskovec, J., Lang, K. J., Dasgupta, A., & Mahoney, M. W. (2008). Statistical properties of community structure in large social and information networks. In Proceedings of the 17th International Conference on World Wide Web (pp. 695704). ACM.CrossRefGoogle Scholar
Leskovec, J., McGlohon, M., Faloutsos, C., Glance, N., & Hurst, M. (2007a). Cascading behavior in large blog graphs. In Proceedings of the SIAM Conference on Data Mining (pp. 551556). SIAM.Google Scholar
Liben-Nowell, D., & Kleinberg, J. (2008). Tracing the flow of information on a global scale using Internet chain-letter data. Proceedings of the National Academy of Sciences, 105 (12), 46334638.CrossRefGoogle ScholarPubMed
Lipsitch, M., Cohen, T., Cooper, B., Robins, J. M., Ma, S., James, L., . . . Murray, M. (2003). Transmission dynamics and control of severe acute respiratory syndrome. Science, 300 (5627), 1966.CrossRefGoogle ScholarPubMed
Merriam-Webster's collegiate dictionary. 2004. Springfield, MA: Merriam-Webster.Google Scholar
Miller, M., Sathi, C., Wiesenthal, D., Leskovec, J., & Potts, C. (2011). Sentiment flow through hyperlink networks. In Proceedings of the 5th International AAAI Conference on Weblogs and Social Media. AAAI.Google Scholar
Myers, S., & Leskovec, J. (2010). On the convexity of latent social network inference. In Lafferty, J., Williams, C. K. I., Shawe-Taylor, J., Zemel, R. S., & Culotta, A. (Eds.), Advances in Neural Information Processing Systems 23 (NIPS-23) (pp. 17411749). NIPS Foundation.Google Scholar
Myers, S., & Leskovec, J. (2012). Clash of the contagions: Cooperation and competition in information diffusion. Proceedings of the IEEE International Conference on Data Mining (pp. 539548). IEEE.Google Scholar
Myers, S., Leskovec, J., & Zhu, C. (2012). Information diffusion and external influence in networks. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 3341). ACM.Google Scholar
Nemirovski, A., Juditsky, A., Lan, G., & Shapiro, A. (2009). Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19 (4), 1574.CrossRefGoogle Scholar
Netrapalli, P., & Sanghavi, S. (2012). Finding the graph of epidemic cascades. In Proceedings of the 12th ACM SIGMETRICS/Performance. PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems (pp. 211222). ACM.Google Scholar
NetRate. (2011). NetRate. Retrieved from http://people.tue.mpg.de/manuelgr/netrate/ (2013).Google Scholar
Newey, W. K., & McFadden, D. L. (1994). Large sample estimation and hypothesis testing. In Engle, R. F. & McFadden, D. L. (Eds.), Handbook of econometrics, Vol. IV (pp. 21112245). Amsterdam, Netherlands: Elsevier Science B.V.Google Scholar
Prakash, B. A., Beutel, A., Rosenfeld, R., & Faloutsos, C. (2012). Winner takes all: Competing viruses or ideas on fair-play networks. In Proceedings of the 21st International Conference on World Wide Web (pp. 10371046). ACM.CrossRefGoogle Scholar
Robbins, H., & Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, 22 (3), 400407.CrossRefGoogle Scholar
Rogers, E. M. (1995). Diffusion of innovations (4th ed.). New York, NY: Free Press.Google ScholarPubMed
Romero, D. M., Meeder, B., & Kleinberg, J. (2011). Differences in the mechanics of information diffusion across topics: Idioms, political hashtags, and complex contagion on twitter. In Proceedings of the 20th International Conference on World Wide Web (pp. 695704). ACM.CrossRefGoogle Scholar
Roux, N. L., Schmidt, M., & Bach, F. (2012). A stochastic gradient method with an exponential convergence rate for strongly-convex optimization with finite training sets. In Bartlett, P., Pereira, F. C. N., Burges, C. J. C., Bottou, L., & Weinberger, K. Q. (Eds.), Advances in Neural Information Processing Systems 25 (NIPS-25) (pp. 26722680). NIPS Foundation.Google Scholar
Sadikov, S., Medina, M., Leskovec, J., & Garcia-Molina, H. (2011). Correcting for missing data in information cascades. In Proceedings of the 4th ACM International Conference on Web Search and Data Mining (pp. 5564). ACM.CrossRefGoogle Scholar
SNAP. (2012). SNAP: Stanford network analysis platform. Retrieved from http://snap.stanford.edu (2013).Google Scholar
Snowsill, T. M., Fyson, N., De Bie, T., & Cristianini, N. (2011). Refining causality: Who copied from whom? In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 466474). ACM.CrossRefGoogle Scholar
Vu, D. Q., Asuncion, A. U., Hunter, D. R., & Smyth, P. (2011). Continuous-time regression models for longitudinal networks. In Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., & Weinberger, K. (Eds.), Advances in Neural Information Processing Systems 24 (NIPS-24) (pp. 24922500). NIPS Foundation.Google Scholar
Wallinga, J., & Teunis, P. (2004). Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures. American Journal of Epidemiology, 160 (6), 509516.CrossRefGoogle ScholarPubMed
Wang, C., Knight, J. C., & Elder, M. C. (2000). On computer viral infection and the effect of immunization. In Proceedings of the 16th Annual Conference on Computer Security Applications (pp. 246256). IEEE Computer Society.Google Scholar
Watts, D. J., & Dodds, P. S. (2007). Influentials, networks, and public opinion formation. Journal of Consumer Research, 34 (4), 441458.CrossRefGoogle Scholar