An automatic approach to identify word sense changes in text media across timescales
Published online by Cambridge University Press: 16 April 2015
Abstract
In this paper, we propose an unsupervised and automated method to identify noun sense changes based on rigorous analysis of time-varying text data available in the form of millions of digitized books and millions of tweets posted per day. We construct distributional-thesauri-based networks from data at different time points and cluster each of them separately to obtain word-centric sense clusters corresponding to the different time points. Subsequently, we propose a split/join based approach to compare the sense clusters at two different time points to determine whether there is a ‘birth’ of a new sense. The approach also helps us to determine whether an older sense has ‘split’ into more than one sense, a newer sense has been formed from the ‘join’ of older senses, or a particular sense has undergone ‘death’. We use this completely unsupervised approach (a) within the Google books data to identify word sense differences within a medium, and (b) across Google books and Twitter data to identify differences in word sense distribution across different media. We conduct a thorough evaluation of the proposed methodology both manually and through comparison with WordNet.
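The split/join comparison described in the abstract can be illustrated with a minimal sketch. It assumes each sense cluster of a target word is available as a plain set of neighbouring words at each time point. The function names, the overlap threshold of 0.3, and the toy word lists for "compile" are illustrative assumptions, not the criteria or data used in the paper.

```python
from typing import Dict, List, Set


def overlap_fraction(cluster: Set[str], others: List[Set[str]]) -> List[float]:
    """Fraction of `cluster` members that also occur in each cluster of `others`."""
    if not cluster:
        return [0.0 for _ in others]
    return [len(cluster & other) / len(cluster) for other in others]


def compare_senses(
    old: List[Set[str]],
    new: List[Set[str]],
    threshold: float = 0.3,  # hypothetical cut-off, chosen only for illustration
) -> Dict[str, list]:
    """Label sense clusters of one word as birth, death, split, or join."""
    events: Dict[str, list] = {"birth": [], "death": [], "split": [], "join": []}

    # Look backwards from each new cluster: which old clusters feed into it?
    for j, new_cluster in enumerate(new):
        fractions = overlap_fraction(new_cluster, old)
        sources = [i for i, f in enumerate(fractions) if f >= threshold]
        if not sources:
            events["birth"].append(j)            # no old cluster explains it
        elif len(sources) > 1:
            events["join"].append((sources, j))  # several old clusters merged

    # Look forwards from each old cluster: where did its words go?
    for i, old_cluster in enumerate(old):
        fractions = overlap_fraction(old_cluster, new)
        targets = [j for j, f in enumerate(fractions) if f >= threshold]
        if not targets:
            events["death"].append(i)            # its words vanished
        elif len(targets) > 1:
            events["split"].append((i, targets)) # it spread over several clusters

    return events


# Toy clusters for the noun/verb "compile" at two time points (invented data).
old_clusters = [{"gather", "collect", "assemble", "accumulate"}]
new_clusters = [
    {"gather", "collect", "assemble"},
    {"build", "link", "debug", "execute"},
]
events = compare_senses(old_clusters, new_clusters)
```

With these toy inputs, the second new cluster shares no words with the older cluster and is therefore flagged as a birth, while the first new cluster is treated as a continuation of the older sense. A real pipeline would first need the clustering step over the distributional-thesaurus network, which this sketch leaves out.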
- Type
- Articles
- Information
- Natural Language Engineering, Volume 21, Special Issue 5: Graphs in NLP, November 2015, pp. 773-798
- Copyright
- Copyright © Cambridge University Press 2015