SMS spam filtering and thread identification using bi-level text classification and clustering techniques
Journal of Information Science
Published online on December 03, 2015
Abstract
SMS spam detection is an important task where spam SMS messages are identified and filtered. As greater numbers of SMS messages are communicated every day, it is very difficult for a user to remember and correlate the newer SMS messages received in context to previously received SMS. SMS threads provide a solution to this problem. In this work the problem of SMS spam detection and thread identification is discussed and a state of the art clustering-based algorithm is presented. The work is planned in two stages. In the first stage the binary classification technique is applied to categorize SMS messages into two categories namely, spam and non-spam SMS; then, in the second stage, SMS clusters are created for non-spam SMS messages using non-negative matrix factorization and K-means clustering techniques. A threading-based similarity feature, that is, time between consecutive communications, is described for the identification of SMS threads, and the impact of the time threshold in thread identification is also analysed experimentally. Performance parameters like accuracy, precision, recall and F-measure are also evaluated. The SMS threads identified in this proposed work can be used in applications like SMS thread summarization, SMS folder classification and other SMS management-related tasks.