Fast prediction of web user browsing behaviours using most interesting patterns
Journal of Information Science
Published online on November 01, 2016
Abstract
The prediction of users’ browsing behaviours is essential for putting appropriate information on the web. The browsing behaviours are stored as navigational patterns in web server logs. These weblogs are used to predict the frequently accessed patterns of web users, which can be used to predict user behaviour and to collect business intelligence. However, owing to the exponentially increasing weblog size, existing implementations of frequent-pattern-mining algorithms often take too much time and generate too many redundant patterns. This article introduces the most interesting pattern-based parallel FP-growth (MIP-PFP) algorithm. MIP-PFP is an improved implementation of the parallel FP-growth algorithm and implemented on the Apache Spark platform for extracting frequent patterns from huge weblogs. Experiments were performed on openly available National Aeronautics and Space Administration (NASA) weblog data to test the effectiveness of the MIP-PFP algorithm. The results were compared with existing implementation of PFP algorithms. The results suggest that the MIP-PFP algorithm running on Apache Spark reduced the execution time by a factor of more than 10 times. The effect of sequence length that has been used as input to the MIP-PFP algorithm was also evaluated with different interestingness parameters including support, confidence, lift, leverage, cosine, and conviction. It is observed from experimental results that only sequences of length greater than three produced a very low value of support for these interestingness measures.