Identifying ISI‐indexed articles by their lexical usage: A text analysis approach

Mohammadreza Moohebat, Ram Gopal Raj, Sameem Binti Abdul Kareem, Dirk Thorleuchter

Journal of the American Society for Information Science and Technology

Published online on May 19, 2014

Abstract

This research develops an architecture for investigating whether lexical differences exist between articles indexed by the Institute for Scientific Information (ISI) and articles that are not (non‐ISI) and, if such differences are found, for identifying the best available classification method. Three classification models are trained on a collection of ISI‐ and non‐ISI‐indexed articles in the areas of business and computer science. A sensitivity analysis demonstrates the impact of words in different syntactic forms on the classification decision. The results show that the lexical domains of ISI and non‐ISI articles are distinguishable by machine learning techniques. Our findings indicate that the support vector machine identifies ISI‐indexed articles in both disciplines with higher precision than the Naïve Bayesian and K‐Nearest Neighbors techniques.
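
The kind of comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the function names, and the choice of bag-of-words counts with cosine similarity are all assumptions made for illustration. Only the Naïve Bayesian and K‐Nearest Neighbors classifiers are sketched, together with the precision measure; a support vector machine would normally be taken from a library (for example, scikit-learn's `LinearSVC`) rather than written by hand.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus, invented for illustration only (not the paper's data).
TRAIN = [
    ("we propose a novel framework and evaluate the empirical results", "ISI"),
    ("the empirical analysis demonstrates significant robust results", "ISI"),
    ("our framework is evaluated on a benchmark with significant gains", "ISI"),
    ("this paper talks about some things in computers", "non-ISI"),
    ("we discuss some ideas about business and things", "non-ISI"),
    ("this paper gives some ideas about computers", "non-ISI"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(docs):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = tokenize(text)
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict_naive_bayes(model, text):
    """Multinomial Naive Bayes with add-one smoothing; unseen words are ignored."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_knn(docs, text, k=3):
    """Majority vote among the k training documents most similar to the query."""
    query = Counter(tokenize(text))
    ranked = sorted(
        docs,
        key=lambda d: cosine(query, Counter(tokenize(d[0]))),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def precision(predicted, actual, positive="ISI"):
    """Fraction of documents predicted as `positive` that really are `positive`."""
    tp = sum(p == positive and a == positive for p, a in zip(predicted, actual))
    fp = sum(p == positive and a != positive for p, a in zip(predicted, actual))
    return tp / (tp + fp) if tp + fp else 0.0
```

In use, each classifier would be fitted on a training split (`train_naive_bayes(TRAIN)`), applied to held-out articles, and compared through `precision` computed on the predicted and true labels. A realistic study would need a far larger corpus and proper cross-validation.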