The aboutness of words
Journal of the American Society for Information Science and Technology
Published online on July 31, 2017
Abstract
Word aboutness is defined as the relationship between words and the subjects associated with them. An aboutness coefficient is developed to estimate the strength of this relationship. Words that are randomly distributed across subjects are assumed to lack aboutness, and the degree to which a word's usage deviates from a random pattern indicates the strength of its aboutness. To estimate aboutness, title words and their associated subjects are extracted for the titles of non-fiction, English-language books in the OCLC WorldCat database. The usage patterns of the title words are analyzed and used to compute an aboutness coefficient for each common title word. Words with low aboutness coefficients ("an", "in") are commonly found in stop word lists, whereas words with high aboutness coefficients ("carbonate", "autism") are unambiguous and have a strong subject association. The aboutness coefficient can potentially enhance indexing, advance authority control, and improve retrieval.
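The abstract does not give the coefficient's formula. As a rough illustration of the underlying idea only (measuring how far a word's subject distribution deviates from what random usage would produce), the Python sketch below uses Kullback-Leibler divergence against the overall subject distribution as a stand-in measure. The function name `aboutness_scores`, the `min_count` threshold, the whitespace tokenization, and the toy records are all hypothetical; this is not the paper's coefficient and not WorldCat data.

```python
import math
from collections import Counter, defaultdict


def aboutness_scores(records, min_count=2):
    """Hypothetical aboutness proxy (NOT the paper's coefficient).

    For each word, compute the KL divergence (in bits) between the subject
    distribution of titles containing that word and the background subject
    distribution over all title-word occurrences. A word spread over subjects
    like the background (the "random" pattern) scores near 0; a word
    concentrated in a few subjects scores high.
    """
    word_subjects = defaultdict(Counter)  # word -> subject counts
    word_titles = Counter()               # word -> number of titles containing it
    background = Counter()                # subject counts over all word occurrences

    for title, subjects in records:
        words = set(title.lower().split())  # naive tokenization, assumption
        for word in words:
            word_titles[word] += 1
            for subject in subjects:
                word_subjects[word][subject] += 1
                background[subject] += 1

    total = sum(background.values())
    scores = {}
    for word, counts in word_subjects.items():
        if word_titles[word] < min_count:  # score only "common" words
            continue
        n = sum(counts.values())
        kl = 0.0
        for subject, c in counts.items():
            p = c / n                         # observed share for this word
            q = background[subject] / total   # share expected under random usage
            kl += p * math.log2(p / q)
        scores[word] = kl
    return scores


# Toy (title, subjects) pairs, invented for illustration only.
TOY_RECORDS = [
    ("An introduction to carbonate geology", ["Geology"]),
    ("Carbonate sediments", ["Geology"]),
    ("Autism in children", ["Psychology"]),
    ("An autism handbook", ["Psychology"]),
    ("Living in cities", ["Sociology"]),
    ("An atlas in history", ["History"]),
]

if __name__ == "__main__":
    scores = aboutness_scores(TOY_RECORDS)
    # Print from least to most "about" something.
    for word, score in sorted(scores.items(), key=lambda kv: kv[1]):
        print(f"{word:10s} {score:.3f}")
```

On these toy records, "an" and "in", which are spread over several subjects, score lower than "carbonate" and "autism", which each occur under a single subject. The `min_count` threshold loosely mirrors the abstract's restriction to common title words: without it, any word seen in only one title would receive a high score from a single observation.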