Assessing the Veracity of Methods for Extracting Place Semantics from Flickr Tags
Published online on May 28, 2013
Abstract
The volume and potential value of user-generated content (UGC) are ever growing. Drawn from many sources, its value is greatly increased by the inclusion of metadata that adequately and accurately describes that content, particularly if such data are to be integrated with more formal data sets. Digital photographs are typically tagged with location and attribute information that variously describes the place, events or objects in the image. Often inconsistent and incomplete, these attributes reflect concepts at a range of geographic scales. From a spatial data integration perspective, the information relating to “place” is of primary interest. The challenge, therefore, lies in selecting the tags that best describe the geography of the image. This article presents a methodology based on an information retrieval technique that separates “place-related tags” from the remaining tags. Different scales of geography are identified by varying the size of the sampling area within which the imagery falls. The methodology is applied to urban environments, using Flickr imagery. Empirical analysis is then used to assess the correctness of the chosen tags (i.e. whether a tag correctly describes the geographic region in which the image was taken). Logistic regression and Bayesian inference are used to attach a probability value to each place tag. The high correlation values achieved indicate that this methodology can be used to automatically select place tags for any urban region and thus to structure UGC hierarchically so that it can be semantically integrated with other data sources.
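As a rough illustration of how place-related tags might be separated from other tags at a chosen spatial scale, the sketch below scores each tag within a grid cell. The abstract does not name the information retrieval technique, so the TF-IDF-style weighting used here is an assumption, as are the square grid, the function names `cell_of` and `score_tags`, the `cell_deg` parameter and the sample photos. It is not the article's implementation.

```python
import math
from collections import Counter, defaultdict


def cell_of(lat, lon, cell_deg):
    """Map a coordinate to a square grid cell of side cell_deg degrees (assumed sampling area)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def score_tags(photos, cell_deg):
    """Return {cell: {tag: score}} for photos given as (lat, lon, tags).

    The score is a TF-IDF-style weight (an assumption): a tag scores highly
    when it is frequent inside a cell and appears in few other cells.
    """
    tag_counts = defaultdict(Counter)  # per cell: number of photos carrying each tag
    for lat, lon, tags in photos:
        tag_counts[cell_of(lat, lon, cell_deg)].update(set(tags))

    n_cells = len(tag_counts)
    cells_with_tag = Counter()  # number of cells in which each tag occurs
    for counts in tag_counts.values():
        cells_with_tag.update(counts.keys())

    scores = {}
    for cell, counts in tag_counts.items():
        total = sum(counts.values())
        scores[cell] = {
            tag: (n / total) * math.log(n_cells / cells_with_tag[tag])
            for tag, n in counts.items()
        }
    return scores


# Hypothetical photos: (latitude, longitude, tags)
photos = [
    (51.5007, -0.1246, ["london", "bigben", "nikon"]),
    (51.5008, -0.1240, ["london", "bigben"]),
    (51.5081, -0.0759, ["london", "towerbridge", "nikon"]),
    (51.5079, -0.0761, ["london", "towerbridge"]),
]

fine = score_tags(photos, cell_deg=0.01)  # small sampling area
for cell, tag_scores in fine.items():
    print(cell, max(tag_scores, key=tag_scores.get))
```

With the small cell size, tags shared by every cell (such as a city name or a camera brand) receive a weight of zero, while tags concentrated in one cell rank highest there. Increasing `cell_deg` enlarges the sampling area, which is one simple way to probe coarser geographic scales. With only a single cell the inverse-frequency term is zero for every tag, so a coarser analysis needs cells covering a wider region.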