Assessing the Validity of Automated Webcrawlers as Data Collection Tools to Investigate Online Child Sexual Exploitation

Westlake, B., Bouchard, M., Frank, R.

Published online on November 26, 2015

Abstract

The distribution of child sexual exploitation (CE) material has been aided by the growth of the Internet. The graphic nature and prevalence of the material has made researching and combating difficult. Although used to study online CE distribution, automated data collection tools (e.g., webcrawlers) have yet to be shown effective at targeting only relevant data. Using CE-related image and keyword criteria, we compare networks starting from CE websites to those from similar non-CE sexuality websites and dissimilar sports websites. Our results provide evidence that (a) webcrawlers have the potential to provide valid CE data, if the appropriate criterion is selected; (b) CE distribution is still heavily image-based suggesting images as an effective criterion; (c) CE-seeded networks are more hub-based and differ from non-CE-seeded networks on several website characteristics. Recommendations for improvements to reliable criteria selection are discussed.