“Eyes” on the Street: Using Computer Vision to Get the Gist of How Environmental Design Shapes Crime Across Neighborhoods

Published online on May 11, 2026

Abstract

Purpose

Assess the crime prevention through environmental design (CPTED) framework as a neighborhood theory and propose a new computational approach for measuring place visuals. Inspired by lessons from cognitive psychology, this approach augments object-based measures of environments (e.g., how many trees are present) with measures of visual ‘gists’ that capture broad image assessments.

Methods

We surveyed 4,800 respondents to obtain human ratings of five CPTED-inspired gist metrics (preference, complexity, memorability, transparency, and enclosure) for 8,249 Chicago Google Street View images. Gist metrics were evaluated and interpreted using Monte Carlo-based reliability simulations and hierarchical linear models. Using residual neural networks, we trained a series of computer vision models to predict human-rated gist scores on new images. After validating out-of-sample performance, we applied these models to estimate CPTED gist scores on a larger set of 187,048 Chicago street-view images. Multi-level variance decomposition analysis was used to probe the nested geographic structure of gist metrics. XGBoost models were used to evaluate whether gist features were correlated with crime and prosociality (as measured via voting participation rates).

Results

Human-rated gist assessments were highly reliable across participants and were correlated with both the object composition of images and demographic features of the pictured neighborhood. AI-assigned gist labels correlated highly with human ratings, suggesting very strong out-of-sample accuracy for all gist models. Variance decomposition results suggested that CPTED gists vary substantially across census tracts even when accounting for micro-spatial visual differences. XGBoost results suggested that violent crime is best predicted using object-based measures, non-violent crime is best predicted using gist measures, and voting participation is best explained using both sets of image features.

Conclusions

Aggregate neighborhood visual features are important for understanding why some places experience more crime. Future research should seek to understand how broad visual gists and the presence of specific objects interact to shape behavior and decision-making among guardians, offenders, and targets.