An analysis of the TABARI coding system

Best, R. H., Carpino, C., Crescenzi, M. J. C.

Published online on July 09, 2013

Abstract

Textual Analysis by Augmented Replacement Instructions (TABARI) provides an automated method for coding large amounts of text. Using TABARI to code lead sentences of news stories, the KEDS/Penn State Event Data project has produced event data for several regions. The wide range of events and actors, TABARI’s ability to filter duplicate events, and the number of events coded allow users to analyze patterns in conflict and cooperation between state and nonstate actors over time. We evaluate whether coding full stories provides more detailed information on the actors referenced in the lead sentences. Additional actor information would allow researchers interested in the interactions between violent nonstate actors to test hypotheses regarding group cohesiveness and splintering, spoiling behavior, commitment problems between factions, and many other issues critical to the management of an insurgency. We downloaded Reuters news stories relevant to the Israeli–Palestinian conflict and used TABARI to code the lead sentences. We then analyzed the full text of the coded stories to determine the level of actor detail available. Our findings highlight the dynamic relationship among nonstate and state actors during the Israeli–Palestinian conflict. Contrary to expectations, we find that hand-coding full news stories does not significantly improve the accuracy or depth of actor information compared with machine coding of lead sentences by TABARI. These findings should bolster the confidence of researchers using TABARI-coded data, with the caveat that TABARI’s ability to distinguish between actors depends on the detail available in the actor dictionaries.