Statistical Power in Experimental Audit Studies: Cautions and Calculations for Matched Tests With Nominal Outcomes
Sociological Methods & Research
Published online on February 16, 2015
Abstract
Given their capacity to identify causal relationships, experimental audit studies have grown increasingly popular in the social sciences. Typically, investigators send fictitious auditors who differ by a key factor (e.g., race) to particular experimental units (e.g., employers) and then compare treatment and control groups on a dichotomous outcome (e.g., hiring). In such scenarios, an important design consideration is the power to detect a certain magnitude difference between the groups. But power calculations are not straightforward in standard matched tests for dichotomous outcomes. Given the paired nature of the data, the number of pairs in the concordant cells (when neither or both auditor receives a positive response) contributes to the power, which is lower as the sum of the discordant proportions approaches one. Because these quantities are difficult to determine a priori, researchers must exercise particular care in experimental design. We here present sample size and power calculations for McNemar’s test using empirical data from an audit study on misdemeanor arrest records and employability. We then provide formulas and examples for cases involving more than two treatments (Cochran’s Q test) and nominal outcomes (Stuart–Maxwell test). We conclude with concrete recommendations concerning power and sample size for researchers designing and presenting matched audit studies.