Objective/Study Question: To estimate and compare sample average treatment effects (SATE) and population average treatment effects (PATE) of a resident duty hour policy change on patient and resident outcomes using data from the Flexibility in Duty Hour Requirements for Surgical Trainees Trial (“FIRST Trial”).

Data Sources/Study Setting: Secondary data from the National Surgical Quality Improvement Program and the FIRST Trial (2014–2015).

Study Design: The FIRST Trial was a cluster‐randomized pragmatic noninferiority trial designed to evaluate the effects on patient and resident outcomes of a resident work hour policy change permitting greater flexibility in scheduling. We fit hierarchical logistic regression models to estimate the SATE of the policy change on outcomes within an intent‐to‐treat framework. Propensity score‐based poststratification was used to estimate the PATE.

Data Collection/Extraction Methods: This study was a secondary analysis of previously collected data.

Principal Findings: Although SATE estimates suggested noninferiority of outcomes under the flexible duty hour policy versus the standard policy, PATE estimates were too imprecise to establish noninferiority of the policy change.

Conclusions: Propensity score‐based poststratification can be a valuable tool for addressing trial generalizability but may yield imprecise estimates of the PATE when sparse strata exist.
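
The poststratification step named in the Study Design can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the strata have already been formed (for example, from quantiles of an estimated propensity of trial participation), that outcomes are binary, and that each stratum's share of the target population is known. It also ignores the clustering and hierarchical modeling used in the trial analysis. The function name `poststratified_pate` and the dictionary keys are hypothetical.

```python
from math import sqrt


def poststratified_pate(strata):
    """Poststratified estimate of a population average treatment effect.

    strata: list of dicts (hypothetical layout), each with
      'pop_share': share of the target population falling in the stratum
      'treated':   0/1 outcomes of trial units in the intervention arm
      'control':   0/1 outcomes of trial units in the comparison arm
    Returns (estimate, standard_error) on the risk-difference scale.
    Simplifying assumption: units are independent (no cluster adjustment).
    """
    total_share = sum(s["pop_share"] for s in strata)
    estimate = 0.0
    variance = 0.0
    for s in strata:
        treated, control = s["treated"], s["control"]
        if not treated or not control:
            # A stratum with an empty arm has no within-stratum contrast.
            raise ValueError("each stratum needs units in both arms")
        weight = s["pop_share"] / total_share  # normalize population weights
        p_t = sum(treated) / len(treated)
        p_c = sum(control) / len(control)
        # Weight each stratum-specific risk difference by its population share.
        estimate += weight * (p_t - p_c)
        # Binomial variance of the difference, scaled by the squared weight.
        variance += weight ** 2 * (
            p_t * (1 - p_t) / len(treated) + p_c * (1 - p_c) / len(control)
        )
    return estimate, sqrt(variance)
```

In this sketch, a stratum that holds a large share of the population but few trial units contributes a large weight multiplied by a noisy within-stratum estimate, which is one way sparse strata can widen the uncertainty around the PATE.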