Policymakers are implementing reforms with the assumption that students do better when attending high-achieving schools. In this article, we use longitudinal data from Chicago Public Schools to test that assumption. We find that the effects of attending a higher performing school depend on the school’s performance level. At elite public schools with admission criteria, there are no academic benefits—test scores are not better, grades are lower—but students report better environments. In contrast, forgoing a very low-performing school for a nonselective school with high test scores and graduation rates improves a range of academic and nonacademic outcomes.
Research has focused predominantly on how teachers affect students’ achievement on standardized tests despite evidence that a broad range of attitudes and behaviors are equally important to their long-term success. We find that upper-elementary teachers have large effects on self-reported measures of students’ self-efficacy in math, and happiness and behavior in class. Students’ attitudes and behaviors are predicted by teaching practices most proximal to these measures, including teachers’ emotional support and classroom organization. However, teachers who are effective at improving test scores often are not equally effective at improving students’ attitudes and behaviors. These findings lend empirical evidence to well-established theory on the multidimensional nature of teaching and the need to identify strategies for improving the full range of teachers’ skills.
A key question facing teacher evaluation systems is how to combine multiple measures of complex constructs into composite indicators of performance. We use data from the Measures of Effective Teaching (MET) study to investigate the measurement properties of composite indicators obtained under various conjunctive, disjunctive (or complementary), and weighted (or compensatory) models. We find that accuracy varies across models and cut-scores and that models with similar accuracy may yield different teacher classifications. Accuracy and consistency are greatest if composites are constructed to maximize reliability and lowest if they seek to optimally predict student test scores. We discuss the implications of the results for the validity of inferences about the performance of individual teachers, and more generally for the design of teacher evaluation systems.
Across the United States, students who are deemed not to be proficient in English are classified as English learners (ELs). This classification entitles students to specialized services but may also result in stigmatization and barriers to educational opportunity. This article uses a regression discontinuity design to estimate the effect of EL classification in kindergarten on students’ academic trajectories. Furthermore, it explores whether the effect of EL classification differs for students in English immersion versus bilingual programs. I find that among language-minority students who enter kindergarten with relatively advanced English proficiency, EL classification results in a substantial negative net impact on math and English language arts test scores in Grades 2 through 10. This effect, however, is concentrated in English immersion classrooms.
Teacher turnover is a challenge for U.S. public schools. Research suggests that teachers’ perceptions of their school working conditions influence their leaving decisions. Related research suggests that principals may be in the best position to influence school working conditions. Using 4 years of panel data constructed from the North Carolina Teacher Working Condition Survey, this study uses value-added modeling approaches to explore the relationship between teachers’ perceptions of four measures of their working conditions and their principal. It finds that teacher ratings of the school environment depend on which principal is leading the school, independent of other school and district contextual factors, suggesting districts struggling with teacher turnover should assess climate and use that information to advise and support principals.
Subtle policy adjustments can induce relatively large "ripple effects." We evaluate a College Board initiative that increased the number of free SAT score reports available to low-income students and changed the time horizon for using these score reports. Using a difference-in-differences analytic strategy, we estimate that targeted students were roughly 10 percentage points more likely to send eight or more reports. The policy improved on-time college attendance and 6-year bachelor’s completion by about 2 percentage points. Impacts were realized primarily by students who were competitive candidates for 4-year college admission. The bachelor’s completion impacts are larger than would be expected based on the number of students driven by the policy change to enroll in college and to shift into more selective colleges. The unexplained portion of the completion effects may result from improvements in nonacademic fit between students and the postsecondary institutions in which they enroll.
Student peer effects are well documented; however, we know far less about peer effects among teachers. We hypothesize that a relatively effective teacher can positively affect the performance of his or her peers, whereas a relatively ineffective teacher may negatively affect the performance of other teachers with whom he or she works closely. Utilizing a decade of data on teacher transfers between schools that result in changes of peers when transfer teachers enter a grade-level team in the new school, we find evidence of strong positive spillover effects associated with the introduction of peers who are more effective than the incumbent teacher himself or herself. However, the incumbent teacher’s students are not meaningfully disadvantaged by the entry of relatively ineffective peers. This finding provides initial evidence that mixing teachers with diverse performance levels can increase student achievement in the aggregate. These results are robust to several student sorting and teacher selection issues.
In practice, teacher turnover appears to have negative effects on school quality as measured by student performance. However, some simulations suggest that turnover can instead have large positive effects under a policy regime in which low-performing teachers can be accurately identified and replaced with more effective teachers. This study examines this question by evaluating the effects of teacher turnover on student achievement under IMPACT, the unique performance-assessment and incentive system in the District of Columbia Public Schools (DCPS). Employing a quasi-experimental design based on data from the first years of IMPACT, we find that, on average, DCPS replaced teachers who left with teachers who increased student achievement by 0.08 standard deviation (SD) in math. When we isolate the effects of lower-performing teachers who were induced to leave DCPS for poor performance, we find that student achievement improves by larger and statistically significant amounts (i.e., 0.14 SD in reading and 0.21 SD in math). In contrast, the effect of exits by teachers not sanctioned under IMPACT is typically negative but not statistically significant.
Prior research suggests that summer learning loss among low-income children contributes to income-based gaps in achievement and educational attainment. We present results from a randomized experiment of a summer mathematics program conducted in a large, high-poverty urban public school district. Children in the third to ninth grade (N = 263) were randomly assigned to an offer of an online summer mathematics program, the same program plus a free laptop computer, or the control group. Being randomly assigned to the program plus laptop condition caused children to experience significantly higher reported levels of summer home mathematics engagement relative to their peers in the control group. Treatment and control children performed similarly on distal measures of academic achievement. We discuss implications for future research.
Across the country and in urban areas in particular, many students change schools during the academic year. While much research documents the impact of changing schools on the academic achievement of mobile students themselves, less research explores whether new arrivals have negative spillovers on stable classmates. The lack of research on impacts of mid-year entry is problematic, as poor, minority, and low-achieving students are disproportionately exposed to mid-year entry. In this study, we use a rigorous causal identification strategy and rich longitudinal data on fourth- through eighth-grade students in the New York City (NYC) public schools to estimate the impact of exposure to mid-year entry on the achievement of stable students. We analyze heterogeneous effects of mid-year entrants by origin (arriving from other NYC public schools, from other U.S. school systems, or from other countries), determine the extent to which mid-year entrants’ characteristics mediate the impact of mid-year entry, and explore the moderating influence of stable students’ characteristics. We find small negative effects of mid-year entry on both math and English language arts test scores in the short run. These impacts are not driven by mid-year entrant characteristics and are somewhat larger for Asian students and those who do not qualify for free or reduced-price lunch. Finally, results suggest mid-year entry continues to negatively influence the math performance of stable students beyond the year of exposure. We discuss the relevance of results and conclude with recommendations for future research.
Educators raise concerns about what happens to students when they are exposed to new or new-to-school teachers. However, even when teachers remain in the same school they can switch roles by moving grades and/or subjects. We use panel data from New York City to compare four ways in which teachers are new to assignment: new to teaching, new to district, new to school, or new to subject/grade. We find negative effects of having a churning teacher of about one third the magnitude of the effect of a new teacher. However, the average student is assigned to churning teachers four times more often than to new teachers, and historically underserved students are slightly more likely to be assigned to churning teachers.
A growing number of states experimented with alternative governance structures in response to pressure to raise student achievement. Post-Katrina experimentation in New Orleans was widely regarded as a model example of new governance reforms and provided a unique opportunity to learn about the variation in student achievement and behavior within and between school sectors and school types. Our results indicated that many of the sector and school type combinations that produced higher math and English Language Arts achievement also positively impacted students’ behavior, suggesting that the achievement results were not merely driven by teaching to the test. Finally, our results suggested that, in a low-performing district, schools may benefit from the collaborative opportunities of belonging to a local school district or network of schools.
Many college students never take, or do not pass, required remedial mathematics courses theorized to increase college-level performance. Some colleges and states are therefore instituting policies allowing students to take college-level courses without first taking remedial courses. However, no experiments have compared the effectiveness of these approaches, and other data are mixed. We randomly assigned 907 students to (a) remedial elementary algebra, (b) that course with workshops, or (c) college-level statistics with workshops (corequisite remediation). Students assigned to statistics passed at a rate 16 percentage points higher than those assigned to algebra (p < .001), and subsequently accumulated more credits. A majority of enrolled statistics students passed. Policies allowing students to take college-level instead of remedial quantitative courses can increase student success.
Most of the students who set out to earn degrees in community colleges never do. Interventions that simplify the complex organizational structures of these schools are promising solutions to this problem. This article is the first to provide rigorous evidence of the effects of structured transfer programs, one such intervention. Leveraging the phased rollout of transfer programs in California, I find large effects of the policy on degrees earned in treated departments. In the first 2 years, this growth was not coupled with growth in total degrees granted or in transfers, but in the third year, there is evidence of increased transfer. The analyses also show that the policy could affect equity; departments that offer transfer degrees became more popular and there is suggestive evidence that the highest achieving student groups enrolled in these classes at higher rates.
Despite growing calls for more accountability of teacher education programs (TEPs), there is little consensus about how to evaluate them. This study investigates the potential for using observational ratings of program completers to evaluate TEPs. Drawing on statewide data on almost 9,500 program completers, representing 44 providers (183 programs) in Tennessee across 3 years, we investigate multiple models to estimate TEP quality. Results suggest that using observational ratings to evaluate TEPs has promise. We were able to detect significant and meaningful differences between TEPs, which were fairly robust across modeling approaches. Moreover, TEP rankings based on observational ratings were positively and significantly related to rankings based on student achievement gains.
Zero tolerance discipline policies have come under criticism as contributors to racial discipline gaps; however, few studies have explicitly examined such policies. This study utilizes data from two nationally representative data sources to examine the effect of state zero tolerance laws on suspension rates and principal perceptions of problem behaviors. Utilizing state and year fixed effects models, this study finds that state zero tolerance laws are predictive of a 0.5 percentage point increase in district suspension rates and no consistent decreases in principals’ perceptions of problem behaviors. Furthermore, the results indicate that the laws are predictive of larger increases in suspension rates for Blacks than Whites, potentially contributing to the Black–White suspension gap. Implications for policy and practice are discussed.
One consequence of the Great Recession is that teacher layoffs occurred at a scale previously unseen. In this article, we assess the effects of receiving a layoff notice on teacher mobility using data from Los Angeles and Washington State. Our analyses are based on 6-year panels of data in each site, including 4 years of layoffs. We find that the layoff process leads far more teachers to leave their schools for other district schools than is necessary to reach budget savings targets. In other words, the layoff process induces teacher churn, impacting even teachers who are not actually laid off. Placebo tests confirm that this "structural churn" results from the layoff process rather than from differential mobility of targeted teachers.
School reconstitution, a turnaround strategy that prescribes massive staffing turnover, is expected to result in more committed and capable school staff and innovative practices. However, little evidence supports this assumption. We use quasi-experimental designs to assess the impact of reconstitution on student achievement and teacher mobility, finding that reconstitution affected teacher mobility and improved student achievement in the first year of the reform, with continued but smaller impacts in the out years. We draw on mutual learning theory to conduct an exploratory analysis of reform implementation. We find that initial re-staffing and strategic planning may have promoted balance between exploring new and exploiting existing knowledge. Over time, however, balanced, mutual learning was not sustained.
Retaining effective teachers is a key policy priority nationwide, particularly in districts that serve large numbers of disadvantaged students. We investigate whether a change in the Miami region’s Teach for America (TFA) placement strategy was accompanied by changes in teacher attrition and mobility decisions. Our results suggest that the increased concentration of TFA corps members in schools was associated with a reduction in TFA mobility across schools after the first year of service, but showed no association with the overall retention of corps members in the district after the 2-year commitment. We also find that TFA corps members eventually retained beyond the 2-year commitment performed substantially better in mathematics during their first 2 years of teaching: evidence of positive selection into postcommitment retention.
This article presents the findings of an evaluation of the eMINTS (enhancing Missouri’s Instructional Networked Teaching Strategies) professional development program. eMINTS is an intensive teacher professional development program designed to promote inquiry-based learning, support high-quality lesson design, build community among students and teachers, and create technology-rich learning environments. This evaluation included 60 high-poverty rural schools across Missouri that were randomly assigned to two treatment conditions and a control condition, with approximately 200 teachers and 3,000 students in the 2011–2012 baseline academic year. The researchers conclude that after 3 years, the eMINTS treatment group and an eMINTS treatment group with an additional year of Intel support resulted in changed teacher instructional behaviors and increased student achievement in mathematics.
One way in which financial aid is thought to promote college success is by minimizing the time students spend working. Yet, little research has examined if this intended first-order effect occurs, and results are mixed. We leverage a randomized experiment and find that students from low-income families in Wisconsin offered additional grant aid were 5.88 percentage points less likely to work and worked 1.69 fewer hours per week than similar peers, an 8.56% and 14.35% reduction, respectively. Students offered the grant also improved qualitative aspects of their work experiences; they were less likely to work extensively, during the morning hours, or overnight. Grant aid thus appears to partially offset student employment, possibly improving prospects for academic achievement and attainment.
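The absolute and relative effects reported above can be cross-checked with simple arithmetic. The sketch below assumes the percentage reductions are expressed relative to comparison-group means (the abstract does not say so explicitly) and backs out the baselines that assumption implies. The function name is hypothetical.

```python
# Assumption: relative reduction = absolute reduction / comparison-group mean.
# The abstract reports both figures but not the baselines themselves.

def implied_baseline(absolute_change, relative_change_pct):
    """Return the baseline implied by an absolute and a relative change."""
    return absolute_change / (relative_change_pct / 100.0)

# 5.88 percentage points is reported as an 8.56% reduction in working.
work_rate_baseline = implied_baseline(5.88, 8.56)   # percent of students working

# 1.69 hours per week is reported as a 14.35% reduction in hours worked.
hours_baseline = implied_baseline(1.69, 14.35)      # hours per week

print(round(work_rate_baseline, 1), round(hours_baseline, 1))
```

Under that assumption, the reported figures imply that roughly 69% of comparison students worked, for an average of about 12 hours per week.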
Recent years have seen the convergence of two major policy streams in U.S. K–12 education: standards/accountability and teacher quality reforms. Work in these areas has led to the creation of multiple measures of teacher quality, including measures of their instructional alignment to standards/assessments, observational and student survey measures of pedagogical quality, and measures of teachers’ contributions to student test scores. This article is the first to explore the extent to which teachers’ instructional alignment is associated with their contributions to student learning and their effectiveness on new composite evaluation measures using data from the Bill & Melinda Gates Foundation’s Measures of Effective Teaching study. Finding surprisingly weak associations, we discuss potential research and policy implications for both streams of policy.
This study sought to understand the opportunities and challenges associated with the implementation of state designed Race to the Top (RttT) funded reform networks. Drawing on a conceptual framework developed from the networked governance literature, we analyzed the 12 state RttT grantees’ applications. Our analysis revealed that states designed large implementation networks with potential to bring a wide range of resources to bear on reform efforts, particularly through participation of numerous nonsystem actors. However, coordinating large and diverse networks places state education agencies (SEAs) in a new and challenging role. The extent to which networks extend state capacity to support educational improvement or further complicate the work of SEAs remains an open question. We propose a model including a set of theoretical propositions to guide future research.
This article evaluates the impact and cost-effectiveness of offering an innovative middle school model—the Sistema de Aprendizaje Tutorial (SAT)—to Honduran villages instead of traditional middle schools. We identified a matched sample of villages with either type of school and collected baseline data among primary school graduates eligible to enroll in middle schools. After 2 years, the test scores of children residing in SAT villages were 0.2 standard deviations higher than those of children in other villages, though the per-student cost in SATs was at least 10% lower than in traditional schools. The article is one of the few studies to rigorously evaluate a scaled-up instructional reform in a poor country, implemented with an alternative model of teacher recruitment and contracting.
Research on early compulsory schooling laws finds minimal effects on attendance but fails to investigate heterogeneous effects. Similarly, research proposes limited contexts in which expansion policies can increase equality but has difficulty separating policy and cohort effects. Capitalizing on within-country variation in timing of early compulsory laws, passed 1852 to 1918, I ask whether they improved equality of school attendance or educational attainment by class, nativity, and race. Based on census data, compulsory laws increased equality of attendance and attainment, particularly among young men in the North, where the laws reduced class and race gaps by over 20%. Early compulsory schooling laws provided "hidden gains," missed in previous analyses, suggesting policies that raise minimum schooling can increase educational equality in certain contexts.
It is well established that students who begin post-secondary education at a community college are less likely to earn a bachelor’s degree than otherwise similar undergraduates who begin at a 4-year school, but there is less consensus over the mechanisms generating this disparity. We explore these using national longitudinal transcript data and propensity-score methods. Inferior academic preparation does not seem to be the main culprit: We find few differences between students’ academic progress at each type of institution during the first 2 years of college and (contrary to some earlier scholarship) students who do transfer have BA graduation rates equal to similar students who begin at 4-year colleges. However, after 2 years, credit accumulation diverges in the two kinds of institutions, due in part to community college students’ greater involvement in employment, and a higher likelihood of stopping out of college, after controlling for their academic performance. Contrary to some earlier claims, we find that a vocational emphasis in community college is not a major factor behind the disparity. One important mechanism is the widespread loss of credits that occurs after undergraduates transfer from a community college to a 4-year institution; the greater the loss, the lower the chances of completing a BA. However, earlier claims that community college students receive lower aid levels after transfer and that transfers disproportionately fail to survive through the senior year are not supported by our analyses.
The effectiveness of educator incentive programs rests on the assumption that the potential rewards for participants will motivate them to behave in certain ways (e.g., choose certain jobs, expend greater effort, engage in capacity-building professional development). Some researchers have examined the impact of financial incentives on teacher behaviors and work conditions, but few look inside schools to examine how current teachers interpret their rewards, how payouts affect teachers’ willingness to participate in these programs, and how incentives might be structured to motivate teachers. This mixed-methods study of one Teacher Incentive Fund–supported program addresses that gap. We draw on expectancy and goal setting theories to analyze teachers’ reactions to financial rewards and how their reactions may shape the motivational potency of the incentives.
Expansion of the use of student test score data to measure teacher performance has fueled recent policy interest in using those data to measure the effects of school administrators as well. However, little research has considered the capacity of student performance data to uncover principal effects. Filling this gap, this article identifies multiple conceptual approaches for capturing the contributions of principals to student test score growth, develops empirical models to reflect these approaches, examines the properties of these models, and compares the results of the models empirically using data from a large urban school district. The article then assesses the degree to which the estimates from each model are consistent with measures of principal performance that come from sources other than student test scores, such as school district evaluations. The results show that choice of model is substantively important for assessment. While some models identify principal effects as large as 0.18 standard deviations in math and 0.12 in reading, others find effects as low as 0.05 (math) or 0.03 (reading) for the same principals. We also find that the most conceptually unappealing models, which over-attribute school effects to principals, align more closely with nontest measures than do approaches that more convincingly separate the effect of the principal from the effects of other school inputs.
We use a difference-in-differences analytic approach to estimate postsecondary consequences from Maine’s mandate that all public school juniors take the SAT®. We find that, overall, the policy increased 4-year college-going rates by 2 to 3 percentage points and that 4-year college-going rates among induced students increased by 10 percentage points.
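For readers unfamiliar with the difference-in-differences logic named above, a minimal two-group, two-period sketch follows. The rates in the example are invented purely for illustration and are not taken from the study. The function name is hypothetical, and the study's actual specification (controls, multiple cohorts, standard errors) is not described in the abstract.

```python
# Minimal 2x2 difference-in-differences sketch.
# All numbers below are hypothetical, NOT results from the study.

def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the comparison group."""
    treated_change = treated_post - treated_pre
    control_change = control_post - control_pre
    return treated_change - control_change

# Hypothetical 4-year college-going rates (percent), before and after a mandate.
effect = diff_in_diff(
    treated_pre=30.0, treated_post=34.5,   # group subject to the mandate
    control_pre=31.0, control_post=33.0,   # comparison group
)
print(effect)  # estimated effect in percentage points
```

The comparison group's change stands in for what would have happened to the treated group without the policy, so the estimate is only as credible as that parallel-trends assumption.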
Although wide variation in teacher effectiveness is well established, much less is known about differences in teacher improvement over time. We document that average returns to teaching experience mask large variation across individual teachers and across groups of teachers working in different schools. We examine the role of school context in explaining these differences using a measure of the professional environment constructed from teachers’ responses to state-wide surveys. Our analyses show that teachers working in more supportive professional environments improve their effectiveness more over time than teachers working in less supportive contexts. On average, teachers working in schools at the 75th percentile of professional environment ratings improved 38% more than teachers in schools at the 25th percentile after 10 years.
Remediation is one of the largest single interventions intended to improve outcomes for underprepared college students, yet little is known about the remedial screening process. Using administrative data and a rich predictive model, we find that severe mis-assignments are common using current test-score-cutoff-based policies, with "underplacement" in remediation much more common than "overplacement" in college courses. Incorporating high school transcripts into the process could significantly reduce placement errors, but adding test scores to already available high school data often provides little marginal benefit. Moreover, the choice of screening policy has significant implications for the racial and gender composition of college-level courses. Finally, the use of more accurate screening tools would enable institutions to remediate substantially fewer students without compromising college success.
Community colleges are under pressure to improve completion rates and efficiency despite limited economic evidence on how to do so and the consequences of different reform strategies. Here, we set out an economic model of student course pathways linked to college expenditures and revenues. Using detailed data from a single college, we calculate baseline efficiency and differences in efficiency for students who follow different pathways. We simulate changes in output, expenditures, revenues, net revenues, and efficiency assuming that the college meets performance targets. We find substantial differences in efficiency across pathways and significant differences in efficiency across strategies to help students complete college. The model has wide practical application for community colleges.
There is a comprehensive literature documenting how colleges’ tuition, financial aid packages, and academic reputations influence students’ application and enrollment decisions. Far less is known about how quality-of-life reputations and peer institutions’ reputations affect these decisions. This article investigates these issues using data from two prominent college guidebook series to measure changes in reputations. We use information published annually by the Princeton Review—the best-selling college guidebook that formally categorizes colleges based on both academic and quality-of-life indicators—and the U.S. News and World Report—the most famous rankings of U.S. undergraduate programs. Our findings suggest that changes in academic and quality-of-life reputations affect the number of applications received by a college and the academic competitiveness and geographic diversity of the ensuing incoming freshman class. Colleges receive fewer applications when peer universities earn high academic ratings. However, unfavorable quality-of-life ratings for peers are followed by decreases in the college’s own application pool and the academic competitiveness of its incoming class. This suggests that potential applicants often begin their search process by shopping for groups of colleges where non-pecuniary benefits may be relatively high.
One of the enduring problems in education is the persistence of achievement gaps between White, wealthy, native English-speaking students and their counterparts who are minority, lower-income, or English language learners. This study shows that one intensive technical assistance (TA) intervention—California’s District Assistance and Intervention Teams (DAITs)—implemented in conjunction with a high-stakes accountability policy improves the math and English performance of traditionally underserved students. Using a 6-year panel of student-level data from California, we find that the DAIT intervention significantly reduces achievement gaps between Black, Hispanic, and poor students and their White and wealthier peers. These results indicate that capacity-building TA helps to close achievement gaps in California’s lowest performing districts.
In this article, we perform cost-effectiveness analysis on interventions that improve the rate of high school completion. Using the What Works Clearinghouse to select effective interventions, we calculate cost-effectiveness ratios for five youth interventions. We document wide variation in cost-effectiveness ratios between programs and between sites within multisite programs, reflecting differences in resource use, program implementation, and target population characteristics. We offer suggestions as to how cost-effectiveness data can be used to inform policymaking, with the goal of improving the efficiency with which public and private resources are employed in education.
Measures of classroom and school environments are central to policy efforts that assess school and teacher quality. These measures are often formed by aggregating individual survey responses to form group-level measures, and assume an invariant measurement model holds across the individual and group levels. This article explores the tenability of this assumption by applying multilevel factor analysis to two well-known surveys: the Working Conditions Survey, which assesses school environments, and the Tripod Classroom Environment Survey. The examples illustrate the consequences of using common factor analytic methods that assume cross-level invariance, and demonstrate how distorted perceptions of factorial structure can influence inferences about the relationship between working conditions and teacher mobility.
We contribute to debate about causal inferences in educational research in two ways. First, we quantify how much bias there must be in an estimate to invalidate an inference. Second, we utilize Rubin’s causal model to interpret the bias necessary to invalidate an inference in terms of sample replacement. We apply our analysis to an inference of a positive effect of Open Court Curriculum on reading achievement from a randomized experiment, and an inference of a negative effect of kindergarten retention on reading achievement from an observational study. We consider details of our framework, and then discuss how our approach informs judgment of inference relative to study design. We conclude with implications for scientific discourse.
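One way to make "how much bias there must be in an estimate to invalidate an inference" concrete is sketched below. It assumes, as an illustration and not as a statement of the authors' exact procedure, that an inference is invalidated once the estimate falls below a significance threshold equal to a critical value times the standard error. The function name and the example numbers are hypothetical.

```python
# Sketch under a stated assumption: the inference holds while
# |estimate| exceeds critical_value * std_error. The share of the estimate
# that would have to be bias to cross that threshold is 1 - threshold/|estimate|.

def share_due_to_bias_to_invalidate(estimate, std_error, critical_value=1.96):
    """Fraction of the estimate that must be bias to drop it to the threshold."""
    threshold = critical_value * std_error
    if abs(estimate) <= threshold:
        return 0.0  # the inference is not supported in the first place
    return 1.0 - threshold / abs(estimate)

# Hypothetical example: estimate of 0.5 with a standard error of 0.1.
share = share_due_to_bias_to_invalidate(0.5, 0.1)
print(round(share, 3))
```

Read in sample-replacement terms, the same fraction can be interpreted as the share of cases that would have to be replaced with cases showing no effect before the inference no longer holds.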
Many English Language Learners (ELLs) migrate to the United States at older ages and administrators must choose a grade in which to place these new entrants as soon as they register for school. In this study, I estimate the effect of grade placement on the short-term academic performance of ELLs who enroll in the Miami-Dade County Public School system between the ages of 7 and 12 using a district policy that determines grade placement decisions for newcomers by their birthdate relative to September 1. The results suggest some benefits to being placed in the lower of the two grades for students’ achievement in mathematics, but no signs of benefits on other academic outcomes, including reading achievement, grade promotion, and exit from ELL status.
It is often difficult and costly to obtain individual-level student achievement data, yet researchers are frequently reluctant to use school-level achievement data that are widely available from state websites. We argue that public-use aggregate school-level achievement data are, in fact, sufficient to address a wide range of evaluation questions and that the use of these data is more appropriate than commonly thought. Specifically, we explore (a) when point estimates and standard errors differ between models that use individual student-level data and those that use aggregate school-level data, (b) the potential for conducting subgroup and nonexperimental analyses with aggregate data, and (c) the metrics that are currently available in state public-use data sets and the implications these have for analyses.