The issue of translatability is pressing in international evaluation, in global transfer of evaluative instruments, in comparative performance management, and in culturally responsive evaluation. Terms that are never fully understood, digested, or accepted may continue to influence issues, problems, and social interactions in, around, and after evaluations. Their meanings can be imposed or reinvented. Untranslatable terms are not just "lost in translation" but may produce overflows that do not go away. The purpose of this article is to increase attention to the issue of translatability in evaluation by means of specific exemplars. We provide a short dictionary of such exemplars delivered by evaluators, consultants, and teachers who work across a variety of contexts. We conclude with a few recommendations: highlight frictions in translatability by deliberately circulating and discussing words of relevance that appear to be "foreign"; increase the language skills of evaluators; and make research on frictions in translation an articulate part of the agenda for research on evaluation.
This article describes the development and use of a rapid evaluation approach to meet program accountability and learning requirements in a research for development program operating in five developing countries. The method identifies clusters of outcomes, both expected and unexpected, happening within areas of change. In a workshop, change agents describe the causal connections within outcome clusters to identify outcome trajectories for subsequent verification. Comparing verified outcome trajectories with existing program theory allows program staff to question underlying causal premises and adapt accordingly. The method can be used for one-off evaluations that seek to understand whether, how, and why program interventions are working. Repeated cycles of outcome evidencing can build a case for program contribution over time that can be evaluated as part of any future impact assessment of the program or parts of it.
Educational interventions are complex: Often they combine a diagnostic component (identifying student need) with a service component (ensuring appropriate educational resources are provided). This complexity raises challenges for program evaluation. These interventions, which we refer to as service mediation interventions, affect the additional resources students receive, and those resources mediate the measured impact. Evaluations of such programs that report effects alone are potentially misleading. Cost-effectiveness analysis clarifies the importance of assessing service-mediated receipt for evaluation purposes. We illustrate our argument empirically with data from City Connects, a comprehensive student support intervention. We find that the direct costs of the program represent only one-third of the total change in resource use by program participants required to produce impacts. Evaluative statements about service mediation interventions should be accompanied by information on the full costs required to achieve effects. Many interventions may be structured in this way and require evaluation that includes an economic perspective.
The search for necessary and sufficient causes of some outcome of interest, referred to as configurational comparative research, has long been one of the main preoccupations of evaluation scholars and practitioners. However, only the last three decades have witnessed the evolution of a set of formal methods that are sufficiently elaborate for this purpose. In this article, I provide a hands-on tutorial for qualitative comparative analysis (QCA)—currently the most popular configurational comparative method. In drawing on a recent evaluation of patient follow-through effectiveness in Lynch syndrome tumor-screening programs, I explain the search target of QCA, introduce its core concepts, guide readers through the procedural protocol of this method, and alert them to mistakes frequently made in QCA’s use. An annotated replication file for the QCApro extension package for R accompanies this tutorial.
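As an illustrative aside, the truth-table step at the core of crisp-set QCA can be sketched in a few lines of code. The condition names and case data below are hypothetical stand-ins, not drawn from the Lynch syndrome evaluation, and the Boolean minimization itself would still be handed off to a dedicated package such as QCApro.

```python
# Minimal sketch of the truth-table step in crisp-set QCA,
# using hypothetical condition names and case data.
import pandas as pd

# Each row is a case; conditions and the outcome are coded 1/0.
cases = pd.DataFrame({
    "reflex_testing":    [1, 1, 1, 0, 0, 1, 0, 1],
    "genetic_counselor": [1, 0, 1, 1, 0, 1, 0, 0],
    "auto_referral":     [1, 1, 0, 0, 1, 1, 0, 0],
    "follow_through":    [1, 1, 1, 0, 0, 1, 0, 0],  # outcome
})

conditions = ["reflex_testing", "genetic_counselor", "auto_referral"]

# Group cases by configuration and compute consistency of sufficiency:
# the share of cases in each configuration that show the outcome.
truth_table = (
    cases.groupby(conditions)["follow_through"]
         .agg(n_cases="size", consistency="mean")
         .reset_index()
)

# Configurations meeting a consistency threshold are passed on to
# Boolean minimization (e.g., with the QCApro package in R).
sufficient = truth_table[truth_table["consistency"] >= 0.8]
print(truth_table)
print(sufficient)
```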
Researchers have conducted numerous empirical studies on evaluation capacity (EC) and evaluation capacity building (ECB) in Western cultural settings. However, little is known about these practices in non-Western contexts. To that end, this study identified the major dimensions of EC and feasible ECB approaches in Taiwanese elementary and junior high schools. Using a Delphi technique with 23 experts, the research sought consensus on the components of EC organized in three categories (evaluation culture, evaluation infrastructure, and human resources) and on approaches to building it in Taiwanese schools. The study also identified school-driven and government-driven approaches to ECB in this context. Although the findings support the major dimensions and approaches identified in the Western literature, unique differences emerged in the Taiwanese context. The article concludes with implications for theory and practice.
Evaluating initiatives implemented across multiple settings can elucidate how various contextual factors may influence both implementation and outcomes. Understanding context is especially critical when the same program has varying levels of success across settings. We present a framework for evaluating contextual factors affecting an initiative at multiple phases of its life cycle, including design, implementation, scale-up, spread, and sustainability. After providing a brief overview of related frameworks from the fields of improvement science and implementation science and of the methods by which we drew from this literature to develop the current framework, we present how this framework was customized and applied to three national public health initiatives. We close with implications for how evaluators can apply the framework to facilitate a deeper understanding of a program’s implementation and success, collaborate with project stakeholders, and facilitate sustainability, spread, and scale-up of public health initiatives.
Responsive evaluation honors democratic and participatory values and intends to foster dialogues among stakeholders to include their voices and enhance mutual understandings. The question explored in this article is whether and how responsive evaluation can offer a platform for moral learning (Bildung) in the interference zone between system and lifeworld. A case example from Dutch psychiatry is presented. Policy makers aimed to develop a "monitoring instrument" for closed psychiatric wards to protect patient rights and prevent incidents. Tensions arose between strategic action and system values (accountability, efficiency, control, safety) and the search for meaning and morality. Several dynamics were set in motion. Through the creation of communicative spaces in which there was room for expression of emotions and stories, the "colonization" by system values was countered. Another dynamic called "culturalization" started simultaneously, that is, adoption of lifeworld values in the system world, which resulted in constructive dialogues on the meaning of good care and moral learning.
There is currently a paucity of literature in the field of evaluation regarding the practice of reflection and reflexivity and a lack of available tools to guide this practice—yet using a reflexive model can enhance evaluation practice. This paper focuses on the methods and results of a reflexive inquiry conducted during a participatory evaluation of a project targeting homelessness and mental health issues. I employed an action plan composed of a conceptual model, critical questions, and intended activities. The field notes made throughout the reflexive inquiry were analyzed using thematic content analysis. Results clustered in categories of power and privilege, evaluation politics, the applicability of the action plan, and outcomes. In this case study, reflexivity increased my competence as an evaluation professional: The action plan helped me maintain awareness of how my actions, thoughts, and personal values relate to broader evaluation values and helped me identify incongruence. The results of the study uncovered hidden elements, heightened awareness of subtle dynamics requiring attention within the evaluation, and created opportunities to challenge the influence of personal biases on the evaluation proceedings. This reflexive model allowed me to be a more responsive evaluator and can improve practice and professional development for other evaluators.
To explore the relationship between theory and practice in evaluation, we focus on the perspectives and experiences of student evaluators, as they move from the classroom to an engagement with the social, political, and cultural dynamics of evaluation in the field. Through reflective journals, postcourse interviews, and facilitated group discussions, we involve students in critical thinking around the relationship between evaluation theory and practice, which for many was unexpectedly tumultuous and contextually dynamic and complex. In our exploration, we are guided by the following questions: How do novice practitioners navigate between the world of the classroom and the world of practice? What informs their evaluation practice? More specifically, how can we understand the relationship between theory and practice in evaluation? A thematic analysis leads to three interconnected themes. We conclude with implications for thinking about the relationship between theory and practice in evaluation.
The first article in this series traces the initial development of the concept of evaluation use. As a field, evaluation has always paid attention to the potential for use, both in decision-making and in changing people’s thinking. The broad history of the field as we know it today stemmed from two streams: one focused on tests and measurement, primarily in education, and a second focused on social research methods, primarily concerning knowledge utilization. Evaluation use had its roots in both streams, resulting in three broad categories for discussing the use of evaluation findings: instrumental use, conceptual use or enlightenment, and symbolic use. The additional category of process use, added years later, highlighted the potential utility of people’s participation in the evaluation process.
The evaluation community has demonstrated an increased emphasis and interest in evaluation capacity building in recent years. A need currently exists to better understand how to measure evaluation capacity and its potential outcomes. In this study, we distributed an online questionnaire to managers and evaluation points of contact working in grantee programs funded by four large federal public health programs. The goal of the research was to investigate the extent to which assessments of evaluation capacity and evaluation practice are similar or different for individuals representing the same program. The research findings revealed both similarities and differences within matched respondent pairs, indicating that whom one asks to rate evaluation capacity in an organization matters.
The need for evaluation capacity building (ECB) in military psychological health is apparent in light of the proliferation of newly developed, yet untested programs coupled with the lack of internal evaluation expertise. This study addresses these deficiencies by utilizing Preskill and Boyle’s multidisciplinary ECB model within a post-traumatic stress disorder treatment program. This model outlines a theoretical framework, offers practical strategies, and emphasizes both context and culture, which are paramount in military health-care settings. This study found that the model provides a highly applicable ECB framework that includes ways to identify ECB objectives, tailor activities, and understand outcomes. While there was high utilization of ECB activities by program staff, there was misaligned evaluative thinking, which ultimately truncated sustainable evaluation practice. Based on this research, evaluators can better understand how to provide an ECB intervention in a complex cultural and political environment and assess its effectiveness.
Pressure on evaluators has been investigated recently by surveys in the USA, the UK, Germany, and Switzerland. This study compares the results of those studies regarding pressure on evaluators in different countries. The findings suggest that independence of evaluations does not exist for many respondents. Moreover, all of the studies identify the person who commissions the evaluation as the primary influencing stakeholder in the evaluation process. In terms of differences, evaluators in Germany seem to be more exposed to pressure. However, German evaluators do not show stronger tendencies to surrender to pressure than the other countries’ respondents. We suggest that this pattern may be explained by the strong state tradition in Germany as opposed to the U.S. and Switzerland, in conjunction with evaluators’ profession-based, principled resistance to such pressure.
Reflective case narratives are a practical mechanism for conveying lessons learned for practice improvement. Their ability to transform experience into knowledge in a colloquial, narrative style positions reflective case narratives as a powerful learning tool with pedagogical benefits for the evaluation community. However, one criticism of reflective case narratives is that they suffer from loose guidelines and lengthy discussions that obscure lessons learned. Drawing on adult learning theory, a restructuring of reflective case narratives is presented to facilitate learning. The restructuring consists of five sections designed to increase the likelihood that reflective case narratives serve as a resource for transferring knowledge, sharing ideas, and keeping a pulse on the dynamic and fluid process of evaluation.
Most evaluators have embraced the goal of evidence-based practice (EBP). Yet, many have criticized EBP review systems that prioritize randomized controlled trials and use various criteria to limit the studies examined. They suggest this could produce policy recommendations based on small, unrepresentative segments of the literature and recommend a more traditional, inclusive approach. This article reports two empirical studies assessing this criticism, focusing on the What Works Clearinghouse (WWC). An examination of outcomes of 252 WWC reports on literacy interventions found that 6% or fewer of the available studies were selected for review. Half of all intervention reports were based on only one study of a program. Data from 131 studies of a reading curriculum were used to compare conclusions using WWC procedures and more inclusive procedures. Effect estimates from the inclusive approach were more precise and closer to those of other reviews. Implications are discussed.
The authors discuss both the genesis of the Eleanor Chelimsky Forum on Evaluation Theory and Practice and the 2015 Forum, which featured remarks by Abraham Wandersman.
In this commentary I discuss how Getting to Outcomes and other empowerment strategies have potential to contribute to a theory of evaluation practice, address the epistemology of quality improvement, and inform the external validity of outcome evaluations.
Research on the evaluation of large-scale public-sector reforms is rare. This article sets out to fill that gap in the evaluation literature and argues that doing so is of vital importance, since the impact of such reforms is considerable and they change the context in which evaluations of other, more delimited policy areas take place. In our analysis, we apply four governance perspectives (rational-instrumental perspective, rational interest–based perspective, institutional-cultural perspective, and chaos perspective) in a comparative analysis of the evaluations of two large-scale public-sector reforms in Denmark and Norway. We compare the evaluation process (focus and purpose), the evaluators, and the organization of the evaluation, as well as the utilization of the evaluation results. The analysis uncovers several significant findings, including how the initial organization of the evaluation strongly shapes the utilization of the evaluation results and how evaluators can approach the challenges of evaluating large-scale reforms.
Many evaluations of programs tend to show few outcomes. One response has been the increasing prominence of the movement that requires programs to implement evidence-based interventions (EBIs). But in a complex world with complex organizations and complex interventions, many challenges have arisen to implementing EBIs with fidelity to achieve outcomes at scale, including challenges to achieving outcomes in each setting. In this article, we propose the use of empowerment evaluation and one of its major approaches, Getting To Outcomes (GTO), as a promising method for addressing these challenges; GTO helps organizations achieve outcomes by leading them through a set of "accountability questions" for implementing EBIs in their particular setting. These questions can be asked at multiple levels (e.g., national, state, and local organizations) responsible for achieving outcomes. Although we illustrate the possibilities with examples from health care and public health, the potential strategies can be applied to many areas of health and human services and education.
This article discusses the nonuse, misuse, and proper use of pilot studies in experimental evaluation research. The authors first show that there is little theoretical, practical, or empirical guidance available to researchers who seek to incorporate pilot studies into experimental evaluation research designs. The authors then discuss how pilot studies can be misused, using statistical simulations to illustrate the error that can result from using effect sizes from pilot studies to decide whether to conduct a full trial or using effect sizes from pilot studies as the basis of power calculations in a full trial. Informed by the review of the literature and these simulation results, the authors conclude by proposing practical suggestions to researchers and practitioners on how to properly use pilot studies in experimental evaluation research.
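A minimal simulation sketch along these general lines (all values assumed; this is not the authors' simulation code) shows how sampling error in small pilot studies distorts the effect sizes, and hence the power calculations, fed into a full trial.

```python
# Sketch: how noisy pilot-study effect sizes distort power calculations.
import numpy as np
from statsmodels.stats.power import TTestIndPower

rng = np.random.default_rng(42)
true_d, pilot_n, reps = 0.20, 30, 2000   # assumed true effect and per-arm pilot size
power_calc = TTestIndPower()

planned_ns = []
for _ in range(reps):
    treat = rng.normal(true_d, 1.0, pilot_n)
    ctrl = rng.normal(0.0, 1.0, pilot_n)
    # Pooled-SD standardized effect estimated from the pilot.
    d_hat = (treat.mean() - ctrl.mean()) / np.sqrt((treat.var(ddof=1) + ctrl.var(ddof=1)) / 2)
    if d_hat <= 0.05:
        # Tiny or negative pilot estimates make a power analysis meaningless.
        planned_ns.append(np.inf)
        continue
    planned_ns.append(power_calc.solve_power(effect_size=d_hat, power=0.8, alpha=0.05))

planned_ns = np.array(planned_ns)
needed = power_calc.solve_power(effect_size=true_d, power=0.8, alpha=0.05)
print(f"n per arm needed for the true effect: {needed:.0f}")
print(f"share of pilots implying a smaller (underpowered) trial: {(planned_ns < needed).mean():.2f}")
```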
Using a case example of an innovative sanitation solution in a slum setting, this study explores the usefulness of the Zaltman Metaphor Elicitation Technique in a program planning and evaluation context. Using a qualitative image-based method to map people’s mental models of ill-structured problems such as sanitation can aid program planners and evaluators in understanding how a program can fit the reality of beneficiaries. The technique is a tool for investigating what beneficiaries think about specific problems a program is aimed at solving and their underlying beliefs. The results offer a comprehensive hierarchical value map of different types and levels of insights into parents’ thoughts and feelings about school sanitation and their child’s well-being, often expressed as desired values, goals, or end states. Based on the results, a discussion is provided about the usefulness of the technique in the given context.
Professional learning communities (PLCs) have emerged as one of the nation’s most widely implemented strategies for improving instruction and PK–12 student learning outcomes. PLCs are predicated on the principles of improvement science, a type of evidence-based collective inquiry that aims to bridge the research–practice divide and increase organizational capacity to solve pressing problems of practice. In this article, the Teacher Collaboration Assessment Rubric (TCAR) is presented, in which the evidence-based attributes of rigorous PK–12 PLCs are operationalized. The author describes how the TCAR has been used for developmental, formative, and outcome evaluation purposes in PK–12 settings. The value of the TCAR and the evaluation of PLCs in other complex organizational systems such as health care and the sciences are also discussed. PLC evaluation can help teams to focus on improvement and to avoid "collaboration lite" whereby congeniality and imprecise conversation are confused with the disciplined professional inquiry vital to organizational improvement.
As evaluators, we are often asked to determine whether policies and programs provide value for the resources invested. Addressing that question can be a quandary, and, in some cases, evaluators question whether cost–benefit analysis is fit for this purpose. With increased interest globally in social enterprise, impact investing, and social impact bonds, the search is on to find valid, credible, useful ways to determine the impact and value of social investments. This article argues that when addressing an evaluative question about an economic problem (the merit, worth, or significance of resource use), economic analysis can enhance evaluation but is usually insufficient to fully answer the evaluative question and that a stronger approach would involve explicit evaluative reasoning, supported by judicious use of economic and other methods. An overarching theoretical framework is proposed, and implications for evaluation practice are discussed.
The authors of this article each bring a different theoretical background to their evaluation practice. The first author has a background of attention to culturally responsive evaluation (CRE), while the second author has a background of attention to systems theories and their application to evaluation. Both have had their own evolution of thinking and application of their respective conceptual traditions over the last 20+ years, influenced considerably by their involvement in the American Evaluation Association. They recently worked together to build evaluation capacity among evaluators of science, technology, engineering, and mathematics (STEM) education programs, in which they explored how these two conceptual and theoretical paths connect. In this article, the authors present their current thinking about the relationship between CRE and systems-oriented evaluation. In a case example, they illustrate the value of integrating the two perspectives to determine the guiding questions for an evaluation of a STEM education project.
Evaluators often use qualitative research methods, yet there is little evidence on the comparative cost-effectiveness of the two most commonly employed qualitative methods—in-depth interviews (IDIs) and focus groups (FGs). We performed an inductive thematic analysis of data from 40 IDIs and 40 FGs on the health-seeking behaviors of African American men (N = 350) in Durham, North Carolina. We used a bootstrap simulation to generate 10,000 random samples from each data set and calculated the number of data collection events necessary to reach different levels of thematic saturation. The median number of data collection events required to reach 80% and 90% saturation was 8 and 16, respectively, for IDIs and 3 and 5 for FGs. Interviews took longer but were more cost-effective at both levels. At the median, IDIs cost 20–36% less to reach thematic saturation. Evaluators can consider these empirically based cost-effectiveness data when selecting a qualitative data collection method.
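The bootstrap logic described here can be illustrated with a small sketch; the coded-theme data below are simulated placeholders, not the Durham study's results.

```python
# Sketch: bootstrap the order of data collection events and count how many
# events are needed to observe a given share of all coded themes.
import numpy as np

rng = np.random.default_rng(7)
n_events, n_themes = 40, 30
# themes_by_event[i] is the set of theme IDs coded in event i (simulated here).
themes_by_event = [set(rng.choice(n_themes, size=rng.integers(3, 10), replace=False))
                   for _ in range(n_events)]
all_themes = set.union(*themes_by_event)

def events_to_saturation(order, threshold):
    """Number of events, taken in 'order', needed to cover threshold share of themes."""
    seen = set()
    for k, idx in enumerate(order, start=1):
        seen |= themes_by_event[idx]
        if len(seen) >= threshold * len(all_themes):
            return k
    return len(order)

draws = [events_to_saturation(rng.permutation(n_events), 0.80) for _ in range(10_000)]
print("median events to 80% saturation:", np.median(draws))
```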
Since the early 2000s, a significant number of programs and policies have been developed and implemented to prevent and combat human trafficking. At the international, regional, and national levels, government, international, and nongovernmental organizations have established plans of action, conducted training, developed policy tools, and conducted a variety of other activities to counter the phenomenon of trafficking in persons. However, only a small number of these anti–human trafficking interventions have been evaluated, and even fewer have been evaluated rigorously. This article explores the approaches that have been used to evaluate anti–human trafficking interventions. Through a review of 49 evaluations, the study finds that action is required to increase quality evaluations of anti–human trafficking programs in order to ensure that programs are targeted, implemented, and delivered effectively and that knowledge on the impact of programs is improved.
This study explores the relationship between evaluation policies and evaluation practice. Through document analysis, interviews, and a multiple case study, the research examined the explicit and implicit policies overarching the evaluation work commissioned by the Robert Wood Johnson Foundation (RWJF) and explored how these policies are implemented in the field. This examination of evaluation policies at RWJF has pointed out some significant strengths, including emphasis on the importance of evaluation; collegiality in defining, formulating, and monitoring evaluations; using a variety of evaluation products to communicate results; and use of evaluation advisory committees to strengthen evaluation approaches. However, these policies have evolved somewhat haphazardly over time. Consequently, some written policies are absent or inadequate and some policies are followed with less consistency than others. The findings point to the importance of a comprehensive and integrated set of evaluation policies grounded in intended outcomes and the need for additional studies on this topic.
The systematic identification of evaluator competency training needs is crucial for the development of evaluators and the establishment of evaluation as a profession. Insight into essential competencies could help align training programs with field-specific needs, thereby clarifying expectations among evaluators, educators, and employers. This investigation of practicing evaluators’ perceptions of competencies addresses the critical need for a competency training gap analysis. Results from an online survey of 403 respondents and a follow-up focus group indicate that the professional practice and systematic inquiry competencies are seen as most important for conducting successful evaluations. Evaluators identified a need for additional training in the interpersonal competence and reflective practice competency domains. The trends identified can support the development and modification of programs designed to offer training, education, and professional development to evaluators.
Making causal claims is central to evaluation practice because we want to know the effects of a program, project, or policy. In the past decade, the conversation about establishing causal claims has become prominent (and problematic). In response to this changing conversation about causality, we argue that evaluators need to take up some new ways of thinking about and examining causal claims in their practices, including (1) being responsive to the situation and intervention, (2) building relevant and defensible causal arguments, (3) being literate in multiple ways of thinking about causality, (4) being familiar with a range of causal designs and methods, (5) layering theories to explain causality at multiple levels, and (6) justifying the causal approach taken to multiple audiences. Drawing on recent literature, we discuss why and how evaluators can take up each of these ideas in practice. We conclude with considerations for evaluator training and future research.
In an era of ever-deepening budget cuts and a concomitant demand for substantiated programs, many organizations have elected to conduct internal program evaluations. Internal evaluations offer advantages (e.g., enhanced evaluator program knowledge and ease of data collection) but may confront important challenges, including credibility threats, conflicts of interest, and power struggles. Thus, demand for third-party meta-evaluation may be on the rise to offset such limitations. Drawing on the example of a moderately large and fairly complex five-year program to build state department of education capacity to implement federal education law, this article explores the development and use of external responsive meta-evaluation (RME) to build evaluator capacity, enhance the evaluation’s quality, optimize evaluation use, and minimize conflict. After describing RME, the authors discuss its activities, strengths, and limitations through the case example of the Appalachia Regional Comprehensive Center. The article concludes with recommendations for adapting RME for use in various settings.
This article presents the principles and findings of developing a new participatory assessment of development (PADev) evaluation approach that was codesigned with Dutch nongovernmental organizations (NGOs) and northern and southern research institutes over a period of 4 years in the context of rural development in Ghana and Burkina Faso. Although participatory approaches in development evaluations have become widely accepted since the 1990s, the PADev approach is different by taking the principles of holism and local knowledge as starting points for its methodological elaboration. The PADev approach is found to have an added value for assessing the differentiated effects of development interventions across different subgroups in a community through intersubjectivity. Moreover, if PADev is taken up by a multitude of stakeholders, including the intended beneficiaries of development interventions and development stakeholders, it can contribute to a process of local history writing, knowledge sharing, capacity development, and providing input into community action plans and the strategies of community-based organizations and NGOs.
This study examines the perceptions of data-driven decision making (DDDM) activities and culture in organizations driven by a social mission. Analysis of survey information from multiple stakeholders in each of eight social enterprises highlights the wide divergence in views of DDDM. Within an organization, managerial and nonmanagerial staff working for the organization and staff from a prominent funder all expressed different perceptions of the same organization’s DDDM activities and culture. Study findings also provide insights into how to improve an organization’s capacity to build and use performance management systems, which include building a common understanding about what activities are—or are not—being undertaken. Finally, findings provide insights about structuring research on DDDM, which indicate that information from only one respondent in an organization or only one organization might not be reliable or generalizable.
This article presents a discourse analytic study of how the concept of impartiality is socially constructed by members of the development aid community through an examination of linguistic traits and patterns within (a) inter- and intraorganizational interactions and (b) relevant aid evaluation policy documents. A qualitative analysis of unstructured and semistructured interviews with development professionals in Japan and a content analysis of relevant evaluation policies and documents have revealed observable evidence of distinct institutional practices of impartiality via project evaluations. Most notably, in contrast to a notion of evaluator impartiality that is strengthened through the principle of independence, development professionals in Japan perceive evaluations (‘hyouka’) to be a fundamentally hierarchical construct in which evaluator impartiality can be strengthened or legitimized through authoritative or hierarchical means.
Research on evaluation has mainly focused on the use of evaluation and has given little attention to the origins of evaluation demand. In this article, I consider the question of why parliamentarians demand evaluations with parliamentary requests. Building on the literature on delegation, I use a principal-agent framework to explain the origins of evaluation demand. In doing so, I argue that parliamentarians mainly demand evaluations in order to hold the government accountable. The quantitative analysis shows that Swiss parliamentarians demand more evaluations if they have the impression that the administration does not implement policies in line with their intent. This finding suggests that parliamentarians demand evaluations in order to fulfill their oversight function toward the government. This conclusion is relevant for understanding the role of evaluations within the parliamentary arena.
Interest in the regression discontinuity (RD) design as an alternative to randomized controlled trials (RCTs) has grown in recent years. There is little practical guidance, however, on conditions that would lead to a successful RD evaluation or the utility of studies with underpowered RD designs. This article describes the use of an RD design to evaluate the impact of a supplemental algebra-readiness curriculum, Transition to Algebra, on students’ mathematics outcomes. Lessons learned highlight the need for evaluators to understand important data requirements for strong RD evaluation studies, the need to collaborate with informed and committed partners to ensure successful RD design implementation, the value of embedding an RCT within an RD design whenever possible, and the need for caution when contemplating an RD design with a small sample. Underpowered RD studies—unlike underpowered RCTs—may not produce useful evaluation results, particularly if other RD data requirements are not met.
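For readers unfamiliar with the estimation side of an RD design, a minimal sketch under simulated data follows: a local linear regression with separate slopes on each side of the cutoff, which is a standard (though not the only) way to estimate the impact at the cutoff. The cutoff, bandwidth, and data are assumptions, not the Transition to Algebra study's.

```python
# Sketch of a sharp regression discontinuity estimate on simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n, cutoff, bandwidth = 2000, 50.0, 10.0
score = rng.uniform(20, 80, n)                      # assignment (running) variable
treated = (score < cutoff).astype(int)              # e.g., low scorers get the curriculum
outcome = 40 + 0.5 * score + 4.0 * treated + rng.normal(0, 5, n)

df = pd.DataFrame({"y": outcome, "centered": score - cutoff, "treated": treated})
local = df[df["centered"].abs() <= bandwidth]       # restrict to a bandwidth around the cutoff

# Separate slopes on each side; the 'treated' coefficient is the jump at the cutoff.
model = smf.ols("y ~ treated + centered + treated:centered", data=local).fit(cov_type="HC1")
print(model.params["treated"], model.bse["treated"])
```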
Little is known empirically about intraclass correlations (ICCs) for multisite cluster randomized trial (MSCRT) designs, particularly in science education. In this study, ICCs suitable for science achievement studies using a three-level (students in schools in districts) MSCRT design that block on district are estimated and examined. Estimates of ICCs, by district, are computed for unconditional two-level (students in schools) hierarchical linear models using student-level science achievement raw scores as measured by the Texas Assessment of Knowledge and Skills in 2010–2011 for Grades 5, 8, 10, and 11. The average within-district ICCs for MSCRTs of science achievement range from 0.0781 to 0.0982 across grades. For Grade 5, a significant difference is found in the ICC estimate suitable for MSCRTs with many districts versus only a few districts. Consequently, evaluators of science education interventions should exhibit care when selecting ICC estimates for three-level MSCRT designs that block on district.
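As a hedged illustration of the unconditional two-level model underlying these ICCs, the sketch below simulates students nested in schools for a single district and recovers the between-school share of variance; the variable names and values are assumptions, not the Texas assessment data.

```python
# Sketch: school-level ICC from an unconditional two-level mixed model.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_schools, students_per_school = 40, 50
rows = []
for s in range(n_schools):
    school_effect = rng.normal(0, 30)                                      # between-school SD
    scores = 2200 + school_effect + rng.normal(0, 100, students_per_school)  # within-school SD
    rows += [{"school": s, "score": x} for x in scores]
df = pd.DataFrame(rows)

# Unconditional (intercept-only) model: score ~ 1 + (1 | school).
m = smf.mixedlm("score ~ 1", df, groups=df["school"]).fit()
between = m.cov_re.iloc[0, 0]          # between-school variance
within = m.scale                       # within-school (residual) variance
icc = between / (between + within)
print(f"estimated ICC = {icc:.4f}")
```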
In the United States, in 2013, 610,042 people were estimated homeless in one night. Improving the effectiveness of homeless assistance programs, particularly aligning programs’ practices with their goals, is critical to serving this population. Using a theory that predicts homeless exits, this study presents an innovative, low-cost evaluation tool that can be used by a wide range of human service providers to conduct more frequent "in-house" process evaluations. The Gap Assessment of Policy and Practice (GAPP) tool streamlines process evaluations thus improving social programs. To test this tool’s effectiveness, we compared the results of a traditional process evaluation and a GAPP tool evaluation of a homeless assistance program. Both evaluations revealed a consistent disparity between program activities and expressed goals. The GAPP tool is less time intensive and provides a useful road map for structuring a process evaluation for program providers, thus increasing program impact by encouraging more frequent and efficient self-assessments.
Heterogeneity between and within people creates the need for sequential personalized interventions to optimize individual outcomes. Personalized or adaptive interventions (AIs) are relevant for diseases and maladaptive behavioral trajectories when one intervention is not curative and the success of a subsequent intervention may depend on individual characteristics or response. AIs may be applied in medical settings and to investigate best prevention, education, and community-based practices. AIs can begin with low-cost or low-burden interventions and be followed by intensified or alternative interventions for those who need them most. AIs that guide practice over the course of a disease, program, or school year can be investigated through sequential multiple assignment randomized trials (SMARTs). To promote the use of SMARTs, we provide a hypothetical SMART in a Head Start program to address child behavior problems. We describe the advantages and limitations of SMARTs, particularly as they may be applied to the field of evaluation.
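A minimal sketch of the two-stage assignment logic in a SMART appears below; the arm labels and response rule are assumptions for illustration, not the hypothetical Head Start design described in the article.

```python
# Sketch of SMART assignment: first-stage randomization, then re-randomization
# of nonresponders to second-stage options.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 200
children = pd.DataFrame({"id": range(n)})

# Stage 1: randomize to a low-burden vs. standard classroom strategy.
children["stage1"] = rng.choice(["low_burden", "standard"], size=n)

# Observe early response (simulated here); responders continue, nonresponders are re-randomized.
children["responder"] = rng.random(n) < 0.55

def stage2(row):
    if row["responder"]:
        return "continue_stage1"
    # Stage 2 for nonresponders: intensify vs. switch to an alternative intervention.
    return rng.choice(["intensify", "switch_alternative"])

children["stage2"] = children.apply(stage2, axis=1)
print(children.groupby(["stage1", "responder", "stage2"]).size())
```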
This article presents discussion and recommendations on approaches to retrospectively evaluating development interventions in the long term through a systems lens. It is based on experiences from the implementation of an 18-month study to investigate the impact of development interventions on economic and social change over a 40-year period in the Koshi Hills region of Nepal. A multidisciplinary team used a mixed-methods approach to data collection and analysis. A theory-based analytical approach was utilized to produce narratives of plausible cause and effect and identify key drivers of change within the context of cumulative and interconnected impacts of multiple programs and factors. This article responds to increasing interest in development evaluation to look beyond intervention-specific impact to broader determinants of change to assist with intervention planning. It is the authors’ hope that this will stimulate debate and progress in the use of high-quality research to inform future development work.
Evaluation has become expected within the nonprofit sector, including HIV prevention service delivery through community-based organizations (CBOs). While staff and directors at CBOs may acknowledge the potential contribution of evaluation data to the improvement of agency services, the results of evaluation are often used to demonstrate fiscal prudence, efficiency, and accountability to funders and the public, rather than to produce information for the organization’s benefit. We conducted 22 in-depth, semistructured interviews with service providers from four agencies implementing the same evidence-based HIV prevention intervention. We use the lens of "audit culture" to understand how the evaluation and accountability mandates of evidence-based program implementation within HIV prevention service provision affect provider–client relations, staff members’ daily work, and organizational focus in natural settings, or contexts without continuous support and implementation monitoring. We conclude with recommendations for improving the use and methods of evaluation within HIV prevention service delivery.
Qualitative data offer advantages to evaluators, including rich information about stakeholders’ perspectives and experiences. However, qualitative data analysis is labor-intensive and slow, conflicting with evaluators’ needs to provide punctual feedback to their clients. In this method note, we contribute to the literature on rapid evaluation and assessment methods by proposing procedures that evaluators can use to expedite the coding and analysis of qualitative data and comparing these procedures to other possible methods. Specifically, we outline procedures for the rapid identification of themes from audio recordings, which allow evaluators to code and analyze qualitative data without time-consuming transcription. We illustrate the use and assess the reliability of these procedures using qualitative semi-structured interview data from 18 public school administrators on how they locate information about and decide to use instructional, health, and social skills programming in their districts. Finally, we end with advantages and trade-offs of these procedures as well as recommendations for how to apply them.
Evaluation theories can be tested in various ways. One approach, the experimental analogue study, is described and illustrated in this article. The approach is presented as a method worth using in the pursuit of what Alkin and others have called descriptive evaluation theory. Drawing on analogue studies conducted by the first author, we illustrate the potential benefits and limitations of analogue experiments for studying aspects of evaluation and for contributing to the development and refinement of evaluation theory. Specifically, we describe the results of two studies that examined stakeholder dialogue under different conditions of accountability frame, interpersonal motives, and epistemic motives. We present the studies’ main findings while highlighting the potential for analogue studies to investigate questions of interest concerning evaluation practice and theory. Potentials and pitfalls of the analogue study approach are discussed.
Although collaboration is recognized as an effective means to address multifaceted community issues, successful collaboration is difficult to achieve and failure is prevalent. To effectively collaborate, collaborators must recognize the strengths and weaknesses within their own efforts. Using Mattessich and colleagues’ work as a springboard, a seven-factor model of effective collaboration is presented along with an accompanying evaluation tool, the Collaboration Assessment Tool (CAT). Confirmatory factor analysis of the CAT validated the proposed model with all seven collaboration factors demonstrating strong internal consistency. Concurrent validity was established through expected positive intercorrelations between the factors as well as strong positive correlations with the perceived success of collaborative efforts. As evaluators are increasingly asked to evaluate collaborations and coalitions, this conceptual model and tool can provide evaluators with a grounded, reliable, and valid assessment instrument to work with clients to build collaborative efforts in an intentional, comprehensive, and effective manner.
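As a side note on the internal-consistency evidence mentioned here, the sketch below computes Cronbach's alpha for a single hypothetical collaboration factor from simulated item scores; it is not the CAT validation analysis itself, which relied on confirmatory factor analysis.

```python
# Sketch: Cronbach's alpha for one hypothetical factor's items.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items matrix of scale scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

rng = np.random.default_rng(2)
latent = rng.normal(0, 1, 300)                      # shared factor score per respondent
items = np.column_stack([latent + rng.normal(0, 0.6, 300) for _ in range(5)])
print(round(cronbach_alpha(items), 2))
```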
Although evaluators often use an interrupted time series (ITS) design to test hypotheses about program effects, there are few empirical tests of the design’s validity. We take a randomized experiment on an educational topic and compare its effects to those from a comparative ITS (CITS) design that uses the same treatment group as the experiment but a nonequivalent comparison group that is assessed at six time points before treatment. We estimate program effects with and without matching of the comparison schools, and we also systematically vary the number of pretest time points in the analysis. CITS designs produce impact estimates that are extremely close to the experimental benchmarks and, as implemented here, do so equally well with and without matching. Adding time points provides an advantage so long as the pretest trend differences in the treatment and comparison groups are correctly modeled. Otherwise, more time points can increase bias.
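The CITS estimator being benchmarked can be sketched as a regression with group-specific intercepts and pretest trends; the data below are simulated under assumed values and are not the study's.

```python
# Sketch of a comparative interrupted time series (CITS) model on simulated data:
# the impact estimate is the treated group's post-treatment deviation from its
# pretest trend, net of the comparison group's deviation.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
times = np.arange(-6, 3)                     # six pretest points, three posttest points
rows = []
for group, (intercept, slope, effect) in {"treat": (50, 1.0, 3.0),
                                          "comp": (48, 1.2, 0.0)}.items():
    for t in times:
        post = int(t >= 0)
        y = intercept + slope * t + effect * post + rng.normal(0, 0.5)
        rows.append({"group": group, "time": t, "post": post, "y": y})
df = pd.DataFrame(rows)
df["treat"] = (df["group"] == "treat").astype(int)

# Group-specific intercepts and linear pretest trends; treat:post is the CITS impact estimate.
m = smf.ols("y ~ treat * time + treat * post", data=df).fit()
print(m.params["treat:post"])
```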
In developing countries, an increasing number of civil society organizations (CSOs) engage in independent monitoring and evaluation (M&E) of government programs and policies. Most CSOs rely on a range of M&E tools in combination with advocacy strategies to hold government accountable and improve the implementation of programs and policies. Despite the popularity of such initiatives, their effectiveness and impact remain unconfirmed and are not well understood. In addition, little is known about the influence of this type of CSO-led M&E at the district level. Using a case study design, the current research provides a map of the different influence mechanisms that occurred following the M&E of the National Health Insurance by Ghanaian CSOs. The research further suggests that the built-in dialogue space is acting as a catalyst for certain influence mechanisms.
Program evaluation is recognized as an essential skill set for practitioners in service-related fields, such as education, nonprofit management, social work, and public health. Recently, the need for a public workforce trained in evaluation has increased and is driven primarily by our nation’s emphasis on accountability during a time when financial resources are limited. However, many of these professionals lack the necessary skills to conduct evaluations of their programs and often rely on a much smaller number of evaluation consultants to perform this task. How then can we educate students and professionals in ways that build evaluation capacity among service organizations and meet these growing needs? A novel course design that integrates principles of adult learning, participatory evaluation approaches, and experiential forms of learning to build evaluation capacity among students and a nonprofit organization is presented. Evidence is provided to demonstrate student learning and the impact of the course on the nonprofit service organization’s evaluation capacity.
The growth in the availability of longitudinal data—data collected over time on the same individuals—as part of program evaluations has opened up exciting possibilities for evaluators to ask more nuanced questions about how individuals’ outcomes change over time. However, in order to leverage longitudinal data to glean these important insights, evaluators responsible for analyzing longitudinal data face a new set of concepts and analytic techniques that may not be part of their current methodological tool kit. In this article, I provide an applied introduction to one method of longitudinal data analysis known as multilevel growth modeling. I ground the introductory concepts and illustrate the method of multilevel growth modeling in the context of a well-known longitudinal evaluation of an early childhood care program, the Carolina Abecedarian Project.
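A brief sketch of the kind of multilevel growth model introduced in the article follows, using simulated repeated measures and assumed variable names rather than the Abecedarian data: each child gets a random intercept and a random growth rate.

```python
# Sketch of a two-level growth model: repeated scores nested within children,
# with random intercepts and random slopes for measurement wave.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)
n_children, waves = 100, 4
rows = []
for child in range(n_children):
    u0 = rng.normal(0, 5)        # child-specific intercept deviation
    u1 = rng.normal(0, 1)        # child-specific growth-rate deviation
    for wave in range(waves):
        score = 90 + u0 + (3 + u1) * wave + rng.normal(0, 2)
        rows.append({"child": child, "wave": wave, "score": score})
df = pd.DataFrame(rows)

# score ~ 1 + wave, with random intercept and random slope for wave by child.
m = smf.mixedlm("score ~ wave", df, groups=df["child"], re_formula="~wave").fit()
print(m.summary())
```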
Despite our best efforts as evaluators, program implementation failures abound. A wide variety of valuable methodologies have been adopted to explain and evaluate the why of these failures. Yet these methodologies have typically been employed concurrently (e.g., project monitoring) or applied post hoc to the assessment of program activities. What we believe to be missing are methods that can successfully predict program implementation failures in advance, methods that lead us directly to the "how" and, especially, the "how likely" of program implementation failure. To that end we propose, discuss, and illustrate three such methods that seemingly hold promise: marker analysis, the wisdom of crowds, and "Big Data." Additionally, we call for an expanded role for evaluation: explanation, but also prediction, without a total embrace of the need to understand why a prediction works.
Longitudinal substance abuse research has often been compromised by high rates of attrition, thought to be the result of the lifestyle that often accompanies addiction. Several studies have used strategies to minimize attrition, including collection of locator information at the baseline assessment, verification of that information, and interim contacts prior to completing the follow-up; however, it is unclear whether these strategies are equally effective for participants struggling with varying levels of housing stability, support for sobriety, and substance abuse severity. The current study extends research supporting the effectiveness of follow-up strategies, with a focus on locator form completion and continual verification contacts. Results indicated that each additional piece of locator form information and each verification contact significantly and independently increased the odds of completing a follow-up interview and that these effects were not moderated by participant characteristics. Practical and theoretical implications for longitudinal substance abuse research are discussed.
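The odds statements reported here correspond to a logistic regression with a moderation (interaction) check; the sketch below illustrates that model form on simulated data with assumed variable names, not the study's dataset.

```python
# Sketch: follow-up completion regressed on locator-form items and verification
# contacts, with an interaction term as a moderation check.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(21)
n = 500
df = pd.DataFrame({
    "locator_items": rng.integers(0, 11, n),      # pieces of locator information
    "verif_contacts": rng.integers(0, 5, n),      # completed verification contacts
    "unstable_housing": rng.integers(0, 2, n),    # participant characteristic
})
logit = -1.0 + 0.15 * df["locator_items"] + 0.4 * df["verif_contacts"]
df["completed"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

main = smf.logit("completed ~ locator_items + verif_contacts + unstable_housing", df).fit(disp=False)
print(np.exp(main.params))   # odds ratios per additional item or contact

# Moderation check: does the locator-form effect differ by participant characteristic?
mod = smf.logit("completed ~ locator_items * unstable_housing + verif_contacts", df).fit(disp=False)
print(mod.pvalues["locator_items:unstable_housing"])
```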
Quality training opportunities for evaluators will always be important to the evaluation profession. While studies have documented the number of university programs providing evaluation training, additional information is needed concerning what content is being taught in current evaluation courses. This article summarizes the findings of a survey administered to university faculty who provide such courses, including (a) topics taught, (b) time spent on topics, and (c) instructors’ perceptions of topic importance. Study results show considerable diversity in the training of new evaluators. This diversity is often complicated by the range of contexts, purposes, and situations involved.
Extracted from a larger study of the educational evaluation profession, this qualitative analysis explores how evaluator identity is shaped with constant reference to political economy, knowledge work, and personal history. Interviews with 24 social scientists who conduct or have conducted evaluations as a major part of their careers examined how they came to recognize themselves as adept evaluators. The paper explores four adaptations to program evaluation—higher education faculty who define themselves as academic entrepreneurs whose work is largely funded by evaluation contracts; post-academics who seek intellectual freedom beyond the university; professional evaluators who perceive themselves as intimately connected to contract research organizations; and layover evaluators who are waiting for the next career move. The paper concludes with a discussion of implications for the professional development and academic training of future program evaluators.
In recent years, quantitative research methodology has become more conceptually integrated and technically sophisticated. Fundamental insights regarding design and analytic frameworks that support causal inference, along with the development of estimation algorithms appropriate for multilevel and latent variable models, have altered traditional methodological practice and ushered in new appreciation for the underlying relationships among modern data modeling techniques. In this article, I provide a brief outline of five methodological content domains that have increasing relevance for quantitatively oriented evaluators: (1) causal inference/experimental design, (2) multilevel modeling, (3) structural equation/latent variable modeling, (4) longitudinal data analysis, and (5) missing data, together with the accompanying textbook resources that facilitate understanding and use. A target audience for each text is also identified.
In this paper we share our reflections, as evaluators, on an evaluation where we encountered Excessive Evaluation Anxiety (XEA). The signs of XEA that we discerned were particularly evident amongst the program head and staff who were part of a new training program. We present our insights on the evaluation process and its difficulties, as well as our suggestions for coping with them. We suggest that signs of XEA and its consequences can be reduced by developing a detailed contract, based on clear rules, that is acceptable to all parties (evaluators, evaluands, clients, and stakeholders) and that addresses ethical as well as technical issues. Finally, we propose a guide for formulating such evaluation contracts.
In evaluation and applied social research, focus groups may be used to gather different kinds of evidence (e.g., opinion, tacit knowledge). In this article, we argue that making focus group design choices explicitly in relation to the type of evidence required would enhance the empirical value and rigor associated with focus group utilization. We offer a descriptive framework to highlight contrasting design characteristics and the type of evidence they generate. We present examples of focus groups from education and healthcare evaluations to illustrate the relationship between focus group evidence, design, and how focus groups are conducted. To enhance the credibility of focus group evidence and maximize potential learning from this popular qualitative data collection method, we offer a set of questions to guide evaluators’ reflection and decision making about focus group design and implementation.
A synthesis of the state of the literature is discussed in this section of the Evaluation Capacity Building (ECB) forum, organized around four critical questions: (1) What is ECB? (2) How can we make it happen? (3) How do we know it is happening? and (4) What is its impact? The authors argue that to move the field of ECB forward we need to envision the science of ECB not as the sole activity of creating new knowledge but as existing in a close, congruent, and reciprocal relationship with practice. By adhering to a science-practice model, we conduct research that directly responds to and contributes to practice, thus creating strong synergies between ECB practitioners and researchers. Research on ECB needs to be informed by real issues happening in practice, and the practice of ECB needs to be informed by the new knowledge created. We must strengthen the science to refine our practice and strengthen the practice to refine the science.
Though several excellent literature reviews and research syntheses have been conducted, and thoughtful frameworks and models have been proposed, I believe it is time for the evaluation field to tackle the "hard stuff" of evaluation capacity building (ECB). This entails engaging staff in ECB activities, building the evaluation capacity of leaders, focusing on learning transfer, and evaluating ECB efforts. In this brief article, I describe these four challenges and pose questions for both practitioners and researchers to consider and act upon.
Although some argue that distinctions between "evaluation" and "development evaluation" are increasingly superfluous, it is important to recognize that some distinctions still matter. The severe vulnerabilities and power asymmetries inherent in most developing country systems and societies make the task of evaluation specialists in these contexts both highly challenging and highly responsible. It calls for specialists from diverse fields, in particular those in developing countries, to be equipped, active, and visible where evaluation is done and shaped. These specialists need to work in a concerted fashion on evaluation priorities that enable a critical scrutiny of current and emerging development frameworks and models (from global to local level) and their implications for evaluation, and vice versa. The agenda would include studying the paradigms and values underlying development interventions; working with complex adaptive systems; interrogating new private sector linked development financing modalities; and opening up to other scientific disciplines' notions of what constitutes "rigor" and "credible evidence." It would also promote a shift in focus from a feverish enthrallment with "measuring impact" to how to better manage for sustained impact. The explosion in the development profession over the last decade also opens up the potential for non-Western wisdom and traditions, including indigenous knowledge systems, to help shape novel development as well as evaluation frameworks in support of local contexts. For all these efforts, intellectual and financial resources have to be mobilized across disciplinary, paradigm, sector, and geographic boundaries. This demands powerful thought leadership in evaluation, a challenge in particular for the global South and East.
In many less developed democracies, Voluntary Organizations for Professional Evaluation (VOPEs) face the challenges of low demand for evaluation and the resulting low economic capacity of national evaluation communities. The VOPE model that evolved in well-developed democracies is not directly applicable under these circumstances, so a new model has to be developed. The EvalPartners Initiative, launched in 2012 by the International Organization for Cooperation in Evaluation and UNICEF with the support of international donors, seeks to facilitate the development of this new model by building a global community of national VOPEs able to effectively influence policy makers and public opinion to promote evaluation as the basis for evidence-based policy. EvalPartners activities include (a) facilitation of peer-to-peer collaborations among VOPEs; (b) development of a toolkit on VOPE institutional capacity; (c) generation of new knowledge on VOPE operations; (d) promotion of an enabling environment for evaluation; and (e) promotion of equity-focused and gender-responsive evaluation.
Evaluators have an obligation to present clearly the results of their evaluative efforts. Traditionally, such presentations showcase formal written and oral reports, with dispassionate language and graphs, tables, quotes, and vignettes. These traditional forms do not reach all audiences nor are they likely to include the most powerful presentation possibilities. In this article, we share our use of alternative presentation formats, undertaken to increase the utility, appeal, and salience of our work. We offer a conceptual rationale for the use of "alternative representational forms" and describe the context for our use of alternative formats, which involved evaluations of various science, technology, engineering, and mathematics (STEM) educational programs. We present four examples, featuring visual display, performance, multiple program theories, and poetry. The article concludes with reflections on the future of alternative presentation approaches.
The discussion on and development of a holistic evaluation approach for rural development will be indispensable to improving and enriching the lives of rural people. This approach can be developed by considering the conceptualization of community policy structure in rural areas, the localization of policy structure in the rural community, and the promotion of participatory evaluation for rural community people. The development and use of such an evaluation approach will contribute much to rural development and to bringing about positive societal change.
This article sketches the growth of development evaluation and the challenges it faces in the future, showing that while progress in expanding the geographical presence of evaluation has been impressive, much more needs to be done. Important work has been done in promoting guidance through agencies such as UNEG, OECD/DAC, and NONIE, and the recent coalescing of evaluation forces under EvalPartners helps entrench evaluation. However, a more deliberate effort is needed to support new entrants, many of whom draw attention to countries that have not had an evaluation culture and that would potentially benefit from greater accountability and transparency. The need to adapt methodologies to context is important, but this should not compromise minimum evaluation standards, and high-level political support for the function is critical. A contextual appreciation is also necessary at a very practical level. The article recognizes that the freshness brought by new entrants should be welcomed; this means new levels of engagement with established practice, much of which is not known due to the limited publishing of work from this area. The active participation of evaluation leaders and professionals, offering support through mentorship and volunteerism, is necessary to accompany the technical and political dimensions of development evaluation.
In a world faced with unprecedented rising levels of inequality and injustice, is there a responsibility for our evaluation organizations to take on a leadership role in promoting inclusive, evaluative dialog and deliberation about the state of our democracies in relation to key democratic principles and ideals? In this forum, I question whether it is enough for us to rely on evaluators and evaluations to promote key democratic purposes and ideals.
This article synthesizes interview data from evaluation directors and top executives of philanthropic foundations on how evaluation might better advance their missions. In key informant interviews, respondents commented on the purposes of evaluation from the foundation’s perspective, challenges to effective evaluation, and the means by which evaluation could be made more valuable to foundations. Informants emphasized promoting accountability, improving programs, advancing organizational learning, and disseminating intervention models as desired uses of evaluation. Application of appropriate research techniques, relevance and timing of evaluation products, and internal thinking and social dynamics within foundations were cited as challenges to evaluation. Emerging concerns include the need to evaluate achievements of foundations as a whole and for evaluation to further develop as a science. In terms of increasing the value of evaluation, the current study indicates the importance of improved relationships and dialogue between evaluators and foundation personnel, stronger leadership consensus within foundations regarding evaluation, and safeguarding of the evaluation process.