After long delays, the report of the May discussions is finally available here: Evaluation Revisited Conference Report. Since May last year, I have been able to attend three other events, and issues from these have also been incorporated in the discussions. Therefore this is a ‘conference and more’ report to inspire practice and promote discussion about these critical issues. Please share the report widely!
It is located here but also on the CDI website http://www.cdi.wur.nl/UK/programmes/InnovationChange/ (then scroll to ‘Publications’ at the bottom right).
Wishing you interesting reading and inspired evaluation processes,
Irene, Cecile and Jan
Reflections on “Evaluation Revisited: Improving the quality of evaluative practice by embracing complexity” – a Conference on Evaluation for Development.
I am currently working on the design of an evaluation of a large-scale, complex initiative (and complicated and chaotic – or that’s how I am experiencing it)! As I, and my colleagues, think about the learning that we hope will come from this initiative and the nature of the evaluation as a kind of animator of learning, we are grappling with what sometimes seems to be different and mutually exclusive epistemologies, each with their own ‘truth claims’ (or for the more modest – partial or provisional truth claims). Some amongst us would wish to see an evaluation that can make claims about impact through addressing a set of hypotheses using experimental or quasi-experimental methods involving the use of a counterfactual. Others would prefer a more mixed approach that uses quantitative and qualitative methods.
In attending this conference I wished to better equip myself with arguments for why a diversity of approaches can yield more generative learning and how a diversity of approaches can help us learn in our context, and share the learning more widely. I had little need to be convinced of this myself; but what I needed was stronger arguments and specific examples to assist in debates with those (sometimes called the randomistas) who believe that rigour is only evidenced in randomized controlled experimental forms of evaluation of development initiatives or RCTs (see this paper for an example of the way in which the term ‘rigorous’ is coupled with the experimental approach). In retrospect, I found the conference to be very interesting, extremely well-organised and timely, but I found I left with a sense of disquiet. There seem to me to be three main sources for this. I’ll discuss them, aware that this is a provocative reflection but, I hope, an honest one. At the same time I’ll point out what I saw as highlights of the conference.
Firstly, it seemed to me that there was an elephant in the room – or perhaps a herd of them i.e. the randomistas! This resulted in a sort of asserting of “complexity” in the face of the elephants’ silent trumpeting of the “counterfactual”. I found Patricia Rogers’ articulation of the unhelpful ways in which complexity is used very helpful – but in my view, the concept of complexity was not made clear enough to really equip participants to argue strongly for why (rather than assert that) a diverse range of approaches is needed and is better, and what it is that these diverse approaches can offer that an RCT can’t offer. And because the experimental approach and its increase in influence were only really hinted at, the broader reasons for why this strong argument is needed were also not clear. So there was a kind of celebration of complexity, but without sufficient substance to it I was left feeling that ‘taken-for-granted’ assumptions were being confirmed rather than understandings being deepened, challenged or sharpened. There are a number of economists working with only quantitative data who are articulating very robust critiques of RCTs and analyses of their limitations, a recent one coming from the World Bank itself. I think the conference needed to bring such debates out more explicitly.
Patricia made the incredibly valuable point throughout her presentation that complexity can only be understood in retrospect. However, while I found them really useful, all three of the cases/methods sessions I attended were of studies that are either in progress or still at the planning stage. The randomistas present a wide array of completed studies with measurable results that take on lives of their own. An alternative approach needs something similar, and I am not asking for measurable results here, but for completed accounts that simply say: “this is what we did, this is what it showed, and this is why understandings of complexity and emergence are important”, for example. It may have been that other case studies provided these, but the summaries do not give sufficient detail to really know this. In addition, I know that RCTs can be undertaken in relatively short time-scales, whereas retrospective accounts would need much more time. But I think it is important to demonstrate the work rather than talk about values and standards or state what “should” or what “needs” to be done. Descriptions rather than prescriptions can better prove the point.
Maarten Brouwers’ keynote presented an excellent framing of global trends in development, I thought. Although he hardly used the word complexity (if my notes are correct), his explanation of these trends (increasing uncertainty, increasing connectivity and the shift from a 1.0 society to a 2.0 one) illuminated the need for the capturing of multidimensionality. Instead of a polarization of approaches (randomistas vs relativistas, or even quantitative vs qualitative) he suggested how different methods can address the needs for different types of learning. I was also struck by his point that these three trends link, such that through real-time data collection there is a possible merging of monitoring with evaluation.
My sense of disquiet seemed to come, secondly, from the sense that a tacit agreement was conveyed that all in the room subscribed to a set of ideas best seen in the statements in the programme that “the conference will further methodological democracy” through the “choice of examples and advocating for methodological diversity that do justice to the existing diversity of development modalities and contexts”; and that it will contribute to development by showing how “evaluation can influence societal transformation”. This felt to me like a ‘taken-for-granted’ assumption again; sort of like we are all activists in the same cause. So I felt like I was being recruited to a position rather than being strengthened in my abilities to argue for my position.
And yet, issues of politics and power seemed strangely absent from the proceedings, apart from Sheela Patel’s excellent keynote. I felt that she was providing a powerful challenge to the way evaluation tends to get done. In putting the story of Slum Dwellers International (SDI) (rather than evaluation) first, she was able to express her frustration at being evaluated through the ‘project lens’, her wish for learning to be tracked throughout SDI’s long, long history, her own mystification about evaluation and its fads: “Northern organisations have very short memory spans” and “I’m often the historian of many Northern NGOs’ initiatives with us”. For me, there were two challenging and deeply political points in her presentation – the first was about the importance of scale (both in the sense of ‘scaling-up’ as SDI manages to do so well, and in the sense of ‘scale-jumping’ in that SDI can take a debate that is intensely local and amplify/project voice across scales and internationally); and the second was about how scaling can be designed when activities are thought of as both means and ends (this seemed to imply a critique of the idea of activities and outcomes – and instead the idea that activities themselves are outcomes – which is a very profound and un-technicist idea). (I wonder: Does this idea somehow link with Maarten Brouwers’ point that new forms of connectivity enable real-time evaluation by citizens and that this can lead to the merging of monitoring and evaluation, and that this can perhaps enable us to see activities as outcomes in themselves?) I felt that some of Sheela’s observations could have offered framings for much of the discussion about societal transformation and the role of evaluation, perhaps even disrupting and leading to emergent themes in the rest of the conference – yet they were not picked up.
This leads to my third sense of disquiet. A critique has been articulated in various places that the RCT approach puts the choice of method before clarifying the question that the method needs to answer, and this point was also made in the conference (not in relation to RCTs but in a more general sense). As a relative newcomer to the field, I felt that a bewildering array of methods was put forward in the conference, especially with the concept of the “methods market”. The reliance on methods and tools (which are often packaged and seen as applicable across all contexts) seems to fall into the trap of elevating these above the purpose of evaluation, which I see as finding principled ways of exercising judgement for illumination. Too much reliance on methods, tools and techniques takes away from judgement and can cloud illumination.
In summary, I find that Sarah Cummings’ thought-provoking blog post about “being caught in a cleft stick”, together with Chris Mowles’ response, really got to the heart of what my disquiet was about. Rather than trying pre-emptively to match or fit theories of complexity (or “embracing it”) with current evaluation practices, perhaps evaluators need to simply live with that gap. Staying close to Sheela’s provocative ideas, I offer an admittedly somewhat quirky set of issues that could be explored instead of trying to find a fit (and I know that much of this is already happening): that is, deepening understanding of the theory and the trends through arguments and counter-arguments, while at the same time trying to:
- address limitations with ‘the project lens’
- attempt to track learning across much longer time-scales, and before and after the time/space boundaries set around projects
- search for approaches that do justice to people’s own attempts to up-scale and scale-jump
- consider the possibility that activities can be both means and ends
- experiment with real-time data collection that shortens feedback loops, increases citizen participation and merges monitoring and evaluation
- focus on descriptions rather than prescriptions
- become critical historians of our own initiatives
- scrutinize any agglomerations of new ideas, methods and techniques for faddishness!
Many thanks to the conference organizers for providing this forum and for opening up the space for honest reflection!
Catherine Kell (Twaweza initiative, East Africa)
July 23 2010
Just reading Alanna Shaikh’s blog ‘Is Impact Measurement a Dead End?’ on AidWatch – a blog project run by Bill Easterly with invited contributors. She asks ‘At what point are we expecting too much from our impact measurement?’… Yes, unrealistic expectations, discussed in May. Complexity, yes – discussed in May. Killing innovation by reducing measurement to numbers, yes – discussed in May. She’s writing about concerns many of us have had for a while. She’s also finally discovered the Cynefin framework! … discussed in May. Perhaps the issues we debated in May will get some attention.
But where I think she doesn’t go far enough is by referring to impact measurement as a faulty but necessary tool and not specifying how we can make this less faulty to embrace complexity rigorously. There is no off the shelf ‘tool’ for us to use, Alanna. This is where the conversation needs to continue ….
And even more significantly, the blog stops short of asking why on earth agencies don’t see the obvious and continue to insist on versions of impact measurement that can, at best, lead to meaningless numbers and, at worst, shut the door on important – though unproven – innovations for development.
Read the comments below Shaikh’s blog – very interesting….
Irene Guijt, Learning by Design
The conference flyer was interesting and generated a lot of curiosity. I was pleasantly surprised when I received an invitation to become a case owner. However, I was astonished to see the profile of the participants and their concern for improving the quality of evaluation. Increasingly, as practitioners, we come across evaluations that are longitudinal in nature and involve multiple stakeholders. This is what makes the six fundamentals of evaluation (relevance, effectiveness, efficiency, impact, sustainability, attribution/contribution) complex and complicated to measure. Stakeholders’ perspectives vary; so do the expectations from evaluation. These expectations compete with each other for attention. However, often, mediation of these expectations is poor due to methodological and political reasons. Improvement in practice requires that practitioners, commissioners and donors work together on this issue. What will provide impetus to change in practice is the way evaluations are commissioned. The coming together of commissioners, donors, and practitioners, in this conference, in my view, was an encouraging first step.
Before the conference, I was under the impression that there would be a debate on an alternative to randomised design. Contrary to my impression, the conference grappled with the inherent complexity of social change processes and ways and means to measure those changes. The case clinics were helpful in providing a platform for the discussion. Being a case owner, I could not take part in other case clinics. I had to satisfy myself with the reading materials. The take-home for me was the difference between complicated and complex. This conference introduced me to complexity theory. However, the fact remains that improved understanding of what is complex is not going to make the complex simpler. Only appreciating what is complex will not help. We should be ready to push our skinny cows (read ‘arrogance of being an expert’) over the cliff. Thank you so much Mr. Bekalo [conference chair] for showing us our skinny cows. The skinny cow in the story was relatively easy to push. Pushing this one is not going to be easy. I recalled Gandhi’s talisman, which I thought reflected on the political economy of development (evaluation) and gave some tips to push the cow over the cliff.
“I will give you a talisman. Whenever you are in doubt, or when the self becomes too much with you, apply the following test. Recall the face of the poorest and the weakest man [woman] whom you may have seen, and ask yourself, if the step you contemplate is going to be of any use to him [her]. Will he [she] gain anything by it? Will it restore him [her] to a control over his [her] own life and destiny? In other words, will it lead to swaraj [freedom] for the hungry and spiritually starving millions? Then you will find your doubts and your self melt away.”
Sandip Pattanayak (Catalyst Management Services, India)
The Evaluation Revisited Conference was about making sense of what we do. It raised pertinent questions not just about methodologies but also about how we perceive and value evaluation. And it did so in a wonderfully participative engaging manner that involved all of us in debate and discussion, not providing definitive answers but exploring a plethora of ideas and approaches that have attempted to push the envelope in understanding the complex world that we try to evaluate.
For those of us, like me, who have long lived and worked in the hurly-burly of real world interventions, it has always been a challenge to understand what works and what does not. We know that our problems of poverty, gender, education and health are very complex and intertwined, long existing and difficult to resolve. And so it was with some consternation that one has listened, in other conferences and venues, to claims of single-bullet interventions producing rather far-reaching impacts. Because we know that is not how the real world works. Funders are constrained in what they can fund and for how long, but no intervention happens in a vacuum – it is influenced by what has happened before; by the present existence of multiple, interconnected issues and the many ‘new’ relationships that the ‘single’ intervention unexpectedly releases. But that is the nature of any activity or intervention that is implemented with real people in the real world. To then say that we can identify what that single intervention does or does not do is simplistic and arrogant, and ignores the many ripples of change it has produced. The conference reminded us of the need to address these complex ‘ripples’ and that we as evaluators must first acknowledge their existence and then make sense of them using our evaluation tools, methods and approaches. As one of the speakers, Sheela Patel from India, mentioned, we ignore the deeper changes and are satisfied by evaluating the tip of the iceberg. What is worse is that we consider the tip of the iceberg evaluation to represent the whole iceberg. Such evaluations serve the narrow needs of budgets and timelines, selectively (sometimes erroneously) identify effects but, worst of all, lose out on evaluating the richness of the human change that has occurred.
The Conference with its breakout groups, case presentations and plenary conducted in a seamless, thoughtful process enabled us participants to share our thoughts and argue about the various paths that evaluation can follow – from the complex to the simple definitive answers. The discussion also veered around whether evaluation thinking could be somewhere in between, recognizing that work plans and budgets need the bottom line clarity for effects and impacts but also understanding an equal need for ‘space’ to meander and explore the many contributing factors that affect and are affected in development work. The conference message was crystal clear in encouraging us to explore these undefined boundaries of evaluation.
Most importantly, the conference reassured those of us who work day in and day out in these complex situations that our voice is heard loud and clear: evaluation must ‘value’ and be ‘responsible’ not only to those who provide the funds but to those whose lives are directly and indirectly affected by the programs and projects. Our evaluative decisions will have far-reaching effects on their lives and, if nothing else, this should remind us, in all humility, to learn from them with the best ethical and rigorous standards we have at our disposal, and to make sense of what works, what doesn’t and, most importantly, why. This deep sense of commitment is what I took away from the Evaluation Revisited Conference.
Sonal Zaveri, Community of Evaluators (COE) South Asia
At the Evaluation Revisited Conference I was blown away by Patricia Rogers’ revelation that the linear cause and effect matrices that have long been taken to describe the theory on which social intervention programmes are based, and against which they must be evaluated, are merely a representation of the theory. The matrix is not the theory itself.
As one participant put it “the theory is inside the arrows”. The theory is not self evidently present in the crisp statements contained in the boxes at different levels of objectives within the planning matrix. The theory remains invisible until there has been a social process of working with people to learn together about their reasons for making this choice over that one in the beginning, and this or that adjustment along the way.
It is in the process of understanding together the choices made, that the change theory generating the programme is revealed. And so a rigorous evaluation practice seeks to evaluate initiatives on their own terms. It means trying to understand intentions and not only plans, choices and not only outputs.
At the moment in this field we call social evaluation, there are no visual representations of dynamically unfolding programme design. Neither do current representations include the ‘sense making’ processes along the journey of the programme’s intervention.
To my mind, striving to make a visual diagram that more fully represents an integration of the planning, unfolding and sense making in each programme intervention, would be rigour in practice.
On a slightly less serious note (although I can tell you it was very serious when we were watching Bafana, Bafana, Cameroon and Ghana!), here in South Africa we have had plenty of opportunity to consider the notions of evaluation, practice and rigour while watching the Soccer (apologies if you call it ‘football’) World Cup. Many questions came to me, such as: How could you tell if a team has a rigorous practice or not? How would you go about evaluating whether a team is successful or not? Before agreeing to evaluate a soccer team, would you have to know all the ins and outs of the game? Does losing mean failure? Was it always the best team that won? Is there such a thing as ‘soccer best practice’ to be used as a soccer evaluation template? Should each team be evaluated against their original game plan – or how well they adapted to the conditions that they met along the way… and so on!
And finally, a last soccer question: ‘What is the role of the ball in soccer?’
And one of my favourite quotes:
“…try to love the questions themselves as if they were locked rooms or books written in a very foreign language. Don’t search for the answers, which could not be given to you now, because you would not be able to live them. And the point is, to live everything. Live the questions now. Perhaps then, someday far in the future, you will gradually, without even noticing it, live your way into the answer.” (Rainer Maria Rilke, Letters to a Young Poet)
Catherine Collingwood, evaluation consultant
I am on the train that takes me to Utrecht. I am checking the latest issue of World Development… “this research shows that beneficiaries have a 16% chance to get out of poverty”… My destination is ICCO… “We want to help reduce poverty through improving the farmer organisation’s financial and economic conditions”… I try to grasp the econometric argument, while I feel it would be better to prepare an open mind for ICCO’s particular information needs. I worked with these farmer organizations for six years in the past. Now, I am working on PhD research. On them. With them. In all their diversity and internal complexity.
I am Torn Between Two Lovers… excited…. over-stretched….
Academics are not really interested in conclusions but primarily in the methodological ways to get to them; most practitioners are interested in answers to relevant questions without bothering too much about the way evidence is collected to answer them. My way to balance these competing claims is, first, checking the information needs, clarifying the evaluative question. And then, for each type of evaluative conclusion, identifying the most obvious ‘threats to validity’: the questions a ‘critical insider’ can be expected to pose when he/she wants to reveal the weak points in the evidence or reasoning. That exercise often makes clear that one data collection method is insufficient. Mixed methods are needed. Checks on critical assumptions in the methodology have to be added to the research design.
The train suddenly stops. No railway station near. As usual, the train’s schedule proves to be contingent on other trains’ punctuality. Real life is complex. Has always been complex. Nothing new in that. I get my bag and take the Sage Handbook on Case-based Methods… where the twain shall meet… searching methodological rigour embracing complexity.
Giel Ton (researcher, LEI)