The nuts and bolts of program evaluation

Event date

9 December 2013, 1:00 pm to 2:00 pm (AEST)


Elly Robinson


You are in an archived section of the AIFS website 



About this webinar

This webinar was held on 9 December 2013.

Impacts, objectives and outcomes, logic models, hierarchies of evaluation, ethical issues… the language of evaluation, let alone evaluation itself, can be enough to induce sleep. But as the pressure to show return on investment in programs increases, it has become essential for service providers to understand the basics and learn how to apply them to their own programs and services.

In this webinar, Elly Robinson gives a guaranteed, easy-to-understand "nuts and bolts" overview of evaluation, based on the recently released CFCA evaluation resources, and describes how evaluation links to program implementation and innovation.

Audio transcript (edited)


Hello and thanks for joining us here today, both virtually and in person, to talk about the nuts and bolts of program evaluation. I'd like to acknowledge the traditional custodians of the land on which we meet today, the Wurundjeri people, and pay my respects to their Elders past and present. I also should say that the views in today's presentation are purely my own, and not necessarily those of either the Australian Institute of Family Studies or the Australian Government. I actually come from a practice background, a background in youth services mostly, instead of coming from a research background. So I've spent a lot of time making sense of evaluation concepts just like probably a lot of you today who are attending. So the webinar today will tend to reflect those experiences with evaluation as a practitioner rather than coming from a research perspective. The evaluation resources that we've recently published on the CFCA information exchange website are the basis for what I'll be talking about today and these are the five different publications that we have. So we will be talking a little bit about evaluation and how it links to innovation, around evidence-based practice and service-based evaluation. We'll talk a bit about ethics in evaluation in the specific context, how we prepare for evaluation and how, once the evaluation is done, we disseminate the findings for the use of others and other program and service providers.

So why would we do evaluation? Fundamentally, program or service evaluation differs from the evaluations we make in everyday life because it's systematic: there's a systematic way of checking whether the program or service meets the quality standards that are set for it. We might also evaluate to find out whether the participants were satisfied with the program or service and whether they benefited from it. And importantly, did they benefit because of the program, or were there other factors at play that made it look like they benefited from the program or service? We might also ask whether the program can be improved or refined in some ways. The findings of an evaluation may be used to justify requests for further support or funding for the program or service. And lastly it may be about whether the implementation of the program was true to the actual program design itself, so looking at the nuts and bolts of the process of running the program or service.

Moving on to different types of evaluation. There was one study that said there are now 26 different types of evaluation - that's some heavy bedtime reading for those of you who are interested. But fundamentally most of them have a lot in common. There are some more traditional styles of evaluation which are outlined on these slides. Outcome evaluation and impact evaluation are terms often used interchangeably, I've found, depending on what you read. But fundamentally, outcome evaluation often looks at questions around the short or medium-term outcomes of a program or service, such as does this program help my clients and have there been any unintended consequences or unintended outcomes of this program or service. Impact evaluation tends to look at the longer-term effects of a program, so over a long period of time, did the program have the effects that we thought it would. So have things like parenting changed over time, or has the relationship changed over a period of time. Again I'll come back to these concepts a little later when we talk about program logic.

As I said, those terms are often used interchangeably and you will see that impact evaluation sometimes talks about the immediate impacts, but more often than not it's about the later ones. And then there is process evaluation, which is linked to implementation of a program, so is the program being implemented in the way it's intended, is it reaching the people for whom it was designed, is it the right target audience for that project. And process evaluation also opens up that black box of complex interventions and again I'll explain this in a few minutes. Other types of evaluation that have become more popular in recent years include participatory evaluation, where program managers and staff within the program are treated as partners alongside the evaluators. And another type of evaluation that's gained some interest in recent years is empowerment evaluation, and this probably takes the next step in saying it's helping people within their communities to learn how to help themselves and determine what services they need within their communities. And of course this approach assumes, quite rightly, that most communities will have a better knowledge of what happens within their communities and what they need than the evaluators or professionals that are coming in to help them. So it's working in partnership and building the capacity of that community to figure out what works for them.

There are many things that evaluation might lead to in terms of innovation in project delivery. So innovation in content is talking about applying new approaches to specific topics and issues that are dealt with in a program or in the individual modules of that program. It may involve the application of a new approach or a new theory or a new model, or adding new topics in response to social, demographic or other trends. An example of that is introducing a discussion of the use of social networking sites in a relationship education program, so how do these new modes of communication affect or potentially affect a relationship over time. Innovation in delivery method is more about employing new or adapted modes of delivery through partnerships with stakeholders, creating ownership amongst clients, staff or stakeholders, or through the use of learning approaches that make use of new information and communication technologies. A good example of this that's happening in a number of different places at the moment is the provision of web-based family dispute resolution, so using a new technology applied to an old service type. And lastly, innovation in forging new partnerships or networks. It may be a case of sharing and increasing expertise, knowledge and experience, fostering communication and exchange of ideas, and you will often see this happen within an external or cross-agency supervision model where the emphasis is on sharing these things.

So as I said, implementation or implementation science has become much more prominent in recent years, so we're talking about not just the outcomes of a program and what happens but also what actually happened when we were running the program, what was the process like, how effective was the implementation of the program, and were there any things that affected the process of running the program or service. I mentioned the black box before, and the boxes in the middle of this diagram are what is often called the black box. So we might see a program that has good outcomes for children, and we might see the same program run somewhere else where it has poor outcomes for children. So implementation will tell us about what happens in the middle in terms of was it the right target group, was the program run in the way that it was supposed to be, did the right people turn up, what stopped them from turning up on the day, those sorts of things that unpack what would affect whether a program reaches certain outcomes for children or for the client group.

The importance of implementation is shown in this diagram as well, so implementation is the how of what happens, the processes I've been speaking about, and on the left-hand side is the what, so the program. So if we know the program is effective through evidence-based practice or research or evaluation, and the how of running it is also effective, so it's actually implemented effectively, that leads to actual benefits in outcomes for clients. If any other combination of these things happens then it becomes more problematic, or potentially more problematic. So if the how is not effective, the program is not run in an effective way, then even though it's an effective program the outcomes are likely to be inconsistent, not sustainable or poor in general. Likewise if it's a program that's not effective, so it hasn't been shown to actually work, but the implementation of it is effective, it will still reach poor outcomes in a lot of cases. And lastly, when you don't have an effective program and you don't implement it properly either, it's more likely to lead to poor outcomes but there's also the potential for it to be quite harmful as well.
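For those who find it easier to see the four combinations laid out side by side, they can be sketched in a few lines of Python. This is an illustration only: the function name `expected_outcome` and the exact wording of each result are assumptions made for this sketch, not part of the diagram or the CFCA resources.

```python
def expected_outcome(program_effective: bool, implementation_effective: bool) -> str:
    """Summarise the four 'what' (program) and 'how' (implementation) combinations.

    The wording of each result paraphrases the talk; it is not an official scale.
    """
    if program_effective and implementation_effective:
        return "actual benefits for clients"
    if program_effective and not implementation_effective:
        return "inconsistent, unsustainable or poor outcomes"
    if not program_effective and implementation_effective:
        return "poor outcomes in a lot of cases"
    return "poor outcomes, potentially harmful"


# Print all four combinations of the "what" and the "how".
for what in (True, False):
    for how in (True, False):
        print(f"effective program={what}, effective implementation={how}: "
              f"{expected_outcome(what, how)}")
```

Only one of the four cells leads to real benefits, which is the point of the diagram: an effective program is necessary but not sufficient.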

Coming back to the idea of hierarchies of evaluation, so this is that very formal way of looking at evaluation, what would work to show us most effectively whether a program or a service works or not. So the different types of evidence, and I'll run through a few of these, allow a stronger or weaker conclusion to be drawn about the effectiveness of a program. And I'll talk about the difficulties of applying some of these models within a service environment, and what that means in terms of what we find, after I've gone through the different types of evaluation. So randomised control trials mean that the people in the sample population are allocated to either a control or an experimental group. So the control group don't get the intervention and the experimental group do. Normally some sort of randomised way of allocating people to the groups will be used, and there are plenty of websites that will do randomised number allocation for you. So if there's no bias in the way that people are allocated, then it's more reasonable to conclude that the program made a difference. That relies on having the same types of people within both of those groups. Clearly, if you have all males in the experimental group and all females in the control group then it might look like males were more likely to experience the changes, but obviously there's bias involved because there were clear gender differences between the two groups.
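To make the idea of random allocation concrete, here is a minimal sketch in Python using only the standard library. The function name `allocate`, the participant labels and the seed value are all made up for illustration; a real trial would follow an allocation procedure agreed with the evaluators and the ethics committee.

```python
import random


def allocate(participants, seed=None):
    """Randomly split participants into control and experimental groups.

    Shuffling first means nobody (including the evaluator) chooses who goes
    where, which is what removes allocation bias. A seed is optional and only
    makes the shuffle repeatable for checking.
    """
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "experimental": shuffled[half:]}


# Hypothetical participant labels, purely for illustration.
groups = allocate([f"participant_{i}" for i in range(1, 11)], seed=2013)
print("Control group:     ", groups["control"])
print("Experimental group:", groups["experimental"])
```

With small groups, chance alone can still leave the groups unbalanced on things like age or gender, so it is worth checking the make-up of each group after allocating.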

RCTs or randomised control trials are not without criticism, and again I will come back to this when we talk about service provision, but they're often seen as conducted in a clinical way that doesn't necessarily apply at a service level. So there are problems with RCTs in service environments, and this is also relevant to some of the other designs that I'll talk about, such as drop out of participants. If we have different groups of participants and, say, more people drop out of the control group than the experimental group, it can make a difference to the outcomes for both groups. Again, as I discussed before, unexpected differences between groups might happen if there's bias involved, so it may be to do with age or gender or other factors that have made the two groups different prior to the trial. There are also ethical issues involved in this, and this is obviously particularly true for a service environment. There are ethical issues with withholding treatment from the control group whilst the experimental group gets the treatment, because you may be withholding something that would help the participants involved.

For those of you who are interested in having a go at doing this, I don't know whether many of you have seen Ben Goldacre, who is a UK scientist and doctor. He teamed up with Nesta, which is an organisation in the UK, and they created this site called Randomize Me. It's a very simple and easy way to show people how randomisation works within trials and it's worth going to have a look. On the site they have set up a number of different RCTs that you can actually join in on, but you can also create your own trial as well. Some of the questions they're looking at presently are things like, "Do my new runners make me run faster?" "Does complimenting the barista at my local coffee shop improve my chances of a free coffee?" And lastly, "Does eating cheese give you nightmares?" The interesting thing that I find about "Does eating cheese give you nightmares?" is that many of the researchers have jumped on board and of course said, well, it depends how many grams of cheese you have and what type of cheese it is that you're eating and what time of night it is that you're eating the cheese. So even though they've had a go at trying to make it simple, I think many of the people who know about these things are quite willing to make it a little more complex. But it's fun to go and have a look at and have a go, and you can actually join in the trials and be part of it.

Next up are quasi-experiments, so that's where you might use a naturally occurring comparison group rather than a randomised control group. It may be that the comparison to your experimental group is participants on a waiting list, or it might be a group of people who are offered a different intervention, so it may be a briefer version of the program that the experimental group are engaged in. There may be greater benefits to those in the program, which might mean that it's effective, but because they're not randomised groups you can't necessarily say that the program caused the changes. So having a comparison group is good in terms of getting a better idea whether it's working or not, but it's not quite as good as a randomised control trial because of that lack of randomisation to the groups, which introduces the bias that I was speaking about before. But it's certainly used in many different ways and in many different evaluations to gain some of the benefits of having a group that doesn't get the experimental input.

And lastly pre- and post-tests. So in these, which I'm sure many people are familiar with because they get used very regularly within program and service delivery, there's no comparison or control group used but there are measures before and after the program. It might be a sheet that you use to get some idea of whether participants enjoyed the program or how they felt or what they learnt throughout the program. Strictly speaking no real conclusions can be drawn here because the changes might have happened anyway, and this is another source of bias that often comes into these sorts of evaluations. You might find that whilst they were doing the project the participants were also reading books around the program or the issue that was at hand in the program, and that might be the same with comparison groups or even control groups as well. So it's hard to draw real conclusions. I guess the question is whether they're better than nothing, and no matter who I ask there are different opinions about this. It might be an interesting start to the forum this afternoon after the webinar to have a look at whether people think pre- and post-tests are better than nothing or whether they should not be done at all. I will leave you to chat amongst yourselves about that and then post comments later.
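A small worked example may help show why a pre- and post-test on its own is hard to interpret, and what a comparison group adds. The sketch below is in Python with entirely made-up scores; the function name `average_change` and the numbers are assumptions for illustration, not data from any real program.

```python
from statistics import mean


def average_change(pre, post):
    """Average post-minus-pre change for the same participants, in the same order."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have one score per participant")
    return mean([after - before for before, after in zip(pre, post)])


# Made-up scores on a hypothetical parenting-confidence questionnaire.
program_pre, program_post = [10, 12, 9, 11], [14, 15, 12, 15]
comparison_pre, comparison_post = [10, 11, 10, 12], [12, 12, 11, 14]

program_change = average_change(program_pre, program_post)
comparison_change = average_change(comparison_pre, comparison_post)

print("Change in program group:   ", program_change)
print("Change in comparison group:", comparison_change)
print("Change beyond comparison:  ", program_change - comparison_change)
```

In this invented example the comparison group also improved, so some of the change in the program group would probably have happened anyway. Even then, without randomisation the remaining difference cannot be attributed to the program with confidence.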

It's a very complicated decision tree on this slide and I won't go through it, I just wanted to give you some idea of what you can find within the evaluation resources that I mentioned before on the CFCA website. Basically it's asking who should be involved in conducting the evaluation, so that's a key question. Do you have the expertise within your organisation to find out if the program works? If you do, if you have enough expertise, then you may be able to do the evaluation entirely in-house. If not, the evaluation may be done solely by an external evaluation professional. If you have some experience in doing this, it may be that you do some of it in-house and some of it in collaboration, and the pros and cons are outlined there. Often organisations will be in a situation where they have in-house expertise, or they have some in-house expertise, or they feel that it's better to partner with a university or other organisations or with CFCA information exchange to do an evaluation of their program. So this just outlines the pros and cons of each of those approaches and, as I said, they can be found in the evaluation resource that we have online.

Part of the importance of getting evaluation into the organisation and it being valued is developing a culture of evaluation there. So the benefits of developing this culture include that it helps to reinforce reflective thinking, it's not just day after day of the same practice and the same ways of doing things. What we're actually saying with an organisation is we need to think about whether that actually works or not, and we want to make sure that it doesn't do more harm than good. It helps staff to be responsive to the external demands for accountability that come about, being able to say to funding bodies and other service providers that there's some evidence that what we are doing works. The organisation will increase its confidence in terms of whether there's evidence of a positive impact of their services and programs, so it gains some relevance in terms of being able to build confidence and build capacity within the organisation to know that what people are doing does work. And of course another benefit is that staff gain new skills and knowledge from being involved in these things.

We might find challenging attitudes within organisations when we do try to build a culture in there and these might be some of the things that people recognise as responses to it. Program staff say, "But the program is exemplary, we don't need to evaluate it, it's the best thing that we've ever done in the entire organisation ever." So it's important to build realistic expectations of the level of improvement that's reasonable to expect within a program. Getting back to the idea before, even if you put a program or a service in place it doesn't mean that there's going to be large amounts of change, because other people will also be learning new things. So the amount of improvement that you find after implementing a program or a service is not necessarily going to be big. It doesn't mean it's not good, but we need to set realistic expectations about what's going to be found. "But we'll offend program staff by evaluating things" - so it's important to remind people that an evaluation is about the program or service itself, not about the personnel who are involved in it. It's a discussion about improvement rather than judgment around the way that that program or service is going.

Worries that the program might be terminated if negative results are found. I guess the message to give to people is that if negative results are found or if the outcomes aren't as positive as we'd like to see, the funding body is going to be under more pressure to find a replacement for that and that's not likely to happen because it's much easier for funders and for people to look at the ways that an existing program might be improved and what sorts of things can be put in place rather than having to dig up an entire new project. And one that you might hear quite often is around how evaluation drains program resources, which might be true in the short term, but the alternative is that money might be spent on programs and services that are ineffective or at worst harmful. So evaluation is more likely to lead to additional support and resources once we know that it's effective so it's again a good sell to the funding body to be able to say we've tested this out, we know that it works and therefore this is what we need to improve the project or we need some more funding to do this. So it may lead to better things.

Moving onto ethical considerations, which is obviously a huge part of both research and evaluation. So looking at what we need to think of in terms of ethics around the data that you're collecting or using and what you need to do. First of all it's important to assess what approvals are required, do we actually need to get ethical approval to do this evaluation and if we do who do we need to get it from. So whether you have an internal ethics committee or you need to go to an external ethics committee or a university. The good part about partnering with an organisation or a university is often that they will come with an ethics committee if you don't have one within your own organisation. It's also important to consider how you ensure that participant information is kept confidential and how you might manage potential conflicts of interest within this. So having people with a vested interest who are involved in the evaluation clearly can be problematic.

When don't you need ethics approval when you're doing an evaluation? One case where you don't need it is if you're doing an original analysis of previously collected or publicly available data. So the data has already been collected, it's de-identified, and the OK is there to do an analysis of it. Within program evaluation and service evaluation, if it's deemed quality assurance you often don't need ethics approval. So that's if it doesn't impose a risk on participants, if it uses existing organisational data, if the analysis of the data is done in-house by someone who's bound by a professional code of ethics, if it doesn't infringe the rights and reputations of carers or the providers or institution, and if it doesn't violate the confidentiality of clients. So if all you're doing is checking in a systematic way whether what you're delivering is working or not, and it uses existing data, then it won't normally need ethics approval. If you're unsure we're happy for you to contact us, and those are our details as well, or just come through the CFCA home page and contact us through the contact box up the top and we'll be able to help you figure out whether you do or you don't.

Program logic, the words that strike fear into many people's hearts, or theory of change or logic model or program theory and part of this problem is that it has so many different names. Basically it all means the same thing, the different terms have been used at different points in time and done by different people. I just like this diagram, it doesn't really have much to do with program logic but it is around airport logic, which says that if you have three small bottles of liquid that's perfectly safe, but if you have one large bottle of liquid that's super dangerous. So it's the logic behind airports and I thought that might illustrate sometimes the illogic of logic. Program logic is a visual representation of what's going on in a program and I think it's kind of been given a very bad name, I mean basically what we're doing in building a program logic is looking at what you're doing, what you're bringing to the project, what the activities of the project are, who it's aimed at and what the outcomes of the program are likely to be. It's not set in stone, as this says it's an intention, it's a roadmap for the program so it can and will often change over time. So it gives you some idea about where you're going, if you're going to do a needs analysis and find out what people want, you then need to be able to take the next step of saying well what would it look like to meet these needs, what would the outcomes be and how would we judge that our program is working.

So the roadmap for the program is an important thing and so is the relationship, so there should be logical links between each stage of the program logic model and I'll explain this as I go through. So it's an if-then relationship: if we do this thing then this will happen, if we do that thing there then this will happen. This will make a bit more sense as we work through them. This again is a bit complicated but it's I guess a template for a program logic model, and again you will see we're doing a needs analysis on the left-hand side and the things we're considering. We then have what we invest in it: we have staff, we have volunteers, we have time and money et cetera. The outputs are what we do and this is often where evaluation stops, looking at what the activities are, how many things did we run, how many people came to the project, just numbers. So the program logic tries to take it that next step in saying, but what do we expect to find if we do this? And then lastly you've got the outcomes or impact evaluation. So what are the short-term aims, what are the medium-term aims or outcomes and what's the ultimate impact, what do we really want to see. And it's OK to be grand about it, it's OK to have things that you won't necessarily be able to solve all by yourself, and partly this is because, as you will see down the bottom in the blue box, there's always going to be external factors that will impact on outcomes. But it's OK to be able to say we expect that by doing this program families in our area will be much better off, or parents in our area will have better skills, or whatever the outcome of your project will be.
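One way to test whether the if-then links hold is to write the stages down in order and read each adjacent pair aloud as an "if ... then ..." sentence. The Python sketch below does exactly that. The `ProgramLogic` class, its field names and the example parenting program are all hypothetical, made up for this illustration rather than taken from the template on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class ProgramLogic:
    """A simple program logic model: each stage should lead logically to the next."""
    inputs: list
    outputs: list
    short_term_outcomes: list
    medium_term_outcomes: list
    impact: list
    external_factors: list = field(default_factory=list)

    def if_then_statements(self):
        """Pair each stage with the next one so every link can be sanity-checked."""
        stages = [self.inputs, self.outputs, self.short_term_outcomes,
                  self.medium_term_outcomes, self.impact]
        return [f"If {' and '.join(current)}, then {' and '.join(following)}."
                for current, following in zip(stages, stages[1:])]


# Hypothetical example program, for illustration only.
logic = ProgramLogic(
    inputs=["we invest staff time", "we invest funding"],
    outputs=["we run six parenting sessions"],
    short_term_outcomes=["parents learn new strategies"],
    medium_term_outcomes=["parents use the strategies at home"],
    impact=["child-parent relationships improve"],
    external_factors=["employment", "income"],
)

for statement in logic.if_then_statements():
    print(statement)
print("External factors to keep in mind:", ", ".join(logic.external_factors))
```

If a sentence sounds unconvincing when read aloud, that link in the model is probably resting on an assumption that needs more thought.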

Often it seems a bit more like this though, and I'll leave that for you to consider for a while because this is what I've often felt like when I'm doing a program logic model. So down the left-hand side we start thinking about creating a really confusing chart and then we go on to realising that it looks important if it's really, really confusing. And engage a graphics generator. You even employ a chimp at one stage and, as you will see in the white and yellow boxes, you check with clients, you engage first, yes, you roll down the hill and jumpstart the engine and realise that you don't understand any of this. And then right at the end of course the impact of it is that the world is a much better place because they actually paid you to do this. That's what it feels like quite often.

Really it probably looks a little bit more like this, and I don't love this model but I couldn't find a better one to show you today. The reason why I don't love it, even though it's nice and simple, is that it doesn't really have those good if-then relationships built in. And I'll show you why this is a good example of something that probably doesn't work quite as well as it could. So the outputs are developing the curriculum, delivering a series of interactive sessions and facilitating support groups, and then targeted parents attend. You can see there's an actual assumption in there that people are going to attend the program. It needs a little bit more work. Even though there are assumptions in there and that's OK, we might need to think a little bit more about it: if we deliver the curriculum and the interactive sessions then targeted parents will attend - the if-then relationship doesn't quite work. But nevertheless it's a good simple diagram of what a program logic might look like. And you can see at the end on the right-hand side that there are some quite aspirational goals there, it's around improving child-parent relationships, and strong families will result. And of course we know that there's a whole lot of external factors that will impact on whether families are strong or not that are often out of our control. It may be as simple as taxation or income or employment or lots of other things that will impact, but again it's OK to have some aspirational goals in there.

So how can we do the evaluation, how do we approach this and there's a number of different ways and I'll run through some of these as we go. One of the ways is collecting new data from key informants and again I'll outline this a bit more in a minute. Making use of internal administrative data, including existing program data, and what you collect. It might be the information that you get off your intake forms as people enter the service. So you have an existing collection of data that you can use then for your evaluation, but there are some pitfalls in that. You can use external administrative data, which again has some good points and bad points about it that I'll come to. Use of existing representative research data set, so an example of that is the Longitudinal Study of Australian Children that is conducted here at the Institute. You can also use multiple data sources and the good thing about that is it allows for greater validity and I'll come to validity in a minute, and greater depth of what you're doing. But there are lots of considerations and we'll just unpack those a little bit now.

We need to balance the quantity of data with the quality of it and the ability to analyse. So what resources do you have to do the analysis, do you have the people there, do you have the time, and do you have the money to be able to do it? And that will influence what you collect and what you use. As we talked about before, who is conducting the evaluation and what are the skill sets that are available to the people who are available to you to do the evaluation. How much time do you have, is this a one off or an ongoing process and do you need it all and I think sometimes myself included you fall into the trap of thinking the more data I get the better that the evaluation will be. And so you go off and you conduct half a dozen focus groups at one and a half hours each and then that takes about 30,000 hours to go through and actually analyse the data that's there and you run out of time. And there's an ethical consideration in that because if you ask people for information and you don't use it then that's kind of problematic as well. So you need to have a really good think about what's at your disposal to be able to conduct this evaluation and is that a reasonable expectation that that will happen.

In collecting new data there's a number of different ways and they all have pros and cons as well, and some of those pros and cons are talked about in the evaluation resources on CFCA. Qualitative data, so interviews, observations, focus groups, talking to people and getting a deeper level of meaning. It's good for measuring behaviour change, getting a greater understanding of concepts and looking for explanations, and clearly this can work in tandem with quantitative data collection as well in what's called mixed methods. Observations, so that might be where you actually watch what's going on with people within your program. So it may be looking at how people use what you've taught them, where you've run a program talking to people about good parenting practice. The good thing about observation is it doesn't rely on self-report, you're actually observing what happens yourself, but of course this is very resource intensive and you end up with small units of data because you're spending many hours with one, two, three clients.

Interviews, one-on-one interviews. More costly and time intensive than focus groups, but you're often likely to get greater honesty and depth because you're talking one on one with the person about what's going on. Focus groups, the interaction can give you more than interviews do, so it may be that your group runs itself to some extent in being able to talk about what's going on and feeding off other people's responses, and it might lead to high quality responses. But what you also get that you don't in interviews is that group dynamics are often in play. So you need to set very clear group rules and have backup in terms of how to deal with things. As I said, analysis takes time and resources with qualitative data so you need to be sure that you can use what you're collecting. Quantitative data, so surveys and questionnaires, and for those of you who have just noticed I've put qualitative before quantitative, that shows what my bias is, but many people are into quantitative research and so we're talking about surveys and questionnaires here. And the good part about it is you can collect a lot of information in a short time and there are lots of tools on the Internet to be able to do this now, so it may be SurveyMonkey or LimeSurvey, those sorts of tools that can build a survey that you can then put out for three or four weeks and gather lots of information, and a lot of those will also analyse the information to an extent for you as well.

The difficulty is unless you're using an established instrument it might take much preparation and review and revision of what you're doing. So an established instrument, maybe a survey that someone has built before that you can use as it is or adapt in some way, it has internal validity in the way that it's been built so there's some benefits but it may not quite capture what you need to get as well. So it might be that you need to build your own instrument and that takes lots of review, revision. Again a lot of this is covered in the evaluation resources about what needs to happen. You also need cover letters and informed consent, which you need for qualitative as well, around letting people know very clearly what's happening and how it's happening and getting their consent to be involved, so there's ethical considerations. And of course there are cultural implications as well so if you're using an established tool or even if you're building your own tool, there's a process to go through around making sure that the people who are involved in the evaluation will understand what's being asked.

Again quite complex, I won't go through this but this is in the resources as well. This is about - is there an instrument that exists that you can already use and no or yes are the two answers. So if you go down the no side, do you have the expertise to create one, if you don't then you need to work with someone who does and write it, test it, review it, revise it, test it, test it, test it. So you might do some pilot testing with people. Same as if you have the expertise then that's good practice as well, to make sure that people understand what's being asked of them and that you're getting the right answers for it. If it does exist does it need to be adapted in some way, yes or no, and if it doesn't you need to consider copyright and all the commercial licensing and often there's a cost involved in it as well. But at least you get something that often has been tested that it's measuring what it's supposed to. And if it needs to be adapted in what way, looking at the ways you've done it and again it's about testing it and testing it again to make sure that it's capturing what you would like it to.
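
The decision path just described can also be written down as a short checklist function. The sketch below is only an illustration in Python: the function name `instrument_next_steps` and its three yes/no arguments are hypothetical, and the step wording paraphrases the talk rather than quoting the CFCA resources.

```python
def instrument_next_steps(exists, has_expertise=False, needs_adapting=False):
    """Return a list of suggested steps for choosing a survey instrument.

    Hypothetical helper: the branches follow the yes/no questions in the
    talk, and the step text is a paraphrase, not an official checklist.
    """
    if not exists:
        steps = []
        if not has_expertise:
            # No instrument and no in-house expertise: get help first.
            steps.append("work with someone who has the expertise")
        # Building your own always involves writing and repeated testing.
        steps += [
            "write the instrument",
            "pilot test it",
            "review and revise it",
            "test it again",
        ]
        return steps

    if needs_adapting:
        # An existing instrument that is changed needs re-testing.
        return [
            "decide in what way it needs to be adapted",
            "adapt the instrument",
            "test it",
            "test it again",
        ]

    # An existing instrument used as it is.
    return [
        "consider copyright and commercial licensing",
        "allow for any cost involved",
        "use the instrument as it is",
    ]


if __name__ == "__main__":
    for step in instrument_next_steps(exists=False, has_expertise=False):
        print("-", step)
```

Calling it with `exists=True, needs_adapting=True` gives the "adapt and re-test" path instead.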

Personally speaking when people talk about validity and reliability I fall asleep. No matter how many times you tell me what this means I won't remember but I will try my best to tell you what it means today. Validity is around whether the tool measures what it's supposed to, so it's a bit about what I've been talking about. Like we know that if a pre-post test says "I thought the presenter was really nice", that doesn't necessarily mean that it's going to lead to behaviour change, so the actual question doesn't necessarily measure what you want it to. Reliability is about the fact that if it's used repeatedly under the same conditions the tool gives the same results. So we know that it's measuring the same things. Important concepts in using different tools to measure things and you can read up more about that on Wikipedia or probably something more reliable than that.
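
As a rough illustration of the reliability idea, one common way (though not the only one) to put a number on "same tool, same conditions, same results" is to give the tool to the same people twice and correlate the two sets of scores. The sketch below does that in Python with a hand-written Pearson correlation. The helper name `pearson_r` and the scores are made up for illustration, and this technique is an assumption added here, not something the talk itself prescribes.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(first)
    if n != len(second) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 scores each")
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    if var_a == 0 or var_b == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_a * var_b)


if __name__ == "__main__":
    # Hypothetical scores from the same five clients, two weeks apart.
    first_round = [12, 18, 9, 15, 20]
    second_round = [13, 17, 10, 15, 19]
    r = pearson_r(first_round, second_round)
    # A value close to 1 suggests the tool gives consistent results.
    print(f"test-retest correlation: {r:.2f}")
```

A high correlation only speaks to consistency. It says nothing about validity, that is, whether the questions measure the right thing in the first place.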

Internal administrative data you can use to do evaluation as well including program data. It's useful because this is collected anyway often, you can have a picture across programs but often the data is collected for very different purposes to research and evaluation and you hear this a lot when services are saying, "Oh yeah we collect this data, we can just use that" but it's not necessarily collected for the purpose that you want it to. So it might be hard to translate what's being asked across to a research or evaluation context. There are often issues with consistency of data being recorded especially if you're doing it across programs or across services that people may not be collecting things in the same way. There's accessibility issues so there may be privacy or confidentiality issues or issues in physically obtaining the data. So the data may be collected for a different purpose than what you would like to use it for so there are issues in terms of using it in that way. But ideally what you would do is plan ahead and establish those data collections and privacy processes and it would be supportive both of what you need to collect in terms of program administration and evaluation of the program.

External administrative data, so this can be used at both community and individual client levels. So it may be collected by a different government department or somewhere else that you can use to understand outcomes and explore those outcomes and monitor change across a project. Can be a link to client outcomes within your project but that depends very much on privacy issues and whether that data linkage is available. Plenty of limitations but nevertheless again this is collected data that might be useful to you. Privacy and consent issues are important, there's often lots of processes to go through to access and that can take up a lot of time if you're in a hurry and can be in quite complex formats that you may not have the statistical skills to use. Again that might mean more resources to employ someone that can help you out.

Getting to the end here but one of the things that often comes up is whether children and young people should be involved in evaluation. Clearly they have important perspectives to offer but there are a lot of ethical considerations around inclusion of children and young people and it's that balance between inclusion versus additional harm or harm in being involved. The National Statement on Ethical Conduct in Human Research, which is the key document that's used in terms of ethics, has a particular section that looks at involvement of children and young people. I think it's a particularly interesting issue when you think about that grey area of when young people become OK to be able to judge and make their own decisions about being involved in things. So there's a very grey area there as young people grow up but as I say the special section will give some ideas around what needs to be done. We have done research here at AIFS that has sought views of children and young people around some sensitive issues and there's some of the examples so I guess that's just to illustrate that it is possible, it just needs some more thought than normal.

And lastly dissemination. So once the evaluation is done it's really valuable to be able to then disseminate that in a way that tells other people what's going on and interestingly I think that one of the things that happens is we rarely talk about what doesn't work as though that's much less important than what we know does work, even though similar lessons are likely to be learned from those things. So I guess that's about changing a culture within evaluation but there's plenty of good ways that we can get messages out about those things and it might be that your evaluation report fundamentally is going to your funding body or management of the organisation. But there's also other ways that if the opportunity arises that you can go for publication in journals and there's some journals that specifically are focused on practice issues such as Developing Practice, internal and external newsletters, research blogs or other blogs, so things like The Conversation or other blogs that are more specific to your particular area of research or evaluation. Social policy blogs or some places have their own, your own organisational blogs as well, which you may be able to disseminate on.

Of course there's other ways as well, in person at conferences, seminars and network meetings, so for example the Australian Institute of Family Studies conference is coming up next year and abstracts are currently open and there's lots of other examples such as the FRSA conferences and other examples where people can present and share the information with other people doing work in this area. Clearly the more that evaluations are disseminated the more the general sector knows what works and what doesn't work and that benefits clients as a whole. There's also practice profile examples so both of these are overseas but there's some Australian ones as well so the Promising Practices Network and Blueprints for Healthy Youth Development, both in the US, but they specifically look at disseminating information about practice and what works. So they both have databases that you can search for things.

OK lastly I just wanted to talk a little bit about where to from here for people. So I'm not sure whether anyone else has read the work of Atul Gawande who is a US surgeon and he's one of my favourite authors of all time. He is a surgeon that can actually speak plain language and he writes beautiful books around what he's seen and what he's observed and how it works. So I just wanted to leave people with something that I think he comes up with that you really can do. He talks about people becoming a positive deviant in order to make a worthy difference. So actually doing something that can make a difference within what your work does every day. One of the things he talks about is counting something, and as a resident surgeon he counted how often surgical patients ended up with an instrument or sponge forgotten inside them, which is not something I recommend to you, but nevertheless it was clearly OK for him. He found that although rare, such mishaps were more likely to occur in patients undergoing emergency operations or procedures that revealed the unexpected such as if a surgeon was working on someone who had anticipated appendicitis and they found cancer. So these were situations where they were more likely to lose things.

The upshot of his inquiry was that punishing people for those failures was less likely to work and it needed a technological solution and so they built a system that was able to track the tools because it often got into the hundreds. His point was, if you count something you find something interesting and you will learn something interesting. So it doesn't matter what but it should be of interest to you and it might be an extra question on an intake form and there might be many different things that you would like to know that you can count. It might be something as simple as how many children families have that come from a children's contact service and whether there's any patterns within that or other examples that I'm sure you'd be better at thinking of than me.

His second one was about write or read something but I'm particularly interested in the write part because I think again this gets back to disseminating things and we have a lovely example of this on our website at CFCA Connect. It's a very long URL but if you search on Govspace, the one just below you will find this and it was written by a practitioner at Interrelate Family Centres, her reflections on working with people and grief within a post-separation setting. So Gawande again says it makes no difference if it's five paragraphs for a blog, a paper for a professional journal or a poem, just write something. It doesn't need to be perfect but it adds a small observation about your world and what you do. And he says you shouldn't underestimate the effect of your contribution however modest because fragments of information from many different people often become a whole in understanding something that's going on and to answer key questions. So one of the things we do on CFCA Connect, which is part of the information exchange site, is to publish writing from people who have explored some of these issues and again Jackie Dee's article is a great example of that so if people are interested please contact us and we'd be happy to facilitate and help with the writing experience of doing that. But getting something down was his point.

And lastly, he talks about being an early adopter and I think we lose a lot of this when we talk so strictly about evaluation and program evaluation and those sorts of things and that we need to measure things. But if you look at medicine it remains replete with uncertainties, failures that have been put in place and have made medicine better over time. So again I think being an early adopter of change and trying different things and watching what happens is a really important thing not to lose amongst all the talk about evaluation and implementation and all those sorts of things. That's my practitioner coming out again so very keen to make sure that that continues to happen.

So just to summarise now, as we said, the release of our evaluation resources is part of the Focus on Evaluation month that we ran over November 2013 so we have a whole bunch of things on our site if you go to that URL and have a look. We ran a webinar by Howard Bath, which was in combination with the new Knowledge Circle, which is also part of an AIFS project here. And he talked a lot about research within the Northern Territory. We have a bunch of short articles which again are really good examples of the ways that people have written down what's been happening in evaluation so well worth having a look at. And after today's webinar my wonderful colleague Ken Knight will be sending you some information around the forum that will happen on CFCA Connect where you can ask questions that haven't been answered today, make comments about what's been presented today and we're happy to interact with you to have a further conversation for as long as you so please. So thank you for listening to me today, good luck with your evaluations and research and please keep in touch. Thank you very much.



The transcript is provided for information purposes only and is provided on the basis that all persons accessing the transcript undertake responsibility for assessing the relevance and accuracy of its content. Before using the material contained in the transcript, the permission of the relevant presenter should be obtained.

The Commonwealth of Australia, represented by the Australian Institute of Family Studies (AIFS), is not responsible for, and makes no representations in relation to, the accuracy of this transcript. AIFS does not accept any liability to any person for the content (or the use of such content) included in the transcript. The transcript may include or summarise views, standards or recommendations of third parties. The inclusion of such material is not an endorsement by AIFS of that material; nor does it indicate a commitment by AIFS to any particular course of action.

Slide outline

  1. The nuts and bolts of program evaluation
    • Elly Robinson
    • Manager, CFCA information exchange
    • CFCA information exchange webinar, 9 December 2013
  2. Evaluation resources for family support
    • Evaluation and innovation
    • Evidence-based practice & service-based evaluation
    • Ethics in evaluation
    • Preparing for evaluation
    • Dissemination of findings
  3. Why evaluation?
    • Quality assurance - systematic checking of program/service meeting standards
    • Were participants satisfied? Did they benefit?
    • Did they benefit because of the program?
    • How can the program be improved/refined?
    • Justify requests for further support/funding
    • Was implementation true to program design?
  4. Types of evaluation
    • Traditional evaluation
      • Outcome evaluation
        • Does my program help my clients?
        • Have there been any unintended outcomes?
      • Impact evaluation
        • Longer term effects of a program (though often used interchangeably with outcome evaluation)
      • Process evaluation
        • Is the program being implemented in the way it is intended?
        • Is it reaching the people for whom it is designed?
        • Opens up the black box of complex interventions
  5. Types of evaluation
    • Participatory evaluation
      • Program managers and staff treated as partners with evaluators
    • Empowerment evaluation
      • Seeks to help people in communities to learn how to help themselves and to determine what services the community needs.
      • Assumes community has better knowledge than evaluators/professionals
  6. Evaluation may lead to change 
    • Innovation in content
      • Example - introducing discussion of use of social networking sites in relationship education program.
    • Innovation in delivery method
      • Example - provision of web-based family dispute resolution
    • Innovation in forging new partnerships/networks
      • Example - external or cross-agency supervision
  7. Implementation, not just outcomes
    • Increasing need to identify:
      • How effective was the implementation of the program?
      • Mediators - did anything affect process?
      • Black box may reflect effective/weak implementation
    • Flowchart
      • What is it about a parent education program that leads to good outcomes for children?
      • What is it about a parenting education program that leads to poor outcomes for children?
  8. Importance of implementation
    • Table: implementation - the "what" (rows) by implementation - the "how" (columns)
      • "What" effective, "how" effective: Actual benefits
      • "What" effective, "how" not effective: Inconsistent; not sustainable; poor outcomes
      • "What" not effective, "how" effective: Poor outcomes
      • "What" not effective, "how" not effective: Poor outcomes, can be harmful
    • Institute of Medicine, 2000; 2001; 2009; New Freedom Commission on Mental Health, 2003; National Commission on Excellence in Education, 1983; Department of Health and Human Services, 1999
  1. Hierarchies of evaluation - RCTs
    • Different types of evidence allow stronger or weaker conclusions to be drawn.
    • Randomised-controlled trials
      • Randomised allocation to a control or experimental group
      • If no bias in allocation, more reasonable to conclude that program made a difference
      • Not without criticism, especially when applied to service provision
  2. Hierarchies of evaluation - RCTs
    • Problems with RCTs in service environments (also relevant to other designs)
      • Drop out of participants, especially if at different rates from the two groups
      • Unexpected differences between groups
      • Ethical issues re: withholding treatment from control group
    • Randomise Me:
  3. Hierarchies of evaluation - quasi-experiments
    • Use of naturally occurring comparison groups
      • Participants on a waiting list
      • Offer a different intervention, e.g. a briefer version of program.
    • Greater benefits to those in program may mean it is effective but because not randomised, you can't say it caused changes.
  4. Hierarchies of evaluation - pre- and post-test
    • No comparison or control group
    • Measures before and after program changes
    • No real conclusions can be drawn - changes might have happened anyway
    • Better than nothing?
  5. Service-based program evaluation
    • Criticisms of evidence-based practice
      • Removed from complex real world of service delivery
      • Reliant on very restrictive definitions of evidence
      • Evidence may come from multiple small-scale, less sophisticated designs
      • Evidence should not be discounted just because it doesn't fit into hierarchy
  6. Figure 1. Decision tree: Who should conduct the evaluation?
  7. Developing a culture of evaluation
    • Benefits
      • Helps to reinforce reflective thinking
      • Helps staff to be responsive to external demands for accountability
      • Increased confidence in having evidence of a positive impact
      • Staff gain new skills and knowledge
  8. Challenging attitudes
    • But the program is exemplary!
      • Have realistic expectations of the level of improvement that is reasonable to expect.
    • But we'll offend program staff!
      • Remind them it is a program, not personnel evaluation. Focus discussions around improvement.
  9. Challenging attitudes
    • But the program will be terminated!
      • Funders more likely to be pressured to find a replacement, thus if result is unfavourable, more likely to lead to improvements
    • But evaluation drains program resources!
      • May be true but the alternative is that money may be spent on ineffective programs. Evaluation more likely to lead to additional support/resources
  10. Ethical considerations
    • Ethical considerations around the data you are collecting/using are needed
      • Assess what approvals are required by organisations, ethics committees and service users and staff.
      • How do you ensure participants' information is kept confidential?
      • Processes to manage potential conflicts of interest?
  11. No ethics required?
    • If original analysis of previously collected, publicly available data
    • If deemed quality assurance
      • Do not impose risk on participants
      • Use existing organisational data
      • Analysis done in-house by someone bound by professional code of ethics
      • Do not infringe rights/reputation of carers, providers or institution
      • Do not violate confidentiality of clients
    • Unsure? Contact CFCA helpdesk
  12. Program logic
    • Or theory of change
    • Or logic model
    • Or program theory
    • Useful?
  13. Program logic
    • Visually represents what is going on in a program
    • Two important things
      • Relationships - logical links between each stage of program logic model (if-then)
      • Intention - a roadmap for the program
  14. Program logic.
    • Program Action - Logic Model flowchart.
      • Source:
  15. Logical Model for Creating Achievable and Sustainable Change Modalities over Time
    • Source: (
  16. Program logic
    • Situation: During a county needs assessment, majority of parents reported that they were having difficulty parenting and felt stressed as a result
  17. Evaluation approaches 1
    • Collect new data from key informants
    • Make use of internal administrative data including program data
    • Use of external administrative data
    • Use of existing representative datasets (e.g. Longitudinal Study of Australian Children)
    • Multiple data sources allow for greater validity and also greater depth
  18. Evaluation approaches 2
    • BUT need to balance quantity of data with quality and ability to analyse
      • What resources do you have?
      • Who is conducting the evaluation? What are the skill sets available to you?
      • How much time do you have?
      • Is this a one off or an ongoing process?
  19. Collect new data
    • Qualitative - interviews, observations, focus groups
      • Good for measuring behaviour change, greater meaning of concepts, looking for explanations
      • Observations, e.g. improving parenting practices - doesn't rely on self-report, but very resource intensive
      • Interviews - more costly and time-intensive than focus groups, but may get greater honesty and depth
      • Focus groups - interaction can lead to high-quality responses, but group dynamics in play.
      • Analysis takes time/resources
  20. Collect new data
    • Quantitative - surveys, questionnaires
      • Can collect a lot of information in a short time
      • Unless using an established instrument, may take much preparation, review and revision
      • Will also need cover letters, informed consent etc, etc
      • Cultural implications - check questions will not be misunderstood
  21. Figure 1. Hierarchy of evaluation designs and data collection methods
  22. Validity and reliability (oh no!)
    • Validity - does the tool measure what it is supposed to? E.g. "I thought the presenter was really nice" is not equal to behaviour change.
    • Reliability - if used repeatedly under the same conditions, the tool gives the same result.
  23. Internal administrative data including program data
    • Useful as can have a picture across programs
    • BUT likely data is collected for very different purposes to research and evaluation - may be hard to translate across to research context
    • Data quality - consistency of data being recorded
    • Accessibility - privacy and confidentiality issues, physically obtaining data
    • Ideally plan ahead and establish data collections and privacy processes that support both program administration and evaluation
  24. External administrative data
    • Can be used at both community and individual client levels to
      • Understand outcomes
      • Explore outcomes and monitor change
    • Can be linked to client outcomes (depending on privacy issues)
    • Limitations
      • Privacy and consent issues
      • Can be time intensive to access
      • Can be in complex formats that require high level statistical skills to use
  25. Involvement of children and young people
    • Have important perspectives to offer
    • Balance between ethics of inclusion versus additional harm
    • National Statement on Ethical Conduct in Human Research has special section
    • Examples of AIFS research that has sought views of children and young people around sensitive issues
      • Evaluation of 2006 Family Law Reforms
      • Independent Children's Lawyer project
  26. Dissemination
    • Evaluation reports to funding bodies, management
    • Publication in journals, newsletters, research blogs, social policy blogs, your own or organisation blog
    • In person at conferences, seminars, network meetings
    • Practice profiles, e.g.
  27. How do I really matter?
    • Becoming a positive deviant to make a difference
      1. Count something
        • Doesn't matter what, but should be of interest to you
      2. Write (or read) something
        • Guidelines
      3. Change - be an early adopter
        • As successful as medicine is, it remains replete with uncertainties and failure.
    • Gawande, A. (2008). Better: A surgeon's notes on performance. London: Profile Books
  28. Summary
    • Evaluation resources released as part of Focus on Evaluation month in November
    • Also webinar by Howard Bath, short articles, etc.
    • Forum on CFCA Connect - ask questions, make comments
    • Thanks and good luck!


Elly was formerly the manager of the Child Family Community Australia (CFCA) information exchange, which is the product of the amalgamation of three previous AIFS clearinghouses (National Child Protection Clearinghouse, Australian Family Relationships Clearinghouse, Community and Family Clearinghouse Australia). Elly has extensive experience in the writing, development and production of publications, learning materials and resources for practitioners, service providers, students and the broader community. She has authored a number of publications, submissions and journal articles and played a primary role in the authorship of two Specialist Practice Guides for the Department of Human Services (VIC). Elly is currently undertaking her Masters in Public Health at the University of Melbourne, and her research interests include young people and their families, mental health, global health and the impact/use of digital communications in families and relationships.