GSAM

An ancient monk's solution for confidence

Avoiding the data trap blog series

‘Avoiding the Data Trap’ is a 3-part blog series developed by Pamoja to highlight a new approach to impact evaluation, called Contribution Tracing. The blog series explains key steps in Contribution Tracing that can guide evaluators, and those commissioning evaluations, to avoid common data traps by identifying and gathering only the strongest data. The blog series draws from a live case study of a Contribution Tracing pilot evaluation of Ghana’s Strengthening Accountability Mechanisms (GSAM) project. This pilot forms part of a learning partnership called the Capturing Complex Change Project, between Pamoja, CARE UK International and CARE Ghana and Bangladesh Country Offices.


Part 3: An ancient monk's solution for confidence

Thomas Bayes, 1702-1761

Welcome to the final edition of the ‘Avoiding the data trap’ blog series. If you are reading this blog series for the first time, we highly recommend that you start with our previous blogs: part 1, Mining for Data Gold; and part 2, Evaluator Seeking Sensitive Data. Then head on back here for the conclusion!

Let’s recap our journey so far. In part 1, we introduced the idea of the ‘data trap’ - wasting time, energy and resources collecting data that does nothing to increase confidence in a contribution claim that you wish to evaluate. We also introduced the notion that not all data is equal: some data is very powerful with respect to your claim, while other data is very weak. In the last blog, we showed you how to use probabilities for Sensitivity and Type I Error to begin the process of identifying the most powerful data with respect to your claim.

In this final blog in the series, we want to show you how an ancient monk, who has been dead for over 250 years, can help us to find data with the highest probative value. In relation to the key data design steps for Contribution Tracing (see Box 1 below), we will cover steps 3 and 4 in this blog.

Box 1.jpg

INTRODUCING BAYES THEOREM

Bayes Theorem (also known as Bayes Rule, or Bayes Formula) was discovered by Thomas Bayes, an English Presbyterian minister (the ‘monk’ of our title) who died in the mid-18th Century. Bayes Theorem is a law of probability theory that can be used to help us understand and deal with uncertainty. It enables us to update our beliefs when new evidence is discovered. It also helps us to put a number value on our confidence, showing us by exactly how much we should update our confidence when presented with new evidence. So, what does all this really mean?

Have you ever watched a crime drama on TV and been convinced of the innocence of a key suspect only for new evidence to come forward in later scenes that changes your belief to one of guilt? This is an example of Bayes Theorem in action! In everyday life, we each do this in our own minds, without even realising it.

But what’s all this got to do with identifying strong evidence with respect to a claim? Well, rather than talking about beliefs, in Contribution Tracing we start with a contribution claim. Remember that our claim in part 1 was about a project in Ghana, called the Ghana Strengthening Accountability Mechanisms (GSAM) Project. The Project team claims that: “GSAM’s facilitation of citizen’s oversight on capital projects has improved District Assemblies’ responsiveness to citizen’s concerns.” So, a claim is essentially a hypothesis about the role an intervention, or any parts of it, may have played in bringing about an outcome (i.e. a change!). At the outset of our evaluation, we don’t know if our claim is true or false – that’s what we want to test. Contribution Tracing allows us to do this in a rigorous and efficient way.

From earlier blogs in this series, we understand that certain types of evidence, if found, will increase our confidence in the validity of the claim. In Part 1, we showed you how to qualitatively identify evidence which has the power to strengthen confidence in a claim’s validity, by applying the helpful filters of ‘expect to find’ and ‘love to find’ in our search for evidence. In Part 2, we went a step further by assigning probabilities for Sensitivity and Type I Error to each item of evidence. However, at this point, we still need to know how much our confidence in the claim will increase, should we find the evidence we seek. That’s where Bayes Theorem comes in: this ancient monk’s idea lets us calculate exactly how much each item of evidence would change our confidence.

As we begin delving into Bayes Theorem, it is useful to refer to the evidence we identified and the probabilities we assigned from earlier blogs, by reviewing Box 2 below.

Box 2.jpg

From our evidence in Box 2, we now want to understand which items of evidence are the most powerful in validating our claim. In other words, what evidence, if found, would quantifiably increase our confidence in the claim the most? We work this out by plugging the probabilities for Sensitivity and Type I Error into a version of Bayes Theorem that we find particularly useful in Contribution Tracing.

It is not the aim of this blog series to explain Bayes Theorem in detail - that will be the focus of future blogs. Our objective here is to show you how applying Bayesian logic can support the identification of the strongest evidence with respect to our claim, as part of a Contribution Tracing evaluation.

We make this important caveat before presenting the version of the Bayes Formula we use in Contribution Tracing, as it can appear, at first sight, to be immensely complicated. In reality it’s not! But it takes more explaining than we have time for in this blog, so watch this space!

At this stage, we don’t need you to understand how the formula works, but just to have faith that it does. After all, that’s what a theorem is - a mathematical statement which has been proven true.

To make sense of the formula (See Box 3), you need to know that your confidence level before looking for evidence is called the ‘prior’ and, following evidence gathering, the ‘posterior’. For ease, the prior can be set at 0.5. In the language of probability, this means that we are neither confident nor sceptical about the validity of the claim – it’s 50/50 whether it is true or false. With the prior set at 0.5, and the probabilities from Box 2 above, we are now ready to plug everything into the formula.

Box 3.jpg
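In case the Box 3 image does not display, here is the standard form of Bayes Theorem written in the terms used in this series. The symbols are our own shorthand and an assumption on our part, not necessarily the notation used in Box 3: H stands for the claim (or component) being true, and E for the item of evidence being found.

```latex
P(H \mid E) = \frac{P(H)\, P(E \mid H)}{P(H)\, P(E \mid H) + \bigl(1 - P(H)\bigr)\, P(E \mid \neg H)}
```

Here P(H) is the prior, P(E | H) is the Sensitivity, P(E | ¬H) is the Type I Error, and P(H | E) is the posterior.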

Let’s use the probabilities we calculated for evidence item 1. In this example of a training agenda, Sensitivity was estimated to be 0.95 and Type I Error, 0.4. Based on these values, should we discover such an item of evidence, it would move our confidence from 0.5 (prior) to 0.7 (posterior) – see Box 4 below. In other words, if we found a training agenda during our evaluation, that met our expectations of what it should contain, it would increase our confidence by 20 percentage points. What we are saying here is that finding such a training agenda would give us 70% confidence that the Project really did deliver its training event.

Box 4.jpg
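For readers who like to check the arithmetic, the calculation can be sketched in a few lines of Python. This is a minimal illustration of the standard Bayes formula, not a tool used in the GSAM evaluation; the function name `posterior` is our own.

```python
def posterior(prior: float, sensitivity: float, type_i_error: float) -> float:
    """Confidence in the claim after the evidence item has been found."""
    # Probability that the claim is true AND we find the evidence
    true_and_found = prior * sensitivity
    # Probability that the claim is false AND we still find the evidence
    false_and_found = (1 - prior) * type_i_error
    return true_and_found / (true_and_found + false_and_found)


# Training agenda: prior 0.5, Sensitivity 0.95, Type I Error 0.4
agenda = posterior(prior=0.5, sensitivity=0.95, type_i_error=0.4)
print(round(agenda, 2))  # 0.7
```

The unrounded value is about 0.704, which is where the 0.7 posterior for the training agenda comes from.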

Applying the same process for each item of evidence in Box 2 gives us the posterior values shown in Box 5 below. As you can see, the most powerful items of evidence are the signed attendance record and the video of the training event. Focusing efforts on finding these most powerful items of evidence first would be the best strategy to validate this component of the claim. If one or more of these items could not be found, attention would then shift to other items, such as interviews with participants, and the training agenda. This stepped approach to data collection means that we focus effort firstly on evidence with the highest probative value, only gathering lower value evidence if it is needed. This saves us time (and precious resources!) as we prioritise evidence according to its value. Therefore, in the long run, we’ll spend less time on evidence collection.
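The ranking idea can be sketched in Python as follows. Please note the assumptions: only the training agenda probabilities (0.95 and 0.4) come from this blog. The other three pairs are hypothetical placeholders chosen purely for illustration; they are not the Box 2 figures.

```python
def posterior(prior, sensitivity, type_i_error):
    """Confidence in the claim once the evidence item has been found."""
    true_and_found = prior * sensitivity
    false_and_found = (1 - prior) * type_i_error
    return true_and_found / (true_and_found + false_and_found)


PRIOR = 0.5

# (Sensitivity, Type I Error) for each evidence item.
# Only the training agenda values are taken from the blog text;
# the remaining pairs are HYPOTHETICAL and for illustration only.
evidence = {
    "training agenda": (0.95, 0.40),
    "interviews with participants": (0.90, 0.30),  # hypothetical
    "signed attendance record": (0.80, 0.05),      # hypothetical
    "video of the training event": (0.30, 0.01),   # hypothetical
}

# Sort items from highest to lowest posterior confidence
ranked = sorted(
    ((name, posterior(PRIOR, s, t)) for name, (s, t) in evidence.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, value in ranked:
    print(f"{name}: {value:.2f}")
```

With these illustrative numbers, the video and the attendance record come out on top because their Type I Error is very low, even though the video has low Sensitivity.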

The example we have taken from the GSAM Project has been limited to just one component and only a small number of evidence items, as a way to facilitate learning in the context of this blog series. But the real GSAM case had 14 components and 61 items of evidence that were initially identified. In cases where the list of evidence to be gathered is lengthy, you can see why identifying the most powerful data makes sense to save scarce time and resources.

Box 5.jpg

For those less keen on numbers, there is a useful qualitative rubric that can accompany the quantitative values and help describe posterior confidence levels: see Box 6 (Befani and Stedman-Bryce, 2016).

Box 6.jpg

In summary, the key data collection steps found in Contribution Tracing are an effective way of avoiding the data trap.

  1. The first step, after having clearly articulated your causal mechanism, is to focus on one component at a time and identify the evidence you would ‘expect to find’ and ‘love to find’, if the component of the claim is true. This is a useful activity which generates evidence items in a way which is focused on the specific claim.
  2. Following initial evidence identification, each item of evidence is assigned two probabilities, one for Sensitivity and one for Type I Error. Ideally, we are looking for evidence with high Sensitivity and low Type I Error.
  3. Plug the probabilities for Sensitivity and Type I Error into the Bayes Formula to determine the posterior confidence for each item of evidence.
  4. Focus evidence gathering firstly on evidence with the highest posterior confidence values, for example with values of 0.85 and above. If, having searched for this evidence, it is not found, move to the next level down (0.7 – 0.85). Again, if insufficient evidence is found, move to the next level down (0.5 – 0.7). Continue this process until you find the evidence you need, and leave the rest! The point is that you don’t need to gather all the evidence in your list, just the evidence that validates your claim with the highest level of confidence.
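Step 4 can be sketched as a simple grouping exercise. This is a minimal illustration under stated assumptions: the level boundaries follow the ranges given in step 4, but treating each lower boundary as inclusive is our own choice. The function names are our own too, and every posterior value in the example except the training agenda’s 0.70 is hypothetical.

```python
from collections import defaultdict


def confidence_band(posterior_value):
    """Map a posterior value to the search levels described in step 4."""
    if posterior_value >= 0.85:
        return 1  # search for these items first
    if posterior_value >= 0.7:
        return 2
    if posterior_value >= 0.5:
        return 3
    return None  # would not lift confidence above the 0.5 prior


def search_plan(posteriors):
    """Group evidence items by level, strongest items first within a level."""
    plan = defaultdict(list)
    ordered = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    for name, value in ordered:
        band = confidence_band(value)
        if band is not None:
            plan[band].append(name)
    return dict(sorted(plan.items()))


# Posterior values: 0.70 for the agenda is from the blog; the rest are HYPOTHETICAL.
posteriors = {
    "training agenda": 0.70,
    "interviews with participants": 0.75,  # hypothetical
    "signed attendance record": 0.94,      # hypothetical
    "video of the training event": 0.97,   # hypothetical
}

plan = search_plan(posteriors)
for level, items in plan.items():
    print(level, items)
```

The evaluator would work through level 1 first and only move to level 2 if the level 1 evidence cannot be found.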

Remember that avoiding the data trap is just one part of the Contribution Tracing approach to impact evaluation. The purpose of this blog series has been to highlight the unique steps Contribution Tracing follows to ensure a targeted evidence-gathering approach. Conducting a Contribution Tracing evaluation has other elements which make it an incredibly rigorous, theory-based approach to impact evaluation.

If you want to learn more, please refer to the useful resources listed below. Feel free to shout out to us on Twitter @PamojaUK and ‘like’ our Facebook Page for updates at https://www.facebook.com/pamojaevaluation/

Mining for Data Gold!



Part 1: Mining for Data Gold!

With Monitoring and Evaluation now a standard feature in development projects, NGO staff and evaluation practitioners are charged with the sometimes daunting task of gathering evidence to prove the influence of programming on complex social change. Examples of what NGOs such as CARE are trying to do to tackle poverty and address social injustice are endless. Often, we can see change happening in the communities we serve. However, the process of showing the ‘how’ often results in pages and pages of ‘data’ that yield little reliable evidence. There is frustration that comes with having a strong belief that programming has made a difference for the better, but then failing to capture data that supports a clear cause and effect relationship. We face challenges in claiming with confidence just how our work actually contributed to positive change. How many of us have been here too many times before?

What is the data trap?

When evaluating a claim made by a project or programme about the role it may have played in contributing to an observable change, it is crucial to gather evidence that strengthens our confidence in making such claims. All too often when substantiating ‘contribution claims’, strengthening our confidence in the claim is confused with simply collecting an abundance of data. We miss the mark by failing to focus on the relative strength (or weakness) of such data. Wasting time, energy and resources collecting data that does nothing to increase confidence in the claim is what we like to call a data trap.

Enter Contribution Tracing: a new theory-based impact evaluation approach. It combines the principles and tests found in Process Tracing with Bayesian Updating. Contribution Tracing helps sort the data wheat from the chaff! Most importantly, it changes the way we look at data, encouraging us to identify and seek out the best quality data with the highest probative power. Contribution Tracing gives us a clear strategy for avoiding the data trap, supporting evaluators instead to mine for data gold.

So how does it work? To illustrate, let’s draw from a live Contribution Tracing evaluation which is part of the Capturing Complex Change learning partnership. Ghana’s Strengthening Accountability Mechanisms (GSAM) is a USAID-funded, multi-year intervention led by CARE with partners IBIS and ISODEC. The ultimate aim of GSAM is to support citizens to demand accountability from their local government officials.

The GSAM evaluation team are currently testing the following claim, using Contribution Tracing:

GSAM’s facilitation of citizen’s oversight on capital projects has improved District Assemblies’ responsiveness to citizen’s concerns.

Essentially this claim is stating that a range of activities provided by, or funded by, GSAM has supported citizens to become more engaged in scrutinising government-funded building projects. As a result of this, District Assemblies (local government) have become more responsive to concerns presented by citizens, related to the quality, performance and/or specification of on-going capital projects in their communities, such as the construction of new schools or roads.

To test this claim, we need to unpack the mechanism that provides a causal explanation for how the project’s range of facilitation activities contributes to the outcome of District Assemblies becoming more responsive to citizens’ concerns.

In Contribution Tracing, causality is thought of as being transmitted along the mechanism, with each interlocking component being a necessary part. A mechanism component comprises two essential elements: an entity (such as an individual, community or organisation), and an activity or behaviour that the entity performs, or particular knowledge, attitudes or beliefs that it holds.

One of the necessary components identified by the GSAM evaluation team is shown below:

The GSAM project (entity) delivered training to Civil Society Organisations (activity) to increase their knowledge and skills in engaging with District Assemblies on the planning and implementation processes of capital projects.

In Contribution Tracing, the role of the evaluator is to identify evidence that tests whether each component in the mechanism for a particular claim actually exists, or not. If sufficient empirical evidence can be identified and gathered for each component in a claim’s mechanism, we can update our confidence in the claim, quantitatively.

But wait! Before running off to gather whatever data we can lay our hands on, in Contribution Tracing we take several initial steps to help design our data collection (Box 1). These steps focus our attention on only gathering specific data that supports testing the existence of each component of our claim’s mechanism. Why is this important?

  • It saves a lot of effort in gathering essentially useless data, in respect of our claim;
  • It saves limited resources e.g. staff time, finance, etc;
  • It’s more ethical because we are not asking key informants to spend their precious time providing information that we won’t use; and
  • It produces more rigorous findings.

This blog is focused on step 1, with later blogs in the series describing the other steps.

STEP 1

To begin the data design process in Contribution Tracing, we ask “if the component of the claim is true, what evidence would we expect to find?”. In other words, if the GSAM project really did provide Civil Society Organisations with training, what evidence should be readily available, if we look for it? Some examples of such ‘expect to find’ evidence are shown in Box 2.

The logic behind identifying ‘expect to find’ evidence is simple. If the component of the claim is true - if the project really did deliver its training programme - the evaluator should be able to easily find such evidence. Failure to find ‘expect to find’ evidence diminishes the evaluator’s confidence in the existence of the component of the claim (and perhaps in the claim overall). ‘Expect to find’ evidence, therefore, becomes powerful only when it is not found.

In addition to expect to find evidence, we must also try and identify ‘love to find’ evidence. This is evidence which is harder to identify and find, but if found, serves to greatly increase our confidence in the component of the claim (and perhaps in the claim overall). We can think of ‘love to find’ evidence as highly unique to the component of the claim. Box 3 shows an example.

While we would love to find video footage of the training event being delivered, it is not an expectation. It is not usual practice to film such events in this context, but if filming did take place, and the evaluation team could gather such evidence, it would confirm the component of the claim. So, while ‘expect to find’ evidence only becomes powerful when not found, ‘love to find’ evidence becomes powerful when it is found.
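The asymmetry between the two kinds of evidence can be sketched numerically, using the probabilities for Sensitivity and Type I Error that the later blogs in this series introduce. This is a minimal illustration under stated assumptions. The training agenda values (0.95 and 0.4) are those used in part 3 of this series, and we treat the agenda as ‘expect to find’ evidence because of its high Sensitivity. The video values are hypothetical and for illustration only, and the function names are our own.

```python
def posterior_if_found(prior, sensitivity, type_i_error):
    """Confidence in the claim after the evidence is found."""
    true_and_found = prior * sensitivity
    false_and_found = (1 - prior) * type_i_error
    return true_and_found / (true_and_found + false_and_found)


def posterior_if_not_found(prior, sensitivity, type_i_error):
    """Confidence in the claim after searching and NOT finding the evidence."""
    true_and_missing = prior * (1 - sensitivity)
    false_and_missing = (1 - prior) * (1 - type_i_error)
    return true_and_missing / (true_and_missing + false_and_missing)


PRIOR = 0.5

# 'Expect to find': training agenda (values from part 3 of this series)
print(round(posterior_if_found(PRIOR, 0.95, 0.4), 2))      # 0.7
print(round(posterior_if_not_found(PRIOR, 0.95, 0.4), 2))  # 0.08

# 'Love to find': video of the training (HYPOTHETICAL values)
print(round(posterior_if_found(PRIOR, 0.3, 0.01), 2))      # 0.97
print(round(posterior_if_not_found(PRIOR, 0.3, 0.01), 2))  # 0.41
```

Finding the agenda lifts confidence only modestly, but failing to find it drops confidence sharply. With the illustrative video values the pattern is reversed: finding it is close to decisive, while not finding it barely moves us from the 0.5 prior.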

This step in Contribution Tracing helps the evaluation team to begin the process of focusing on identifying data gold, but it is only the first step. In the next blog, we will explore how we use probabilities to be even more targeted in our search for data gold.


Part 2 of the blog series will be published on 31 July 2017.

CONTRIBUTION TRACING VLOG SERIES: Designing Data Collection

Vlog 5 of 5: Designing Data Collection

This is the final Vlog in the Contribution Tracing series. Samuel Boateng explains how Contribution Tracing uses probabilities to focus on collecting data with the highest probative value, making best use of limited resources for impact evaluation.

CONTRIBUTION TRACING VLOG SERIES: Understanding Process Tracing tests

Vlog 4 of 5: Understanding Process Tracing Tests

In the penultimate video of the series, Michael Tettey provides brief explanations for the four tests that we find in Process Tracing: the Hoop, Smoking Gun, Doubly Decisive, and Straw in the Wind tests.

If you have missed any of the previous vlogs in the series, you can check them out here:

  1. What is Contribution Tracing?
  2. How do you develop a testable contribution claim?
  3. Unpacking your causal mechanism

CONTRIBUTION TRACING VLOG SERIES: Unpacking your causal mechanism

Vlog 3 of 5: Unpacking Your Causal Mechanism

This is the third Vlog in the Contribution Tracing series. Don't worry if you missed the other Vlogs, but you might want to watch them first. Check out the first Vlog on 'What is Contribution Tracing?' and the second Vlog on 'Developing a testable contribution claim'. 

In this week's edition, Francisca Agyekum-Boateng delves into the topic of 'unpacking your causal mechanism'. Francisca explains what a causal mechanism is and how to clearly develop a mechanism, based on a specific claim.