### Avoiding the data trap blog series

*‘Avoiding the Data Trap’ is a 3-part blog series developed by Pamoja to highlight a new approach to impact evaluation, called Contribution Tracing. The blog series explains key steps in Contribution Tracing that can guide evaluators, and those commissioning evaluations, to avoid common data traps by identifying and gathering only the strongest data. The blog series draws from a live case study of a Contribution Tracing pilot evaluation of Ghana’s Strengthening Accountability Mechanisms (GSAM) Project. This pilot forms part of a learning partnership called the Capturing Complex Change Project, between Pamoja, CARE UK International and CARE Ghana and Bangladesh Country Offices.*

### Part 3: An ancient monk's solution for confidence

Welcome to the final edition of the ‘Avoiding the data trap’ blog series. If you are reading this blog series for the first time, we highly recommend that you start with our previous blogs: part 1, Mining for Data Gold; and part 2, Evaluator Seeking Sensitive Data. Then head on back here for the conclusion!

Let’s recap our journey so far. In part 1, we introduced the idea of the **‘data trap’** - wasting time, energy and resources collecting data that does nothing to increase confidence in a contribution claim that you wish to evaluate. We also introduced the notion that **not all data is equal.** Some data is very powerful in respect of your claim; while other data is very weak. In the last blog, we showed you how to use probabilities for Sensitivity and Type I Error, to begin the process of identifying the most powerful data with respect to your claim.

In this final blog in the series, we want to show you how an ancient monk, who has been dead for over 250 years, can help us to find data with the highest probative value. In relation to the key data design steps for Contribution Tracing (see Box 1 below), we will cover steps 3 and 4 in this blog.

**INTRODUCING BAYES THEOREM**

**Bayes Theorem** (also known as Bayes Rule, or Bayes Formula) is named after Thomas Bayes, an English Presbyterian minister (the ‘ancient monk’ of our title) who died in 1761. Bayes Theorem is a law of probability theory that can be used to help us understand and deal with uncertainty. It enables us to update our beliefs when new evidence is discovered. It also helps us to put a number value on our confidence, showing us by exactly how much we should update our confidence when presented with new evidence. So, what does all this really mean?

Have you ever watched a crime drama on TV and been convinced of the innocence of a key suspect only for new evidence to come forward in later scenes that changes your belief to one of guilt? This is an example of Bayes Theorem in action! In everyday life, we each do this in our own minds, without even realising it.

But what’s all this got to do with identifying strong evidence in respect to a claim? Well, rather than talking about beliefs, in Contribution Tracing, we start with a contribution claim. Remember that our claim in part 1 was about a project in Ghana, called the Ghana Strengthening Accountability Mechanisms (GSAM) Project. The Project team claims that: *“GSAM’s facilitation of citizen’s oversight on capital projects has improved District Assemblies’ responsiveness to citizen’s concerns.”* So, a claim is essentially a **hypothesis about the role an intervention**, or any parts of it, may have played in bringing about an outcome (i.e. a change!). At the outset of our evaluation, we don’t know if our claim is true or false – that’s what we want to test. Contribution Tracing allows us to do this in a rigorous and efficient way.

From earlier blogs in this series, we understand that certain types of evidence, if found, will increase our confidence in the validity of the claim. In Part 1, we showed you how to qualitatively identify evidence which has the power to strengthen confidence in a claim’s validity, by applying the helpful filters of ‘expect to find’ and ‘love to find’ in our search for evidence. In Part 2, we went a step further by assigning probabilities for Sensitivity and Type I Error to each item of evidence. However, at this point, we still need to know how much our confidence in the claim will increase, should we find the evidence we seek. That’s where Bayes Theorem comes in: this ancient monk’s idea tells us by how much our confidence should change.

As we begin delving into Bayes Theorem, it is useful to refer to the evidence we identified and the probabilities we assigned from earlier blogs, by reviewing Box 2 below.

From our evidence in Box 2, we now want to understand which items of evidence are the most powerful in validating our claim. In other words, what evidence, if found, would quantifiably increase our confidence in the claim **the most**? We work this out by plugging the probabilities for Sensitivity and Type I Error into a version of Bayes Theorem that we find particularly useful in Contribution Tracing.

It is not the aim of this blog series to explain Bayes Theorem in detail - that will be the focus of future blogs. Our objective here is to show you how applying Bayesian logic can support the identification of the strongest evidence with respect to our claim, as part of a Contribution Tracing evaluation.

We make this important caveat before presenting the version of the Bayes Formula we use in Contribution Tracing, as it can appear, at first sight, to be immensely complicated. In reality it’s not! But it takes more explaining than we have time for in this blog, so watch this space!

At this stage, we don’t need you to understand how the formula works, but just to have faith that it does. After all, that’s what a theorem is - a mathematical statement that has been proven true.

To make sense of the formula (See Box 3), you need to know that your confidence level before looking for evidence is called the ‘prior’ and, following evidence gathering, the ‘posterior’. For ease, the prior can be set at 0.5. In the language of probability, this means that we are neither confident nor sceptical about the validity of the claim – it’s 50/50 whether it is true or false. With the prior set at 0.5, and the probabilities from Box 2 above, we are now ready to plug everything into the formula.
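For readers without Box 3 to hand, the standard form of Bayes Theorem can be written in these terms. This is a sketch, not necessarily the exact layout used in the box. It assumes that Sensitivity is the probability of finding the evidence if the claim is true, $P(E \mid H)$, and that Type I Error is the probability of finding the same evidence if the claim is false, $P(E \mid \neg H)$. The symbols $H$ (the claim) and $E$ (the evidence) are our own labels.

```latex
% Posterior confidence in the claim H after finding evidence E
P(H \mid E) =
  \frac{P(E \mid H)\, P(H)}
       {P(E \mid H)\, P(H) + P(E \mid \neg H)\,\bigl(1 - P(H)\bigr)}

% With the prior P(H) set at 0.5, the prior cancels out:
P(H \mid E) =
  \frac{\text{Sensitivity}}{\text{Sensitivity} + \text{Type I Error}}
```

With a 50/50 prior, then, the posterior depends only on how Sensitivity compares with Type I Error.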

Let’s use the probabilities we calculated for evidence item 1. In this example of a training agenda, Sensitivity was estimated to be 0.95 and Type I Error, 0.4. Based on these values, should we discover such an item of evidence, it would move our confidence from 0.5 (prior) to 0.7 (posterior) – see Box 4 below. In other words, if we found a training agenda during our evaluation, that met our expectations of what it should contain, it would increase our confidence by 20 percentage points. What we are saying here is that finding such a training agenda would give us 70% confidence that the Project really did deliver its training event.
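The arithmetic behind this example can be checked with a few lines of Python. This is a minimal sketch that assumes the standard form of Bayes Theorem, with Sensitivity read as the chance of finding the evidence if the claim is true and Type I Error as the chance of finding it if the claim is false. The function name `posterior` is ours, not part of any Contribution Tracing tool.

```python
def posterior(prior: float, sensitivity: float, type_i_error: float) -> float:
    """Confidence in the claim after the evidence has been found."""
    # Weight of "evidence found and claim true"
    found_and_true = sensitivity * prior
    # Weight of "evidence found and claim false"
    found_and_false = type_i_error * (1.0 - prior)
    return found_and_true / (found_and_true + found_and_false)


# Evidence item 1 (training agenda): Sensitivity 0.95, Type I Error 0.4
agenda = posterior(prior=0.5, sensitivity=0.95, type_i_error=0.4)
print(round(agenda, 2))  # 0.7
```

The unrounded value is about 0.704, which the blog reports as 0.7.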

Applying the same process for each item of evidence in Box 2 gives us the posterior values shown in Box 4 below. As you can see, the most powerful items of evidence are the signed attendance record and the video of the training event. **Focusing efforts on finding these most powerful items of evidence first** would be the best strategy to validate this component of the claim. If one or more of these items could not be found, attention would then shift to other items, such as interviews with participants and the training agenda. This stepped approach to data collection means that we focus effort first on evidence with the highest probative value, only gathering lower-value evidence if it is needed. This saves us time (and precious resources!) as we prioritise evidence according to its value. In the long run, we’ll spend less time on evidence collection.
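To see how such a ranking might be produced, here is a small Python sketch. Only the training agenda’s probabilities (0.95 and 0.4) come from this blog. The values for the other three items are hypothetical placeholders, chosen purely for illustration, and are not the figures from Box 2. The `posterior` helper assumes the standard form of Bayes Theorem with a prior of 0.5.

```python
def posterior(sensitivity: float, type_i_error: float, prior: float = 0.5) -> float:
    """Confidence in the claim after the evidence has been found."""
    found_and_true = sensitivity * prior
    found_and_false = type_i_error * (1.0 - prior)
    return found_and_true / (found_and_true + found_and_false)


# (Sensitivity, Type I Error) for each item of evidence
evidence = {
    "training agenda": (0.95, 0.40),               # values from this blog
    "signed attendance record": (0.90, 0.05),      # hypothetical placeholder
    "video of the training event": (0.30, 0.01),   # hypothetical placeholder
    "interviews with participants": (0.90, 0.30),  # hypothetical placeholder
}

# Strongest evidence (highest posterior) first
ranked = sorted(
    ((name, posterior(sens, err)) for name, (sens, err) in evidence.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, value in ranked:
    print(f"{name}: {value:.2f}")
```

Notice that, under these placeholder numbers, the video ranks highly despite a low Sensitivity: it may be unlikely to exist, but it would be very hard to explain if the claim were false.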

The example we have taken from the GSAM Project has been limited to just one component and only a small number of evidence items, as a way to facilitate learning in the context of this blog series. But the real GSAM case had 14 components and 61 items of evidence that were initially identified. In cases where the list of evidence to be gathered is lengthy, you can see why identifying the most powerful data makes sense to save scarce time and resources.

For those less keen on numbers, there is a useful qualitative rubric that describes posterior confidence levels in words and can be used alongside the quantitative values – See Box 6 (Befani and Stedman-Bryce, 2016).

In summary, the key data collection steps found in Contribution Tracing are an effective way of avoiding the data trap.

- The first step, after having clearly articulated your causal mechanism, is to focus on one component at a time and identify the evidence you would ‘expect to find’ and ‘love to find’, if the component of the claim is true. This is a useful activity which generates evidence items in a way which is focused on the specific claim.
- Following initial evidence identification, each item of evidence is assigned two probabilities, one for Sensitivity and one for Type I Error. Ideally, we are looking for evidence with high Sensitivity and low Type I Error.
- Plug the probabilities for Sensitivity and Type I Error into the Bayes Formula to determine the posterior confidence for each item of evidence.
- Focus evidence gathering firstly on evidence with the highest posterior confidence values, for example with values of 0.85 and above. If, having searched for this evidence, it is not found, move to the next level down (0.7 – 0.85). Again, if insufficient evidence is found, move to the next level down (0.5 – 0.7). Continue this process until you find the evidence you need, and leave the rest! The point is that you don’t need to gather all the evidence in your list, just the evidence that validates your claim with the highest level of confidence.
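The stepped search in the final step can be sketched in Python as follows. Several things here are our own assumptions: the band labels, the treatment of the boundaries (a value of exactly 0.85 or 0.7 is placed in the higher band), the simplified stopping rule (stop at the first band in which anything is found), and the `search` callback, which stands in for the real work of looking for an item. The posterior values in the usage example are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# (label, lower bound) for each band, strongest first; bounds follow the step above
BANDS = [("high", 0.85), ("medium", 0.70), ("low", 0.50)]


def band_of(posterior: float) -> Optional[str]:
    """Return the band a posterior falls into, or None if it is below 0.5."""
    for label, lower in BANDS:
        if posterior >= lower:
            return label
    return None


def collect(posteriors: Dict[str, float], search: Callable[[str], bool]) -> List[str]:
    """Search band by band and stop at the first band that yields evidence."""
    for label, _ in BANDS:
        in_band = sorted(
            (name for name, p in posteriors.items() if band_of(p) == label),
            key=lambda name: posteriors[name],
            reverse=True,
        )
        found = [name for name in in_band if search(name)]
        if found:
            return found  # leave the lower bands alone
    return []


# Hypothetical posteriors and a stub search in which only two items turn up
posteriors = {"video": 0.97, "attendance record": 0.95, "interviews": 0.75, "agenda": 0.70}
available = {"interviews", "agenda"}
result = collect(posteriors, lambda name: name in available)
print(result)  # ['interviews', 'agenda']
```

In a real evaluation the stopping rule would be a judgement about whether the evidence found gives sufficient confidence, not a simple "anything found" check.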

Remember that avoiding the data trap is just one part of the Contribution Tracing approach to impact evaluation. The purpose of this blog series has been to highlight the unique steps Contribution Tracing follows to ensure a targeted evidence-gathering approach. Conducting a Contribution Tracing evaluation has other elements which make it an incredibly rigorous, theory-based approach to impact evaluation.

If you want to learn more, please refer to the list of useful resources below. Feel free to shout out to us on Twitter @PamojaUK and ‘like’ our Facebook Page for updates at https://www.facebook.com/pamojaevaluation/