Avoiding the data trap blog series
‘Avoiding the Data Trap’ is a three-part blog series developed by Pamoja to highlight a new approach to impact evaluation called Contribution Tracing. The series explains key steps in Contribution Tracing that can guide evaluators, and those commissioning evaluations, to avoid common data traps by identifying and gathering only the strongest data. It draws on a live case study: a Contribution Tracing pilot evaluation of Ghana’s Strengthening Accountability Mechanisms (GSAM) project. This pilot forms part of a learning partnership called the Capturing Complex Change Project, between Pamoja, CARE UK International and the CARE Ghana and Bangladesh Country Offices.
Part 2: Evaluator Seeks Sensitive Data
Welcome to the second edition in the ‘Avoiding the data trap’ blog series. If you missed the first blog, ‘Mining for data gold!’, we encourage you to read that first.
In the last edition, we introduced the common problem of the ‘data trap’ that people often fall into when collecting data: too much effort spent gathering relatively weak data, and not enough spent on the ‘right’ data that makes for strong evidence! As a potential solution, we introduced the first of four key steps in a new theory-based approach called Contribution Tracing (see the steps in Box 1). To recap, Step 1 helps us identify the right evidence, which can help prevent us from falling into the ‘data trap’. Let’s now continue by exploring Step 2: assigning probabilities for Sensitivity and Type I Error.
Based on the example case from the GSAM project, we identified five items of evidence that we might look for during data collection (Box 2 above). In Step 2, we turn our attention to finding out which items of evidence are the most powerful. We do this by first assigning two probabilities, known as Sensitivity and Type I Error. (GSAM team member Samuel gives a brief explanation of these two probabilities in the YouTube video below.)
The probability for Sensitivity works like this: if the component of the claim is TRUE, what is the probability of finding a specific type of evidence? Let’s remind ourselves of the necessary component we worked with from the GSAM claim in our first blog:
The GSAM project (entity) delivered training to Civil Society Organisations (activity) to increase their knowledge and skills in engaging with District Assemblies on the planning and implementation processes of capital projects.
In our example, the question we ask ourselves when assigning the probability for Sensitivity is: if the GSAM project really did deliver its training programme to Civil Society Organisations [component], what is the probability of finding a training agenda [evidence item #1]? This logic would be applied to each item of evidence identified in Box 2.
Probabilities are numbers between 0 and 1, equivalent to percentages between 0% and 100%. In Contribution Tracing, we can think of the probabilities we set for Sensitivity as follows: a probability of 0 means there is absolutely no chance of finding the evidence item (0%), whereas 1 means there is a 100% chance of finding it. Of course, in reality, we can never have such definitive certainty until we begin our search! Therefore, it is common to assign evidence with low Sensitivity a value very close to 0 (such as 0.1, 0.05 or 0.01) and evidence with high Sensitivity a value very close to 1 (such as 0.9, 0.95 or 0.99).
From our example of the GSAM project, what probability might we set for evidence item 1? To start, we know that GSAM is a well-funded, well-organised project, being implemented by a consortium of large and reputable NGOs. We also know that it is common practice to produce training agendas in this context. Therefore, we would assume that there is a high chance of finding a training agenda. Let’s say we decide on a high probability of 0.95. By setting such a probability, we are saying that we are very confident (95% in fact) of finding such evidence, should we look for it. However, we have left some room for doubt, at a level of 5%. We then follow the same process for other items of evidence - the Sensitivities for each one are shown in Box 3 below.
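The assignment process above can be sketched as a simple lookup from evidence item to Sensitivity. Only the 0.95 for the training agenda (and the 0.10 for the video recording, discussed next) come from this post; the remaining values, and the names used for items 3 and 4, are hypothetical placeholders for the Box 3 entries:

```python
# A minimal sketch of the first half of Step 2: recording a Sensitivity
# value (probability of finding the item if the claim is TRUE) per item.
sensitivity = {
    "training agenda": 0.95,           # from the text
    "signed attendance record": 0.90,  # hypothetical placeholder
    "evidence item 3": 0.90,           # hypothetical placeholder
    "evidence item 4": 0.85,           # hypothetical placeholder
    "video recording": 0.10,           # from the text: filming is unusual here
}

# Sanity check: every Sensitivity must be a probability between 0 and 1.
assert all(0 <= p <= 1 for p in sensitivity.values())
```

Keeping the values in one place like this makes it easy to revisit them later, when each item's probative power is calculated.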
Remember that Sensitivity is based on our expectations of finding evidence if the component of the claim is TRUE. You will note that the Sensitivities for evidence items 1 through 4 are very similar, but evidence item 5 has a very low Sensitivity. Why? Because it is unusual, especially in the Ghanaian context, to film such training events. So we set the Sensitivity at only 10%, because we wouldn’t expect to find such evidence if we were to look for it.
Let’s turn now to the probability for Type I Error, which works like this: if the component of the claim is FALSE, what is the probability of finding a specific type of evidence? In our example, it would look like this: if the GSAM project DID NOT deliver its training programme to Civil Society Organisations [component], what is the probability of finding a training agenda for the event anyway [evidence item #1]?
This might sound a little crazy at first, but let’s think it through. It is plausible that the plans for the training were well advanced and hence the agenda had been developed. Then, for a number of legitimate reasons, the training never went ahead: the trainer got sick, or there was a tropical storm. I’m sure you could think of other reasons why the training might have been cancelled; they are numerous.
This means that the training agenda could exist even if the training event never happened. Therefore, we can assert that this item of evidence has a medium-to-high Type I Error. The value we assign depends on the number of potential alternative explanations that could plausibly explain the item’s existence. In our example, let’s set the Type I Error for this item of evidence at 0.4. Here we are saying there is a 40% chance that the training agenda could exist even if the training event never took place. Type I Errors for the other items of evidence are shown in Box 4.
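The second half of Step 2 mirrors the first: a Type I Error value per item of evidence. Only the 0.4 for the training agenda is stated in this post; the other values, and the names used for items 3 and 4, are hypothetical placeholders for the Box 4 entries:

```python
# Type I Error: the probability of finding the item of evidence even if
# the component of the claim is FALSE.
type_i_error = {
    "training agenda": 0.40,           # from the text: many innocent explanations
    "signed attendance record": 0.05,  # hypothetical: forgery is very unlikely
    "evidence item 3": 0.30,           # hypothetical placeholder
    "evidence item 4": 0.20,           # hypothetical placeholder
    "video recording": 0.01,           # hypothetical: faking a film is implausible
}

# Sanity check: every Type I Error must be a probability between 0 and 1.
assert all(0 <= p <= 1 for p in type_i_error.values())
```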
You’ll note that the items of evidence with the lowest Type I Errors are the signed attendance record (item #2) and the video recording of the training event (item #5). Why? When assigning Type I Error, we must think about explanations for the existence of the item of evidence other than the explanation under investigation (in this case, the GSAM project’s training event). While it is possible that the GSAM project forged the signed attendance record, it is highly unlikely. Similarly, the level of deception required to stage and film a fake training event is beyond comprehension. Therefore, the best explanation for the existence of these two items is that the GSAM project really did deliver its training event.
In Contribution Tracing, we can think of the probabilities we set for Type I Error as follows: a probability of 0 means there are absolutely no explanations for the existence of the item of evidence other than the component of the claim, whereas a probability of 1 means that multiple alternative explanations exist, which may be more plausible. Again, we can never have such definitive certainty, ex ante, so rather than setting Type I Error at exactly 0 or 1, we choose a value close to 0 or 1.
Returning to the title of our blog: why would evaluators seek sensitive data? And why do they like evidence with a low Type I Error? Remember that the higher the Sensitivity, the more likely we are to find the evidence if we look for it, while the lower the Type I Error, the less likely it is that other, potentially better, explanations exist.
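One way to see how the two probabilities combine is the likelihood ratio: Sensitivity divided by Type I Error, a standard Bayesian measure of how strongly finding an item of evidence should shift our confidence. A minimal sketch, using the two values this post gives for the training agenda (0.95 and 0.4); the 0.01 Type I Error for the video recording is a hypothetical low value reflecting how hard such footage would be to fake:

```python
def likelihood_ratio(sensitivity: float, type_i_error: float) -> float:
    """Ratio of P(evidence | claim TRUE) to P(evidence | claim FALSE).

    The larger the ratio, the more probative the item of evidence is
    when it is actually found.
    """
    return sensitivity / type_i_error

# Training agenda: easy to find, but also easy to explain away.
agenda_lr = likelihood_ratio(0.95, 0.40)  # about 2.4: only weakly probative

# Video recording: unlikely to be found, but very hard to explain away.
video_lr = likelihood_ratio(0.10, 0.01)   # about 10: highly probative if found
```

Note the reversal: the video recording is the item we are least likely to find, yet if we did find it, it would move our confidence far more than the training agenda.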
In the final edition of our blog series, we will explain how to use Bayes’ Theorem to update your confidence in the component of the claim following data collection.
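As a preview of that final step, the update uses exactly the two probabilities assigned in this post. A minimal sketch of the standard Bayes' Theorem calculation, assuming a hypothetical neutral prior of 0.5 and the training-agenda values from this post (Sensitivity 0.95, Type I Error 0.4):

```python
def bayes_update(prior: float, sensitivity: float, type_i_error: float) -> float:
    """Posterior confidence in the claim after the evidence item is found.

    Applies Bayes' Theorem:
      P(claim | evidence) = P(claim) * Sensitivity /
        (P(claim) * Sensitivity + (1 - P(claim)) * Type I Error)
    """
    numerator = prior * sensitivity
    return numerator / (numerator + (1 - prior) * type_i_error)

# Finding the training agenda, starting from a 50% prior:
posterior = bayes_update(0.5, 0.95, 0.40)  # about 0.70
```

Even this weakly probative item nudges confidence upward from 50% to roughly 70%; the next post covers how repeated updates across items of evidence compound.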