Question all data you collect or are presented with.
First of all, make sure you are collecting the right data. If you don’t measure what matters, the data is garbage. I have written before on using the right metric.
Once you are sure that you have collected the right data, never trust your first analysis of it. Whenever you see percentages, averages, and the like, question them even more.
Percentages are misleading unless you know the sample size. Imagine someone told you that 50% of the people who ate at a specific restaurant got food poisoning. 50% sounds scary, right? What if, on deeper analysis, you found out that the data was collected by talking to only 2 people, that one of the two happened to get food poisoning, and that about 200 people eat at that restaurant every week? Is it scary anymore? No. Your whole perspective changes once you find out the sample size and how confident you can be that the sample is representative of the population.
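To put a rough number on how little "1 out of 2" tells you, here is a minimal sketch that computes a 95% Wilson score interval for a proportion. The Wilson interval is my choice of method for illustration, not something the example above prescribes, and the function name `wilson_interval` and the hypothetical "100 out of 200" comparison are assumptions added for contrast.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95% confidence)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# The restaurant example: 1 sick person out of 2 surveyed.
small_low, small_high = wilson_interval(1, 2)

# Hypothetical comparison: the same 50% rate, but measured on 200 diners.
large_low, large_high = wilson_interval(100, 200)

print(f"n=2:   {small_low:.0%} to {small_high:.0%}")
print(f"n=200: {large_low:.0%} to {large_high:.0%}")
```

With only 2 people the plausible range runs from roughly 9% to 91%, which is close to saying nothing at all. The same 50% measured on 200 people narrows to roughly 43% to 57%.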
Averages are the other numbers to be suspicious of: a few outliers can quickly tilt the average up or down. Always double-check the average against the median to make sure outliers are not having undue influence. Let us look at the following example. Assume that you sold 30 widgets and the selling prices (in dollars) were as given below.
4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 7, 8, 8, 30, 30, 36, 40, 50, 50
Average selling price = $12
Median (50th percentile) = $5
75th percentile = $7.75
The average selling price of $12 sounds like a good number. But if you look at the data closely, 50% of the items sold for $5 or less and 75% of the items sold for $7.75 or less. The six higher-priced items are making the average look much higher. Now, the outliers are not necessarily bad. Another way of looking at this is that just by selling six higher-priced widgets, you were able to more than double your average selling price (from about $5.17 to $12). So your conclusion might be to focus on selling higher-priced items as opposed to focusing on volume. But how long is the sales cycle for the higher-priced items? What are the selling costs compared to selling a volume of lower-priced items? Asking these questions of your data will help you draw the right conclusions from the same data.
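The numbers in this example are easy to check yourself. Below is a small sketch using Python's standard `statistics` module. The `method="inclusive"` setting is an assumption on my part: it uses the same linear interpolation as a typical spreadsheet percentile function, and it happens to reproduce the $7.75 figure. The variable names are just for illustration.

```python
from statistics import mean, median, quantiles

# The 30 selling prices from the example, in dollars.
prices = [4] * 5 + [5] * 15 + [6, 7, 8, 8, 30, 30, 36, 40, 50, 50]

average = mean(prices)
middle = median(prices)

# Quartiles with linear interpolation; the third value is the 75th percentile.
q1, q2, q3 = quantiles(prices, n=4, method="inclusive")

# The average of the 24 cheapest items, i.e. with the six outliers removed.
average_without_top_six = mean(sorted(prices)[:-6])

print(f"Average:            ${average:.2f}")
print(f"Median:             ${middle:.2f}")
print(f"75th percentile:    ${q3:.2f}")
print(f"Average w/o top 6:  ${average_without_top_six:.2f}")
```

Comparing the average with and without the top six items is a quick way to see how much of the headline number comes from a handful of sales.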
One way to avoid this trap is to always triangulate: will you arrive at the same conclusion if you approach the question another way? Try collecting the data from a different angle and see whether it leads you to the same place. This will also remove from your analysis any biases associated with a single data set.
Never trust your first analysis of your data.
Image: Courtesy of Neural Computing Research Group
2 thoughts on “Always question data – at least twice”
Figures don’t lie, but liars figure… 🙂
Or how ’bout: There’s Lies, Damned Lies, and then Statistics…
You are right in saying data can be interpreted in various ways. Many a time, we “fit” the data to our preconceived notions to prove our theory right.