Don’t be fooled by the illusion of accuracy

The most important figures that one needs for management are unknown or unknowable, but successful management must nevertheless take account of them.

– W. Edwards Deming

Last week I wrote about customer feedback, and I’ve had a question about one of the comments I made there. I asked: how do you assess a rating of 3 out of 5 when you ask a customer about your service?

Not everything that can be counted counts, and not everything that counts can be counted.

– William Bruce Cameron

But of course, when it comes to improving our customer service, we need to find ways to count things, so we can measure and improve our products and services.

When it comes to the improvement part, I gave what I believe is an excellent example of the type of questions you CAN ask that will give you actionable information. But I skipped over the measurement part.

Anyone with a basic understanding of statistical theory and psychology knows that any measure of customer preference (desired outcomes) will always be invalid and unreliable, because we are human, and we have human biases. We don’t measure things exactly the same way, and we think about products and services in different ways.

So, a quick summary of my argument: measure your customers’ feedback, but accept that it will be messy and inaccurate. Don’t be fooled by simple or single measures of satisfaction, or you will be disappointed. The best measure is a messy, complicated one… because your customers are messy and complicated in their views too.

An example of this fallibility of using numbers as an accurate view of customer feedback is the rating system YouTube used to have for its videos. It used a 1 to 5 assessment of how you rated the videos. After a few years it realised the futility of quantifying preference. As a YouTube spokesman commented, “it seems like when it comes to ratings it’s pretty much all or nothing. Great videos prompt action; anything less prompts indifference.” So either every video is super fantastic… or we have to accept that viewers who felt the videos were average or bad just didn’t rate them.

As a result, YouTube moved from offering a 5-star rating system to a thumbs up / thumbs down model. The data told them that people didn’t logically rate their preference for a video. Instead they just rounded it up to a 5, or didn’t care enough to vote.
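To see why that move made sense, here’s a quick sketch (with made-up ratings, not YouTube’s actual data) of how an “all or nothing” 5-star distribution collapses cleanly into thumbs up/down:

```python
from collections import Counter

# Hypothetical ratings showing the "all or nothing" pattern:
# most voters give a 5, a few give a 1, almost nobody uses the middle.
ratings = [5] * 80 + [4] * 3 + [3] * 2 + [2] * 3 + [1] * 12

print(Counter(ratings))  # the middle of the scale is nearly empty

# Collapse to a thumbs model: 4-5 becomes "up", 1-2 becomes "down",
# and the (tiny) middle is treated as no vote.
up = sum(1 for r in ratings if r >= 4)
down = sum(1 for r in ratings if r <= 2)
print(f"thumbs up: {up}, thumbs down: {down}")
```

Almost no information is lost in the collapse, because almost nobody was using the middle of the scale anyway.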

I like thumbs up/down as it’s simple to use and simple to understand.

Moving beyond this simple response, increasing the options we offer our customers certainly doesn’t increase accuracy, even if we hope it might. Unfortunately, the way customers interpret the question and your range of answers creates a data set that doesn’t make sense. Let’s see what effect increasing the answer options has…

What would you answer if I were to ask you a question like “Are you finding this article interesting?” and gave you the choices Yes, No, and Not sure. One of those three answers would likely be easy to pick, yes?

That’s a 3-point scale: one more choice than thumbs up or down. But do you get any more actionable information? Does “Not sure” help more than 👍👎?

If I upgrade it to a 5-point scale, you may think you are getting even more valuable information, but respondents might find it harder to answer. So what’s your view of the difference between Very Interesting, Mildly Interesting, Not Sure, Mildly Not Interesting and Very Not Interesting? After all, what does “mildly not interesting” mean, and is it equally spaced along a continuum of answers?

What if we try to get even higher-quality information from our customers? Add another two points and use a 7-point scale…

A 7-point scale makes this even more difficult. When we run out of labels, we resort to numbers: Very Interesting, 6, 5, Not Sure, 3, 2, Very Not Interesting. So what’s the difference between a 5 and a 6, and does your view of what a 5 is match my view?

What about adding even more points…

Not only is this hard to answer, it’s hard to interpret. What is the difference between a 3 and 2 now? Both are on the negative scale, but do they mean different things?

Unfortunately, the way respondents can interpret the questions creates a data set that doesn’t make sense.  You and I could have the exact same experience, yet I’d give it a 9 and you’d give it a 6. Is there a meaningful difference?

For some customer experience measures – such as Bain’s Net Promoter Score – that difference is huge: a respondent’s 6 gives a -100% score, yet a 7 gives a 0% score. We’re somehow supposed to understand the difference between a 6 and a 7. The same jump occurs between a respondent’s score of 8 (= 0%) and 9 (= +100%). Frankly, I don’t think many people can assess the difference, and I don’t think many care either.
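To make those cliff edges concrete, here is a small sketch of the standard NPS calculation (promoters score 9–10, passives 7–8, detractors 0–6; the score is the percentage of promoters minus the percentage of detractors):

```python
def nps(scores):
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6).
    Passives (7-8) count in the denominator but add nothing."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

# A single respondent sitting on either side of a cliff edge:
print(nps([6]))  # -100.0 (detractor)
print(nps([7]))  # 0.0    (passive)
print(nps([8]))  # 0.0    (still passive)
print(nps([9]))  # 100.0  (promoter)
```

One point of movement on the respondent’s scale swings their contribution by 100 points, or by nothing at all, depending entirely on where the cut-offs happen to fall.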

So, the three biggest reasons why what customers want cannot be measured easily:

  • Putting a number on something doesn’t make it quantitative
  • Measures of customers’ desired outcomes (preferences) change continually and are easily manipulated
  • Value is non-linear (my 6 is different to your 6; and 6 isn’t 10% better than 5, or 10% worse than 7)

Value is non-linear

For hundreds of years, value (how people perceive what they receive in exchange for something) was believed to be linear. It made sense to assume so, because it makes the maths much easier.

An example of linear value thinking would be to think that if I double my wealth, I will double my happiness. But as we all know, that isn’t even close to being true. Why? Because value is distinctly non-linear.

Moreover, positive gains are valued differently from negative losses. When you use something like a numbered scale to measure customers’ preferences, you’re using this unreliable ruler, whether you know it or not. In the customer’s mind, the distance between 1 and 2 is different from the distance between 4 and 5. A visual representation of a numbered scale within the context of measuring customer preference is depicted in the figure on the left.

We assume value is perceived in a linear way because we give it numbers, like on the left. But actually value is perceived by people as depicted on the right: we hate losses more than we value gains. And each line is different for every person you assess.
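That curve is usually modelled with Kahneman and Tversky’s prospect-theory value function; here’s a sketch using their commonly cited parameter estimates (α = β = 0.88, λ = 2.25 – the exact numbers vary by person, which is the point):

```python
def perceived_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Prospect-theory value function: concave for gains,
    convex and steeper for losses (loss aversion)."""
    if x >= 0:
        return x ** alpha
    return -lam * ((-x) ** beta)

# Losing $100 hurts about 2.25x as much as gaining $100 feels good...
print(perceived_value(100), perceived_value(-100))

# ...and doubling a gain far less than doubles its perceived value.
print(perceived_value(200) / perceived_value(100))
```

A linear ruler assumes the first print shows equal magnitudes and the second prints 2.0; it shows neither, which is exactly why equally spaced scale numbers don’t map onto equally spaced feelings.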

Now, there’s nothing wrong with using numbers to name customer preference. It’s just necessary to know that these data are qualitative (the number is a name or description, not a numerical value; it can’t be divided, added or averaged), not quantitative. However, too many people either forget this or don’t know it. So they end up doing things with customer responses that they shouldn’t.

There is no natural measure for concepts such as attitudes and opinions. We can devise scales that are ordinal (the responses can be ranked in order of strength of agreement, for instance) to measure such constructs, but it is impossible to determine whether the intervals among points on such scales are equally spaced. Therefore, data collected using scales should be analysed at the ordinal or nominal level rather than at the interval or ratio level. You can say a 7 is better than a 6, but you can’t say it’s 10% better.
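In practice, that means summarising scale responses with counts and medians rather than means. A sketch, with made-up survey responses:

```python
from collections import Counter
from statistics import median

# Made-up 5-point responses (1 = Very Not Interesting ... 5 = Very Interesting)
responses = [5, 4, 4, 3, 5, 1, 5, 4, 2, 5]

# Ordinal-appropriate summaries: frequency counts and the median category.
print(Counter(responses))   # how many respondents picked each label
print(median(responses))    # the middle response when ranked

# What NOT to do: treat the labels as quantities. The mean here is 3.8,
# which looks precise, but "3.8" isn't a label any respondent could have
# chosen, and the intervals between labels aren't equal anyway.
```

Counts and medians only ever claim what the data can support: how many people picked each label, and which label sits in the middle of the ranking.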

Also, our views change. In the last post I wrote about a guy rating the service at a Disney cafe 10 out of 10; when asked if anything was less than perfect, he gave an example. Had he been asked after that to rate the service out of 10, would it have been the same? Had he gone on to visit another cafe and had even better service, would he then think the previous Disney cafe only deserved a 9 out of 10? We don’t have these automatic ratings set in our bones… we react to what’s there. We have a gut reaction to the service or product on offer and hate it, like it or love it. Or love it or hate it, or “meh, its ok lah”.

And if that wasn’t tricky enough, data gathered about “what customers want” can change during their taking of the survey. As Norbert Schwarz points out in his paper Cognitive Aspects of Survey Methodology: “Since the early days of opinion polls (Cantril, 1944; Payne, 1951), survey researchers observed that minor variations in question wording, format and order can profoundly affect the obtained answers.” If your survey is too long, the care and attention given to the first few answers is different from the hastily considered views on question 102.

So what can we do?

Worry less about the numbers, and worry more about the trend. As you change things, are more people rating things better? Is the improvement significant? Yes, you can group people into segments based on their responses. Aim to get everyone into the top box: very satisfied. Don’t be happy lumping satisfied and very satisfied together… if people aren’t “very satisfied” they will go elsewhere.

Look for trends. Despite the reality that people may mean something different when they think of “very or extremely important”, it is reasonable to conclude that if 80% of the people in our sample say an outcome is very/extremely important and just 20% say they are very/extremely satisfied, then this reveals an opportunity for value creation. It’s important to them and they are unhappy about the way we are doing it… we have an opportunity here!
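That importance-versus-satisfaction comparison can be turned into a simple screening metric. A sketch, using the top-box percentages described above (the outcome names and numbers are made up for illustration):

```python
# Hypothetical top-box survey results per outcome:
# % rating it very/extremely important vs % very/extremely satisfied.
outcomes = {
    "fast response to queries": {"important": 80, "satisfied": 20},
    "friendly staff":           {"important": 70, "satisfied": 65},
    "detailed invoices":        {"important": 30, "satisfied": 25},
}

# The screen from the text: big importance, low satisfaction = opportunity.
# Rank outcomes by the gap between the two percentages.
for name, o in sorted(outcomes.items(),
                      key=lambda kv: kv[1]["important"] - kv[1]["satisfied"],
                      reverse=True):
    gap = o["important"] - o["satisfied"]
    print(f"{name}: gap of {gap} points")
```

Note this stays honest about the data: it compares proportions of respondents (how many picked the top boxes), not averages of the scale numbers themselves.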