Customer/Employee Net Promoter Score — How to Compare?
Are you using Net Promoter Score (NPS) results to assess how your customers (or employees) promote your company? Are you interested in learning how to compare the overall NPS with the NPS of subsets (e.g. the NPS among clients in market segment A, B or C)? This use case is for you.
Real life case
Each year, the corporate Knowledge Management team runs a survey among the 20,000 community members of Schneider Electric. The last question is “How likely are you to recommend to a colleague participation in the Schneider Electric Communities?” We ask them to answer on a scale from 0 to 10, where 0 means totally disagree and 10 means totally agree. From the answers, three classes of respondents are derived: the Promoters, who answered 9 or 10; the Passives (7 or 8); and the Detractors (0 to 6). The Net Promoter Score (NPS) of the respondents is obtained by subtracting the percentage of Detractors from the percentage of Promoters. Thus, the NPS ranges from -100 (all the respondents are Detractors) to +100 (all of them are Promoters).
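As an illustration, here is a minimal R sketch of how the classes and the NPS can be derived from the answers (the scores vector is a made-up example, not survey data):

scores <- c(10, 9, 7, 3, 10, 8, 6, 9)          # answers on the 0-10 scale
class  <- cut(scores, breaks = c(-1, 6, 8, 10),
              labels = c("Detractor", "Passive", "Promoter"))
p   <- mean(class == "Promoter")               # share of Promoters
d   <- mean(class == "Detractor")              # share of Detractors
nps <- 100 * (p - d)                           # NPS, between -100 and +100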
We would like to compare the overall NPS with the NPS of several subsets in order to identify points of attention, such as “which subsets are much more Promoter or much more Detractor than the overall?”, and to propose recommendations.
We cannot directly compare the overall NPS with the NPS of each subset, for two reasons. First, the data used to compute the NPS come from a sample of the population, the people who answered the survey. This leads to a sampling margin of error; we say, for example, that the (unknown) NPS of the population is within plus or minus 5 of the NPS of the sample (e.g. 29). Second, the subsets and the overall sample are linked (each subset is part of the whole), which affects the computation of the margin of error of each subset.
We assume that the sample is representative of the population (e.g. the angry customers are as likely to have answered as the happy ones).
Graphical approach
The example below is taken from the results of a survey in which 5,000 people answered (the sample) out of a population of 24,000.
First the overall NPS is computed.
Then subsets are identified in the dataset of results (the 5,000 answers), such as the respondents' country of residence or the nature of the economy in which they live (mature or new).
First analysis: the Lift.
The Lift corresponds to the concentration of each value (Promoter, Passive, Detractor) in a subset relative to the average over the whole dataset; a lift of 1 represents the average concentration in the dataset. In the example below, as Detractors represent 20% of the database, “Detractors” within the subset “Mature” of the variable “Economies” has a lift of 1.2, which indicates that 24% (= 20 × 1.2) of the respondents in that subset are Detractors. The zone where the lift is usually not significant (± 20%) is greyed out, although in practice this zone depends on the number of respondents in each subset (the larger the number of respondents, the narrower the zone).
In this graph, we can easily identify that Mature economies are more Detractor than average. It is more difficult to say anything about the New economies. As this graph does not take into account the margin of error inherent in a survey, nor the margin of error of each subset, another graph is necessary to supplement the analysis when the Promoter and Detractor bars fall within the greyed-out zone.
Second analysis: the margin of error
A deeper analysis is made by comparing the NPS of each subset to the overall NPS. The Net Promoter Score is displayed (graph below) in blue for the entire dataset and in colour for each subset. A margin of error is drawn around the NPS of each subset and around the overall NPS (here 29). The width of the margin of error depends mainly on the number of respondents: the larger the number of respondents, the narrower the margin; and the closer the ratio of the number of respondents to the population size is to 1, the narrower the margin. The subsets for which the NPS is significantly below the overall NPS are in red, those significantly above are in green, and those for which it is not possible to decide are in black.
In this graph, unlike the previous one, we can easily identify that New economies are more Promoter, as their NPS is well above the overall NPS.
Example: Countries of residence
In the graph below, we have stacked the two analyses together. The people in the five countries on the right are much more Promoter than the average. The countries with fewer than 30 respondents are not displayed (not statistically significant). The smaller the number of respondents from a country, the larger the margin of error. The countries for which the NPS is significantly below the overall NPS are in red, those significantly above are in green, and those for which it is not possible to decide are in black.
How to build the graphs
The graph with the Lift is easy to compute. For each of the three indicators (Promoter, Passive, Detractor), the ratio of the percentage in the subset to the percentage in the whole dataset is computed. Then the bar chart is produced from these ratios, with one colour per indicator (green, yellow, red).
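As a sketch, assuming the answers are stored in a data frame df with a column class (Promoter, Passive or Detractor) and a column economy (the subset variable), where df, class and economy are illustrative names, the lifts can be computed and plotted as follows:

overall   <- prop.table(table(df$class))                  # shares in the whole dataset
by_subset <- prop.table(table(df$economy, df$class), 1)   # shares within each subset
lift      <- sweep(by_subset, 2, overall, "/")            # lift = subset share / overall share
barplot(t(lift), beside = TRUE,
        col = c("red", "gold", "darkgreen"),              # Detractor, Passive, Promoter
        legend.text = colnames(lift), ylab = "Lift")
abline(h = 1, lty = 2)                                    # lift of 1 = average concentration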
The graph with the NPS and the margins of error requires some advanced statistics. First, the 95% confidence interval of the overall NPS is computed. Let's call P the percentage of Promoters and D the percentage of Detractors in the sample, N the population size and n the sample size. Let's call Var the variance of the NPS and fpc the finite population correction, needed because the sample is large relative to the population (above 5% of it).
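With these notations (P and D expressed as fractions between 0 and 1), the figures used in the example below correspond to the standard formulas:

NPS = 100 × (P - D)
Var = P + D - (P - D)^2
fpc = sqrt((N - n) / (N - 1))
Confidence = 100 × 1.96 × sqrt(Var / n) × fpc

where Confidence is the half-width of the 95% confidence interval, so that the interval is [NPS - Confidence, NPS + Confidence].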
In our example above with the countries of residence, with P = 0.60, D = 0.08, N = 13,300 and n = 2,700, we have NPS = 52, Var = 0.42, fpc = 0.89 and Confidence = 2.2. The 95% confidence interval of the NPS is [49.8, 54.2]. This allows drawing the horizontal blue line and the two blue dotted lines.
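A quick check of these figures in R (with the rounded inputs above, the variance comes out at about 0.41):

P <- 0.60; D <- 0.08                        # shares of Promoters and Detractors
N <- 13300; n <- 2700                       # population and sample sizes
nps  <- 100 * (P - D)                       # 52
vr   <- P + D - (P - D)^2                   # ~0.41
fpc  <- sqrt((N - n) / (N - 1))             # ~0.89
conf <- 100 * 1.96 * sqrt(vr / n) * fpc     # ~2.2
c(nps - conf, nps + conf)                   # ~[49.8, 54.2]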
The same formulas are applied to compute the NPS and confidence interval of each subset. This allows drawing the NPS dots and their margins of error. The colour of the margin of error for each subset is green if the NPS of the subset is statistically above the overall NPS, red if it is below, and black if the test is not significant (we cannot decide).
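As a plotting sketch, assuming a data frame res with one row per subset and columns subset, nps, lo, hi and colour computed as above, plus the overall values overall_nps, overall_lo and overall_hi (all illustrative names):

x <- seq_len(nrow(res))
plot(x, res$nps, pch = 19, col = res$colour, xaxt = "n", xlab = "", ylab = "NPS",
     ylim = range(c(res$lo, res$hi, overall_lo, overall_hi)))
axis(1, at = x, labels = res$subset, las = 2)
arrows(x, res$lo, x, res$hi, angle = 90, code = 3,
       length = 0.05, col = res$colour)                        # margins of error
abline(h = overall_nps, col = "blue")                          # overall NPS
abline(h = c(overall_lo, overall_hi), col = "blue", lty = 3)   # its margin of error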
Note that it is not sufficient to check whether the confidence interval of a subset (green or red) and the overall confidence interval (blue) overlap. The probabilistic reasoning detailed below may lead us to conclude that one NPS is significantly greater than the other even if the two confidence intervals overlap.
Conclusion
Making decisions based on the comparison of Net Promoter Score values is not obvious. The size of the sample (the number of people who responded) is the key factor that determines whether such a comparison makes sense statistically. A graphical representation like the one proposed above simplifies the decision-making process.
Appendix
How to decide if the NPS of a subset is significantly different from the overall NPS
The colour of the margin of error for each subset (green if above, red if below, black if not significant) requires a t-test of whether the NPS of the subset is significantly different from the NPS of the whole dataset. For this, we use the formulas of the paper Part-Whole Comparisons of Means, by Albert Madansky (May 2016).
We consider here the situation in which we have m randomly drawn observations from population 1 (sample 1) and n - m observations randomly drawn from population 2 (sample 2), and the data are drawn independently from each of the populations. We designate the NPS of the two samples as NPS1 and NPS2, the sample variances of the two data sets as Var1 and Var2, and the finite population corrections of the two data sets as fpc1 and fpc2. We also designate by NPS the overall Net Promoter Score of the sample of n observations from the two populations. The three NPS values are approximately normally distributed. The object of the t-test is to test whether the NPS of population 1 is significantly lower or higher than that of the composite population consisting of populations 1 and 2.
The t statistic for testing the hypothesis that the mean of sample 1 is different from that of the composite sample (1 and 2) is given by

t = (NPS1 - NPS) / SE, with SE = 100 × ((n - m) / n) × sqrt(fpc1^2 × Var1 / m + fpc2^2 × Var2 / (n - m))

This form follows from the part-whole decomposition NPS = (m × NPS1 + (n - m) × NPS2) / n, which gives NPS1 - NPS = ((n - m) / n) × (NPS1 - NPS2); the finite population corrections are applied to each sample in the same way as for the overall confidence interval, and the factor 100 puts the standard error on the same -100 to +100 scale as the NPS.
A note on the degrees of freedom for the t-test. The preferred approach is the Welch approximation, developed specifically for the two-sample t-test. With v1 = Var1 / m and v2 = Var2 / (n - m), the degrees of freedom of the Welch approximation are given by:

dfw = (v1 + v2)^2 / (v1^2 / (m - 1) + v2^2 / (n - m - 1))
Let's call Tdfw the critical value of the Student's t-distribution for a one-sided 95% confidence level with dfw degrees of freedom. Note that as dfw tends to infinity, Tdfw tends to 1.64, the one-sided 95% critical value of the normal distribution.
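In R, this critical value is given by qt(); for example:

qt(0.95, df = 30)    # 1.70, one-sided 95% critical value with 30 degrees of freedom
qt(0.95, df = Inf)   # 1.64, the normal limit (identical to qnorm(0.95))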
The NPS of population 1 (NPS1) is significantly higher (95% of the time) than that of the composite population consisting of populations 1 and 2 (NPS) if the t statistic is greater than Tdfw, that is if

NPS1 - NPS > Tdfw × SE
If this inequality is true, then the confidence interval in the graph is green-coloured.
The NPS of population 1 (NPS1) is significantly lower (95% of the time) than that of the composite population consisting of populations 1 and 2 (NPS) if the t statistic is lower than -Tdfw, that is if

NPS1 - NPS < -Tdfw × SE
If this inequality is true, then the confidence interval in the graph is red-coloured.
If none of these inequalities is true, then the confidence interval in the graph is black-coloured.
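As a sketch of the whole decision rule (not the original code; the function name, arguments and example figures below are illustrative), the colour of a subset can be computed in R as follows:

# p1, d1: shares of Promoters and Detractors in the subset (sample 1), as fractions
# p2, d2: shares in the rest of the dataset (sample 2)
# m, n2 : numbers of respondents in the subset and in the rest of the dataset
# N1, N2: sizes of the corresponding sub-populations
nps_colour <- function(p1, d1, m, N1, p2, d2, n2, N2, level = 0.95) {
  n    <- m + n2
  nps1 <- 100 * (p1 - d1)                          # NPS of the subset
  nps2 <- 100 * (p2 - d2)                          # NPS of the rest of the dataset
  nps  <- (m * nps1 + n2 * nps2) / n               # overall NPS (part-whole decomposition)
  var1 <- p1 + d1 - (p1 - d1)^2                    # variances on the 0-1 scale
  var2 <- p2 + d2 - (p2 - d2)^2
  fpc1 <- sqrt((N1 - m)  / (N1 - 1))               # finite population corrections
  fpc2 <- sqrt((N2 - n2) / (N2 - 1))
  se   <- 100 * (n2 / n) *
          sqrt(fpc1^2 * var1 / m + fpc2^2 * var2 / n2)   # standard error of NPS1 - NPS
  t    <- (nps1 - nps) / se
  v1   <- var1 / m
  v2   <- var2 / n2
  dfw  <- (v1 + v2)^2 / (v1^2 / (m - 1) + v2^2 / (n2 - 1))  # Welch degrees of freedom
  tcrit <- qt(level, df = dfw)                     # one-sided critical value Tdfw
  if (t > tcrit) "green" else if (t < -tcrit) "red" else "black"
}

# Hypothetical example: a subset of 400 respondents (sub-population of 2,000)
# compared with the 2,300 other respondents (sub-population of 11,300)
nps_colour(p1 = 0.70, d1 = 0.05, m = 400, N1 = 2000,
           p2 = 0.58, d2 = 0.09, n2 = 2300, N2 = 11300)    # "green" with these figures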
References
https://en.wikipedia.org/wiki/Student%27s_t-distribution
https://en.wikipedia.org/wiki/Welch%27s_t-test
https://en.wikipedia.org/wiki/Student%27s_t-test
http://www.analyticalgroup.com/download/part-whole.pdf
R code
The R code used to compute and display the graphs is available on request.
Note: Republished from my LinkedIn Article, Jan 02, 2017. https://www.linkedin.com/pulse/customeremployee-net-promoter-score-how-compare-guillaume/
Great article on using the Net Promoter Score (NPS) and on how to compare it with subsets. The explanation of the Lift and Margin of Error graphs is very helpful for a better understanding. My question is: what is the minimum sample size required for a reliable NPS comparison?
As a rule of thumb, a sample of 1,000 is the minimum to get a narrow margin of error (about 3%).