Now and then you encounter data with ‘extreme values’. Values which are much larger than the other ones – and so hard to analyse.
A dataset which is skewed like that needs to have a special treatment when visualized, because otherwise the individual values can’t be identified correctly.
[This blogpost is part of a series tips tricks, as being presented at VizIt Berlin 2023]
Example
Suppose that we have data like sales which consists of rather similar values per category, but with one extremely large value, or outlier:
We often talk about ‘outliers’ as ‘rare’ or ‘wrong data’, but we can assume one category – Labels – is really doing very well. So well, it’s hard to differentiate the other categories.
There are several options to solve this problem – this article offers no less than five – but one of the most common ways is the ‘broken bar’ method, since it is easy to understand for anyone.
So I want to change the visualisation and ‘shorten’ the long bar of the “Labels” category, without lying to the audience. But first I have to ask myself:
What is an extreme value?
Which value needs to be treated? Visually it’s easy: the one which sticks out the most. Mathematically this can be looked at by calculating the percent of total for each category.
I can see the ‘Labels’ category represents 85% of the total sales – clearly extreme. For my example, I will treat values above 60% as ‘extreme’, or ‘outlier’
What I need to do is to calculate the second largest value, as a baseline for the largest value. I use the ranking table calculation to get this value.
How to visualize
This extreme value will need to be shortened – but we still need to accentuate the fact that this is a large value.
My new ‘sales’ measure will be twice the ‘second largest value’ for the largest value, or of course the true value of sales for all other values.
And I will ‘break’ the bar at 1.9x the value of the second largest value.
How do I show it’s an outlier? Using an extra measure, visualised using an appropriate shape. This shape will be place on the outlier at 1.9 times the value of the second largest value.
How to do this in Tableau
Finding the outliers
The percentage of total for each category can easily be calculated using a Table Calculation like
… so the outliers can be identified by comparing these to my ‘border-value’ of 60%:
Calculating the ‘second largest value’
Finding the second largest value – the baseline for my calculation – is easier than you might think.
Using the RANK() table calculation I find the ‘number 2’ largest value, and enclosing this with a WINDOW_MAX this value is available on each row/category of my visualisation:
Replace value with ‘adjusted’ value for outlier
Instead of ‘sum(sales)’, we need to use an alternative, adjusted value for the bar: if the category is an outlier, than use twice the second largest value, otherwise just use the sum of sales:
If we use this value instead of the original Sales value you will get this:
You might want to highlight the adjusted value by dragging ‘[Outlier ?]’ on the color shelf:
.. but this draws far too much attention to this category.
More importantly, remove the axis because these values are not correct for the outlier:
And add the original (!) sales value to the labels:
Adding the ‘break’
Only showing the labels isn’t enough to differentiate between the ‘normal’ and adjusted values, a shape on top of the bar will be more clear.
Tableau doesn’t contain a good default shape for this, so I created my own shape – two skewed lines with whitespace in between – and stored in the ‘Shaped’ folder of my Tableau Repository.
We need just one extra calculation: the ‘break’ on the bar, represented by the shape.
I added this value to the viz, changed the mark to ‘Shape’, and selected my freshly created outlier icon:
Combine the values use ‘dual axis’ – and now
Using this method I can read all the values well, and it’s clear the Labels value is an outlier!
Variations…
This method can be adapted in several ways. It can be adapted to use multiple ‘outliers’ in one viz. Using a parameter you can change the threshold value. And of course you can use this on a vertical bar-chart – you only need to use a different shape.
You can view and download an example workbook on Tableau Public with follow along above instructions step-by-step.
I would love to hear – and see! – what you created using this method!
Be notified of new content…
If you find these tips and tutorials useful, you can follow (and connect) me on LinkedIn for all the latest content.