Pivot tables are one of the most useful tools available for speedy data analysis – luckily they’re also quite easy to use, and with a bit of playing around you soon get the hand of it. They give you a way to reconstruct the data in your spreadsheet, interrogating it and almost asking it questions.
In my last post I introduced a dataset on waiting times for treatment from NHS providers. In this post I’m going to show how I (attempt to!) analyse the data to find a story, using pivot tables.
I started with the second sheet from my last post, which showed for each of 19 different ‘treatment functions’ or areas, (such as trauma or urology) within each provider: 1. the total number of completed courses of treatment, 2. the total number of courses with a known start date, and 3. the total number with a known start date that had been completed within 18 weeks:
18 weeks is the benchmark time limit for completing patient treatment, enshrined in the NHS constitution. It’s a target that was also reiterated by David Cameron in the summer of 2011 as part of his five NHS pledges.
So, subtracting the comments of each cell in column BK (treatments completed within 18 weeks) from its counterpart cell in column BJ (total number of completed courses of treatment) – ought to give us the number of completed treatments that were not completed within an 18-week period.
I mentioned in an earlier post the importance of starting out on a story with a clear hypothesis or a question.
The same has to apply here – so I decided that I wanted to find out which treatment functions, across all providers, was the most likely to have patients whose treatment was not completed within 18 weeks.
To do this, I decided to focus on the columns listing “treatment function”, “total number of completed pathways (with a known clock start)” and “total with a known clock start completed within 18 weeks”. I coped and pasted these columns into a fresh sheet and then created a new column, (column E below) working out the difference between the last two columns.
Creating the pivot table:
Having created the data I wanted, I could move on to the pivot table – which should end up looking a little like this, depending on how you structure it and what you want to know:
So, how to get there, starting from your standard spreadsheet.
1. Click anywhere in the data and hit ctrl + a to select all of it, (or just select the columns you want).
2. Click on insert in the ribbon at the top, and then select pivot table.
3. A box will appear asking if you want to create it in a new worksheet or have it appear at the bottom of your existing worksheet. I chose to have it in a fresh sheet, and then hit ‘ok’.
A blank sheet then appears, with a sidebar titled ‘PivotTable Field List’.
4. The process of creating the table using the sidebar is fairly intuitive. You can drag and drop fields from the list to one of the four boxes below, in order to structure your table.
1. Row labels – fields dropped in this row will show up as staggered rows in the table, depending on the order in which you enter them. As an example, I’ve dropped into the row labels box ‘treatment function’ followed by the label for the column that I created, showing the number of treatments that had taken over 18 weeks:
It’s brought up all 19 of the values under ‘treatment function’, with each of the different values for the number of treatments taking over 18 weeks for each provider clustered under these labels. It’s an example of how a pivot table can restructure data well, but it’s not useful so far.
2. Column labels – this fairly straightforwardly does exactly the same thing that row labels does, but spreads out in a horizontal rather than vertical direction.
You will generally need to use at least one field in either row or column labels, together with at least one another in ‘values’ or in ‘report filter’. (Building on that with more as required:
3. Report Filter – the final box allows you to quickly filter through one field using another. Below, I’ve set ‘treatment function’ as the row label again, with my column showing the number of treatments taking over 18 weeks in each medical area within each provider, set as the filter.
By using the dropdown menu on the filter at the top, I can choose for only the records in the ‘treatment function’ field that match a particular value from my filter field to be shown. By selecting the number ‘152’ – the highest number available in my filter field, (showing that the highest number of cases taking over18 weeks for any single treatment function in any single hospital was 152), it displays only the corresponding ‘treatment function’ for this record.
4. Values – the field in values will be measured against the row labels. The table will automatically present you with the sum or average of your values for each of the rows:
Here, I’ve kept ‘treatment function’ in the row label field and have moved my column measuring the number of treatments that took over 18 weeks, to ‘values’.
It makes it clear at the top that it’s showing me the sum of all the treatments that took over 18 weeks in each medical area. I could change this to an average or some other measure if I wanted.
In order to do this you must click on the drop-down triangle that appears on the field label, next to ‘sum of how…’. Then click on value field settings. This allows you to choose from a range of methods of summarizing the data – I’m going to stick with ‘sum’ for now.
In the end, this structure provides me with the story I’m looking for – reordering the data makes it clear that ‘trauma and orthopaedics’ accounted for the highest number of cases not fully treated within 18 weeks, followed by ‘general surgery’.
And that’s a basic introduction to using pivot tables to find a story. In my next post, a bit of slightly more complex analysis, before moving on to visualisations!