Just a belated update on an interactive tool that I helped to create: MSN News’s UK Salary Tracker.
I was signed up to do the data research side of things – collecting and reformatting the data that made the interactive possible.
The initial brief was fairly open, as the journalists at MSN weren’t sure what data was out there on the scale they needed and with the breakdowns required. I was sent off to see what was available on salaries, occupations, unemployment and employment, broken down by district in the UK.
I set out to track down what government data was available (having discovered that there were no independent surveys covering the whole area with a consistent measure for any of the variables required), and came across an invaluable tool.
Nomis (Nomisweb), a service of the Office for National Statistics, allows you to select the particular data you want to view from several different surveys, choosing the degree of regional detail, time coverage, and other variables.
An example of a key set of data that I came across this way was the most recent set from the Annual Population Survey:
This gave me the % employment rate of men and women according to their rough occupation (albeit without any age breakdown).
After some thorough research on everything out there, I presented my findings to the MSN news team. In the course of all my digging, I got in touch with a few people at the Northern Ireland Statistics and Research Agency (who usually passed me on to other people, who usually passed me on, etc…) to check whether the Northern Ireland data was measured in a way that allowed a direct comparison with the data for the rest of the UK, which I had sourced separately. (The Northern Ireland data on earnings was not available on Nomis.)
I used the Nomis tool to create several spreadsheets, and also rang the helpful people there a few times, managing to wheedle out of them some earnings data with an age breakdown that wasn’t publicly available.
I sent the following description of what I’d found, along with a Dropbox folder of spreadsheets, to the MSN team:
Variables that we can use in the interactive so far (spreadsheet titles indicated by ‘’):
- Annual Population Survey info (‘Nomis 2’ and ‘Northern Ireland 2’, ‘3’ and ‘4’) – Can tell people how likely they are to be employed/unemployed/at another level of economic activity according to age, gender and district/unitary authority area.
- Annual Survey of Hours and Earnings (‘Nomis 6’ & ‘Northern Ireland 1’) – average gross earnings in their district/unitary authority area for their gender
- Spreadsheet ‘Nomis 3’ (Annual Population Survey) – The economic activity rate for people with a certain level of disability in district/unitary authority areas, (England, Scotland and Wales only).
- Spreadsheet ‘Nomis 5’ (Annual Population Survey) – The most common educational level of those economically active for their area and age – PATCHY DATA, (and England, Scotland and Wales only).
- Spreadsheet ‘Nomis 4’ (Annual Population Survey) – The most common ethnic minority group for a certain industry in their area/most common industry for certain ethnic minority groups in their area (categories: ethnic minority, mixed, white) – PATCHY DATA (and England, Scotland and Wales only).
- Spreadsheet ‘Nomis 8’ (Annual Population Survey) – % of people employed in different sub-industries (e.g. corporate managers, science and research) in their district/unitary authority, (England, Scotland, and Wales only).
New data that I’ve dug up recently:
- The ‘table 15’ folder (Annual Survey of Hours and Earnings) – Median and mean earnings, measured in separate spreadsheets in terms of annual gross, weekly gross, basic pay, hourly and overtime, according to occupation, gender, and broad region (e.g. North East, North West etc).
- ‘Not publicly available earnings data…’ (Annual Survey of Hours and Earnings) – Median and mean earnings, both annual and weekly gross, according to age, gender and broad region, (e.g. North East, North West etc).
Since the team had by then decided that it was earnings data they wanted, we agreed to forgo the district authority breakdown and go for the broader regional breakdowns available in the earnings data instead.
The next stage was reformatting the spreadsheets after a discussion with the designer responsible for the interactive. They told me how the data needed to slot into it, and I reworked the data from something like this:
Into something like this:
I created three spreadsheets in total:
And the end result just goes to show how much you can do with publicly available data, with some time and a spreadsheet.
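For anyone who would rather script the reformatting step than rework it by hand in a spreadsheet, here is a minimal sketch in Python of one common kind of reshaping: turning a "wide" table (one column per gender) into a "long" one (one row per region and gender). The layout, the column names such as `male_median`, the `wide_to_long` helper and the figures are all made up for illustration; they are not the actual earnings data or the format the designer asked for.

```python
import csv
import io

# Hypothetical "wide" export: one row per region, one column per gender.
# Column names and figures are invented for illustration only.
WIDE_CSV = """region,male_median,female_median
North East,100,80
North West,110,90
"""


def wide_to_long(wide_text):
    """Turn one-row-per-region data into one row per (region, gender)."""
    rows = []
    for record in csv.DictReader(io.StringIO(wide_text)):
        region = record["region"]
        for column, value in record.items():
            if column == "region":
                continue
            # Split e.g. "male_median" into gender "male" and measure "median".
            gender, measure = column.split("_", 1)
            rows.append({
                "region": region,
                "gender": gender,
                "measure": measure,
                "value": float(value),
            })
    return rows


long_rows = wide_to_long(WIDE_CSV)
for row in long_rows:
    print(row)
```

A long layout like this tends to be easier for an interactive to filter, since each combination of choices maps to a single row, but whether it suits a given design depends on how the data needs to slot in.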