MSN News UK salary tracker – some (paid!) data journalism:

Just a belated update on an interactive tool that I helped to create: MSN News’s UK Salary Tracker.

I was signed up to do the data research side of things – collecting and reformatting the data that made the interactive possible.

The initial brief was fairly open, as the journalists at MSN weren’t sure what data was out there on the scale that they needed, and with the breakdowns required. I was sent off to see what was available on salaries, occupations, employment and unemployment, broken down by district in the UK.

I set out to track down what government data was available (having discovered that there were no independent surveys covering the whole area in a consistent measure under any of the variables required), and came across an invaluable tool.

Nomisweb, a website of the Office for National Statistics, allows you to select the particular data you want to view from several different surveys, choosing the degree of regional detail, time coverage, and other variables.

An example of a key set of data that I came across this way was the most recent set from the Annual Population Survey:


This gave me the % employment rate of men and women according to their rough occupation, (albeit without any age breakdown).

After conducting some thorough research on everything out there, I presented my findings to the MSN news team. In the course of all my digging, I got in touch with a few people at the Northern Ireland Statistics and Research Agency (who usually passed me on to other people, who usually passed me on, etc…) to check whether the Northern Ireland data was measured in a way that allowed for a direct comparison with the UK data that I had sourced separately. (The Northern Ireland data on earnings was not available on Nomis.)

I used the Nomis tool to create several spreadsheets, and also rang the helpful people there a few times, managing to wheedle out of them some earnings data, broken down by age, that wasn’t publicly available.

I sent the following description of what I’d found, and a Dropbox folder of spreadsheets, to the MSN team:

Variables that we can use in the interactive so far (spreadsheet titles indicated by ‘’):

  • Annual Population Survey info (‘Nomis 2’ and Northern Ireland 2, 3 and 4) – Can tell people how likely they are to be employed/unemployed/other level of economic activity according to age and gender and district/unitary authority area.
  • Annual Survey of Hours and Earnings (‘Nomis 6’ & ‘Northern Ireland 1’) – average gross earnings in their district/unitary authority area for their gender
  • Spreadsheet ‘Nomis 3’ (Annual Population Survey) – The economic activity rate for people with a certain level of disability in district/unitary authority areas, (England, Scotland and Wales only).
  • Spreadsheet ‘Nomis 5’ (Annual Population Survey) – The most common educational level of those economically active for their area and age – PATCHY DATA, (and England, Scotland and Wales only).
  • Spreadsheet ‘Nomis 4’ (Annual Population Survey) – The most common ethnic minority group for a certain industry in their area/most common industry for certain ethnic minority groups in their area (categories ethnic minority, mixed, white) – PATCHY DATA, (and England, Scotland and Wales only).
  • Spreadsheet ‘Nomis 8’ (Annual Population Survey) – % of people employed in different sub-industries (e.g. corporate managers, science and research) in their district/unitary authority, (England, Scotland, and Wales only).

New data that I’ve dug up recently:

  • The ‘table 15’ folder (Annual Survey of Hours and Earnings) – Median and mean earnings, measured in separate spreadsheets in terms of annual gross, weekly gross, basic pay, hourly and overtime, according to occupation, gender, and broad region (e.g. North East, North West etc).
  • ‘Not publicly available earnings data…’ (Annual Survey of Hours and Earnings) – Median and mean earnings, both annual and weekly gross, according to age, gender and broad region, (e.g. North East, North West etc).

Since the team had by then decided that it was earnings data they wanted, we agreed to forgo the district authority breakdown and go for the broader regional breakdowns available in the earnings data instead.

The next stage was reformatting the spreadsheets after a discussion with the designer responsible for the interactive. They told me how the data needed to slot into it, and I reworked the data from something like this:


Into something like this:



I created three spreadsheets in total:


And the end result just goes to show how much you can do with publicly available data, with some time and a spreadsheet.


Google Fusion Tables, Charts and Maps

I found some recently released census data on the proportion of resident adults who usually cycle to work:…/cw0901.xls

…and decided to have a go at mapping it.

The data is grouped by where those asked live, (rather than where they work or cycle), and shows both the numbers and the percentage.

Cycle to work 1

Continue reading

“I don’t even like the term ‘data journalist'”

Michael Bauer works at the School of Data and Open Knowledge Foundation teaching journalists and NGOs how to better construct stories from data. He told me how he started out, and what his advice would be to young journalists who want to get to grips with data.

Michael: I studied medicine, and I did research, and I did technology things all the time on the side, and while doing research I realised that a lot of people are not good at dealing with data, so I thought it would be good to help them, to teach them how to do this. I was able to do research my colleagues couldn’t. After a detour I ended up with the School of Data from the Open Knowledge Foundation where we aim to teach how to do things with data to journalists and to charity organisations.

Nabeelah: How did you end up at the School of Data and the Open Knowledge Foundation and how did you become interested in particular in teaching people like journalists?

Continue reading

Asylum cases waiting over six months for a decision have nearly doubled since February 2012

The number of pending asylum cases that have been waiting for over six months for a decision has nearly doubled since February 2012, according to figures from the government’s monthly asylum application statistics, released yesterday.

The figures show that the number of cases pending an initial decision after six months had risen to 6,342 in February 2013, compared to 3,380 in February 2012. This represents an increase of 87 percent.

The figure from this February is also a four percent increase on the number of asylum cases pending an initial decision after six months in June 2010, one month after the Coalition government came to power, when the number stood at 6,070.

The number of asylum cases pending a decision after less than six months has seen an even bigger rise, increasing by 93 percent between June 2010 and February 2013, to 6,511 cases from 3,371 cases.

This is an increase of 27 percent from February 2012, when the figure stood at 5,124 cases.
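As a rough check on the arithmetic, the percentage changes quoted above can be recomputed from the case counts given in the text. This is only a minimal sketch: the `percent_change` helper and the dictionary names are my own, not part of the Home Office release.

```python
def percent_change(old, new):
    """Percentage change from old to new, relative to old."""
    return (new - old) / old * 100


# Case counts as quoted in the article (cases pending an initial decision).
over_six_months = {"Jun 2010": 6070, "Feb 2012": 3380, "Feb 2013": 6342}
under_six_months = {"Jun 2010": 3371, "Feb 2012": 5124, "Feb 2013": 6511}

for label, series in [("Over six months", over_six_months),
                      ("Under six months", under_six_months)]:
    latest = series["Feb 2013"]
    for earlier in ("Jun 2010", "Feb 2012"):
        change = percent_change(series[earlier], latest)
        print(f"{label}: {earlier} to Feb 2013: {change:.1f}%")
```

Unrounded, the year-on-year rise in cases waiting over six months comes out at about 87.6 percent, which the article gives as 87 percent.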

The graphs linked to below represent the increased numbers of:

  • Cases pending an initial decision after six months (orange),
  • Cases pending an initial decision before or around six months (blue),
  • And the total number of cases awaiting decision (green).

Sheet 1

View graph in full

Sheet 2

View graph in full

Sheet 3

View graph in full

Dr Russell Hargrave, of the charity Asylum Aid, said:

“The coalition promised a fairer, more efficient asylum system, with a commitment to getting decisions right first time. Early figures suggested they were making progress. But everything now points to longer delays and backlogs growing yet again. No one who flees halfway across the world to ask for help should be stuck hearing nothing for months, but that’s clearly what’s happening.

“…Tackling these delays is only right and decent for refugees, but it is also a very public test of the government’s competence.”

He highlighted two possible causes for the growing number of asylum cases waiting for long periods of time, without a clear decision:

“The UK Border Agency faced significant cuts to its personnel and to its resources, so suddenly you’ve got the same number of cases more or less being made each month, but the number of people trained to make them and the resources available to them has fallen actually.

“In the middle of that, around April and May 2012, (which is when the graphs bottom out), there’s been an attempt to bring in a whole new system for dealing with asylum claims, and the transition seems to cause delays and confusion amongst some officials that we’ve dealt with.”

A Home Office spokesperson said: “The system we inherited was hopelessly chaotic. We are bringing it all back under control.

“We are currently focusing on concluding the oldest outstanding asylum claims and this, coupled with an increase in the number of claims we are receiving, has meant that it is taking longer than we would hope to process some applications.

“We are working hard to address this and reduce the time it takes for applications to be processed.”

Data journalism insights from the annual Perugia International Journalism Festival

…For those of us who couldn’t be there, I’ve worked up a little Storify with some of the key videos, links and tips to come out of Perugia’s International Journalism Festival. Featuring everyone from The Guardian’s James Ball to Steven Doig, Knight Chair in Journalism from the Walter Cronkite school, to Spiegel Online’s open news fellow, Friedrich Lindenberg. Well worth a read for any aspiring data journalists.

NHS Waiting Times Part 4 – Visualizing data

My last three posts have shown how I looked for a few simple stories in an Excel spreadsheet on NHS waiting times, using filters and pivot tables.

I produced three results:

1. Data on which ‘treatment function’ was responsible for the most treatments taking over 18 weeks.

2. Data on which provider was responsible for the longest waiting times over 26 weeks (6 months).

3. Data on which provider was most responsible for making patients wait 52 or more weeks for treatment.

Continue reading

NHS Waiting times Part 3: Digging a little deeper with Pivot Tables

In my last two posts, I explained how I used filters and pivot tables to carry out some basic analysis of my spreadsheet on NHS waiting times.

Here, I’m going to talk through my further analysis of the data using pivot tables.

I decided to take another look at the spreadsheet showing the breakdown of how many patients had had to wait for what number of weeks, (from “>0-1” to “52 plus”), for each type of treatment, and in each hospital provider:

4 spreadsheet

I’ve underlined the ‘total’ row that appears at the end of the records for each provider, in the image above. When working with pivot tables, you either need to use the total rows alone or to remove them, to avoid the totals skewing the rest of the data. In this case I’ve begun by removing them as I explained in my last post on filters.
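To illustrate why the total rows matter, here is a minimal Python sketch of the same idea outside a spreadsheet. The rows, the column names and the “Total” label are all made up for illustration (they are not real NHS figures or the real column headings), and `pivot_sum` is a hypothetical helper that does roughly what a pivot table’s sum does.

```python
from collections import defaultdict

# Illustrative rows only: each provider's records end with a 'Total' row,
# as in the spreadsheet described above.
rows = [
    {"provider": "Provider A", "treatment": "Orthopaedics", "52_plus": 3},
    {"provider": "Provider A", "treatment": "Cardiology", "52_plus": 1},
    {"provider": "Provider A", "treatment": "Total", "52_plus": 4},
    {"provider": "Provider B", "treatment": "Orthopaedics", "52_plus": 2},
    {"provider": "Provider B", "treatment": "Total", "52_plus": 2},
]


def pivot_sum(rows, key, value, skip_totals=True):
    """Sum `value` for each distinct `key`, optionally ignoring total rows."""
    sums = defaultdict(int)
    for row in rows:
        if skip_totals and row["treatment"] == "Total":
            continue  # leave out the per-provider total rows
        sums[row[key]] += row[value]
    return dict(sums)


without_totals = pivot_sum(rows, "provider", "52_plus")
with_totals = pivot_sum(rows, "provider", "52_plus", skip_totals=False)
print(without_totals)
print(with_totals)
```

With the total rows left in, every provider’s figure is counted twice, which is the skew described above.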

Continue reading