Google Fusion Tables, Charts and Maps

I found some recently released census data on the proportion of resident adults who usually cycle to work:…/cw0901.xls

…and decided to have a go at mapping it.

The data is grouped by where those asked live, (rather than where they work or cycle), and shows both the numbers and the percentage.

Cycle to work 1

Continue reading


NHS waiting times story – Part 2: Pivot Tables

Pivot tables are one of the most useful tools available for speedy data analysis – luckily they’re also quite easy to use, and with a bit of playing around you soon get the hand of it. They give you a way to reconstruct the data in your spreadsheet, interrogating it and almost asking it questions.

In my last post I introduced a dataset on waiting times for treatment from NHS providers. In this post I’m going to show how I (attempt to!) analyse the data to find a story, using pivot tables.

I started with the second sheet from my last post, which showed for each of 19 different ‘treatment functions’ or areas, (such as trauma or urology) within each provider: 1. the total number of completed courses of treatment, 2. the total number of courses with a known start date, and 3. the total number with a known start date that had been completed within 18 weeks:

6 pivot tables 2

18 weeks is the benchmark time limit for completing patient treatment, enshrined in the NHS constitution. It’s a target that was also reiterated by David Cameron in the summer of 2011 as part of his five NHS pledges.

So, subtracting the comments of each cell in column BK (treatments completed within 18 weeks) from its counterpart cell in column BJ (total number of completed courses of treatment) – ought to give us the number of completed treatments that were not completed within an 18-week period.

Continue reading

NHS waiting times story – Part 1: Filters

I had a quick go at pulling a few quick stories from a dataset from the NHS’s 2012-13 sets of data on treatment waiting times.

The data that I’m interested in shows the average waiting time for treatment at each of the hospitals listed.

2 Spreadsheet overview

On a separate sheet, it also shows for each of 19 different medical departments within each provider:

  1. The number of completed courses of treatment (completed pathways), it total.
  2. The total number of completed courses of treatment (completed pathways) where the date at which the clock started ticking is known -i.e. where it was recorded.
  3. The total number of courses of treatment with a known clock start date which were completed within 18 weeks.

1 Spreadsheet overview

On this sheet, the difference between the columns showing number 2 and number 3 (which are columns BI and BJ in the picture above), tells us the number of completed courses of treatment that took longer than 18 weeks.

Continue reading

Scraping for dummies – using only a google spreadsheet

A couple of months ago, I was taken to a the eponymous website on which the deaths of race horses are catalogued, (, and taught a little about how to scrape information from a regularly updated table (of the sort found on that page of the site), using only a google spreadsheet.

Google spreadsheets are invaluable. Fact.

1. Create a new google spreadsheet. Type =importhtml into a cell on your spreadsheet. This start to your formula needs to be followed by some information telling the spreadsheet exactly what information to extract – using the following template: (“url”,”query”,”index”)

  • “url” (of the site from which you want to scrape)
  • “query” –  this can be either “table” or “list”. In this case, horsedeathwatch have thoughtfully collated the information in a table rather than a list, so we’re plumping for “table”.
  • “index” – this is simply the number of the table or list, and may require a little trial and error. So if your table is the second on the page, enter “2”. In this case, we will enter “1”.

2. The finished formula (or function) should look a little like this: =importhtml(“”,”table”,”1″)

3. Press enter, and watch the data stream into your spreadsheet, as in the image below (click to enlarge)

Basic scraping


4. Note, your spreadsheet will last only as long as the website is up, so create regular back-ups or screenshots.

Importing data from an RSS feed:

Let’s take the Oxford Migration Observatory’s Press Release RSS feed url:

1. Type =importfeed into your first cell

2. Follow this with (“url”) – this should be the url of the page that you want the information from.

3. So my finished formula in this case is: =importfeed(“”)

basic scraping 2

Ta daaaa


On posing a question, and finding the data

Almost everything generates a trail of numbers. Success rates, call out frequencies – if nothing else there’s always a money trail to follow. Sometimes figures need to be ferreted out, (I’ll come back to tailoring FOI requests later), but as I flexed my fingers and started hunting, it became clear that there are reams of data that have been made publicly available, from government and researchers.

A good starting point for Brits is the UK National Statistics release calendar.

I’d recommend at this point subscribing to a free RSS feed reader service, like Google Reader. Many pages will have a little orange RSS feed icon in the corner. (Click below for expanded image)

An RSS feed icon, earlier

For pages of data releases that don’t offer such feeds, you can often create a manual RSS feed using tools like page2RSS.
My current google reader bundle contains a couple of ready-made RSS feeds from the Oxford University Migration Observatory and the Metropolitan Police’s crime figures pages, and one which I created from the Home Office’s migration statistics page.

But whilst this method of subscribing to raw data feeds will certainly spark off ideas and keep you abreast of the sort of information that’s out there, it does leave you at the mercy of whatever information happens to trickle into your lap.

So, here’s the first key point I’ve picked up about data journalism thanks to the indomitable Paul Bradshaw: data is useless unless you use it to tell a story. Data is a tool, journalism is the point. Beautiful animations and infographics, whether hard-hitting or otherwise, are wonderful journalistic tools precisely because they can hit you with a story in the ten seconds it takes you to glance at the image. To tell a story with the same impact in ten seconds in print, you need a tabloid headline. Data can have the same rapid, punchy impact, without compromising itself.

This does mean that a more proactive data journalism process will begin a little like this:
1. Begin with a hypothesis, a question, or a possible story idea which you can test. Give it a time frame. E.g: How many independent bookshops have closed in London this year compared to last year and the year before? Which borough had the highest number of armed police call-outs this year? Or how much funding has been cut for people with disabilities under the coalition government?

2. Find the data to answer this question. This might mean sending an FOI request or three.  It might also mean amending your idea if you find the data tells a different story to the one expected.Much data is publicly available; some is generated by research bodies. Combine journalistic research methods – ring around and ask questions.

Let’s take my last suggested idea about funding cuts for people with disabilities. I could send out my own FoI requests but a quick trip to What Do They Know, a website logging FOI requests and their responses for public use, reveals that someone else has already done some of the legwork for me. Hurrah. Mr Paul Taylforth has sent a letter to 123 councils asking what their response has been to the government’s consultation on the closure of the Independent Living Fund.

In view of the data I’ve managed to find, my hypothesis or question may have to be tailored down. Since my question was about changes to funding for people with disabilities, I could ask how many councils don’t support the government’s plans to phase out the Independent Living Fund. One of my next posts will cover how I get on in attempting my first piece of data journalism, using this information and question.

For now, here are a few tips on tailoring Google searches and FOI requests to get the information you’re looking for.

A few tips for advanced google searches:

Google searches can be made to more efficiently search for data using filetype:xls  or filetype:pdf. Try adding

Try adding: or or .uk or or or or org/ (for charities and nonprofits), or (most schools).

~ before a word will look for synonyms.

Clinical research groups tend to have “ccg” in the wider url. Search like so – inurl:ccg

Asterisks in a search can be replaced with anything. “revenue from * 2011” will bring up “revenue from mobile to 2011” or “revenue from gold in 2011

A few tips for Freedom of Information requests:

A helpful lecture from Heather Brooke, the Freedom of Information campaigner who was on the tail of MPs expenses long before the Telegraph broke the story, gave me a few pointers about getting the most from your FoI requests.

(Click to enlarge) An example of a good FOI request from Heather Brooke

1. Ask the organisation to send you the information in the format of an excel spreadsheet. It will make it much, much easier to deal with, and you’re within your rights to ask for this.

2. If they put you off with the excuse that it would cost too much to compile or photocopy and send you the information, ask to inspect the record. You’re within your rights to do this for most public bodies, from local government to LOCOG or the Metropolitan Police Authority. Here’s a good example of how to word a request asking for the information electronically first, but letting them know you’re happy to come in to inspect the record otherwise.

3. The cost limit for each FOI request is £600 for government department and £450 for all other public authorities – about 2.5 days work. This means that larger, more complex requests can easily take you above the threshold. One answer is to break your questions down – don’t ask more than a few in each request. Be specific in your questions and ask for the information with all the details – whilst leaving as little room as possible for the public body to say the information doesn’t exist in the form you’ve requested.

I sent an FOI request recently to the Health and Safety Inspectorate, asking for accident rates by year, for the last three years. Cue the standard response that the information didn’t exist in the format I’d requested and arranging it into that format would take them above the financial limit. I sent a record request making it clear that I was happy to receive the information in whatever format they held it, as long as it covered the period I’d outlined. I found that asking for the specific accident record for each incident uncovered a wealth of anecdotal evidence in addition.

4. Mention the 20-day statutory period within which you expect a response.