I found some recently released census data on the proportion of resident adults who usually cycle to work: https://www.gov.uk/government/uploads/system/uploads/…/cw0901.xls
…and decided to have a go at mapping it.
The data is grouped by where those asked live, (rather than where they work or cycle), and shows both the numbers and the percentage.
I stripped out the columns that I wanted and pasted them into a fresh spreadsheet and tidied it up a little.
After some hunting I managed to find the second, vital component of a Google fusion map – a KML file that contains the geographical coding for Google maps. To create any Google map, you have to merge your data with a KML file, so that each place name corresponds to a bit of geographical coding.
I upload the KML file to Google Drive and am ready to do a final cleaning and sorting of the data.
The file contains the same LA codes found in the ‘cycle to work’ data, so I add this column to my new spreadsheet to make the process of knitting the two together, easier. I take out the ‘England’ total at the bottom, at the same time.
To merge two fusion tables, one column in each has to match precisely. This can be a pain when using place names. I once had to match one table in which names were written: “City of Kingston” with a KML file that instead contained: “Kingston, City of”.
This means lots of manual chopping and changing in Excel. LA codes (or their equivalent), nice, simple and precise, make the process much easier.
Using the LA codes to knit the two should also solve another problem. The KML file, it turns out, doesn’t contain all of the subcategories of local authority that the ‘cycle to work’ data contains (in grey) but it does contain some of them.
Hopefully, any rows with LA codes that don’t match up won’t appear on my merged file, meaning I won’t have to go through the ‘cycle to work’ file manually, ensuring that each area contains the same level of data that the KML file contains.
The next thing is to tidy the data up a bit by putting all of the place names into the same column.
And then it’s time to upload it as a fusion file!
To do this, go to Google Drive, hit ‘create’ and then ‘fusion file’, browse for your file and check that it all looks okay before uploading it.
It all looks good, so I proceed with the merge.
You should always merge from your data file, (not your KML file) to avoid messing with the pure KML loveliness.
Choose the column which corresponds in each file – the LA codes in my case – and then select which columns from both you want to include.
Then check that all looks fine, and that there are no rows lacking the proper KML coding that should have coding.
In my case, my plan to let the merge weed out any rows with LA numbers from my ‘cycle to work’ data file that didn’t correspond to an LA number in the KML file, worked perfectly. All the geographical districts that were too detailed or not detailed enough were weeded out, and fell to the bottom:
This only worked because the census ‘cycle to work’ data file contained several layers of geographical information for each actual area, meaning that no gaps would be left – all I needed to do was weed out the layers of geographical information for a particular area that didn’t match the KML file’s level of geographical information.
I then hit ‘map of geometry’ and my beautiful map appears, showing clearly and visually exactly which parts of England are cycle mad and which parts aren’t:
To make the map tell a story, I hit the drop-down arrow on ‘map of geometry’, and go to ‘map style’.
I then go to ‘polygons’ and to ‘fill colour’ and then to ‘buckets’ where I select the ‘number of people who cycle to work’ column to display on the map.
My first attempt at showing the gradient doesn’t quite go to plan:
I use an even range, (the one suggested by google maps in the picture), and what I get is the yellow monstrosity you see in the background. It turns out that in most places, very few people cycle to work.
What this does not make for is a very visually interesting graph.
The first colour only represents the places where just 1% of people cycle to work.
The second colour represents between 1% and 2%
The third, between 2% and 5%
And after that all the way to 10%
The result is much more attractive, and tells a more meaningful story.
It’s clear, for example, that a small area between Leeds and Sheffield – Barnsley -is the place in the country where the smallest number of people cycle to work.
A better way to do this might be to widen the scale with more buckets, to better reflect the full range at the bottom of the scale without making it uneven.
Colorbrewer is a useful tool for this, especially if you want to stay colourblind-friendly.
I select ‘colorblind safe’ and then ‘HEX’ to give me HTML codes that I can use in google. I then bump the number of data classes up to 9 – the highest it will go.
I copy over the codes, (starting after the 0x) to each bucket on the fusion map, (adding a hashtag at the start of each one).
The result is slightly less visually satisfying unfortunately, due to the relatively lack of results towards the top end of the scale. Barnsley vanishes as a result of being confined to a 9-colour scale, (making the lowest category 0-2%) when one of the biggest differences across the country is between 1% and 2% of people cycling.
But it does make the few differences in the top end of the scale clearer, including the fact that more people cycle to work in Hackney (9%) than anywhere else in the country, followed by York (8%) followed by Cambridge (7%).
An 18-bucket colour scale would be the ideal solution, and a bit more time spent on colour brewer would probably help achieve that.
But for now, I’ll settle for the above!