Greater London Authority 2010 survey summary
The Greater London Authority (GLA) has, over the past few years, shown a commendable willingness to make its various datasets available to the public. Its vehicle for doing this is a new website called London Datastore at http://data.london.gov.uk/. To understand what London Datastore is all about, read the paragraphs below, which I have copied from the homepage:
The London Datastore has been created by the Greater London Authority (GLA) as an innovation towards freeing London’s data. We want citizens to be able access the data that the GLA and other public sector organisations hold, and to use that data however they see fit – free of charge. The GLA is committed to influencing and cajoling other public sector organisations into releasing their data here too.
Releasing data though is just half the battle. Raw data often doesn’t tell you anything until it has been presented in a meaningful way. We want to encourage the masses of technical talent that we have in London to transform rows of text and numbers into apps, websites or mobile products which people can actually find useful
One of the datasets currently available is the Annual London Survey 2010. In their own words this dataset:
…is taken from a face-to-face survey of 1,490 residents of Greater London, undertaken in early 2010 by BMG Research on behalf of the GLA. The questions explore areas of Mayoral policy and priority including policing and safety, the environment, transport, the Olympics and london life.
The data is available by demographic group, including gender, age, ethnicity and social class
and has been reported on the main London Government website at http://www.london.gov.uk/get-involved/consultations/annual-london-survey/2010 which gives a high-level sanitised view of the results. The underlying data has however been made available in a Microsoft Excel workbook at http://data.london.gov.uk/datastorefiles/datafiles/championing-london/gla-als-2010-responses.xls. This workbook is not easy to comprehend for a number of reasons:
- The workbook contains answers only; the questions that were being answered are not provided.
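- The data is spread over multiple sheets. The site claims that this is due to Excel’s 255 column limitation which, by the way, is out of date: the legacy .xls format stops at 256 columns, but the .xlsx format introduced with Excel 2007 supports 16,384.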
- Some questions allowed multiple answers, and the way these answers are structured in the workbook makes them hard to consume.
- There is no analysis or visualisation of the raw data (i.e. no aggregation and no charting)
These are not intended as criticisms per se. Providing data in its rawest form is absolutely the right thing to do as it means folks like me can take it and add value to it. With that in mind I have condensed the contents of that workbook into a new workbook that you can view online (only a web browser required) at http://cid-550f681dad532637.office.live.com/view.aspx/Public/BlogShare/20101217/gla-survey-2010-responses%20-%20rework.xlsx.
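For anyone who would rather script this kind of reshaping than do it by hand, here is a minimal sketch in Python of turning a multi-answer column into one row per respondent and answer. It is only an illustration: the column names, the semicolon separator and the sample rows are all invented, and the real workbook may well be laid out differently.

```python
def explode_answers(rows, column, sep=";"):
    """Return one row per (respondent, answer) for a multi-answer column."""
    long_rows = []
    for row in rows:
        for answer in row[column].split(sep):
            answer = answer.strip()
            if answer:  # skip blanks from empty cells or trailing separators
                long_rows.append({"respondent": row["respondent"], column: answer})
    return long_rows


# Invented sample rows for illustration; NOT taken from the GLA workbook.
sample = [
    {"respondent": 1, "concerns": "Crime; Transport"},
    {"respondent": 2, "concerns": "Housing"},
    {"respondent": 3, "concerns": ""},
]

long_rows = explode_answers(sample, "concerns")
for row in long_rows:
    print(row)
```

Once the answers are in this long form, each answer can be counted on its own, which is what makes aggregation straightforward.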
You can also download the workbook and view it in your own copy of Microsoft Excel (you will need Microsoft Excel 2010). Downloading has the advantage that you can drag-and-drop data in and out of the pivot tables thus providing true analysis capability.
This new workbook contains two pivot tables that aggregate the data to make it meaningful. Those pivot tables use a series of slicers that let you chop and change the data as you see fit and discover insights not included in the official report.
Please let me know if this is at all useful and, if it is, please spread the word.
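If pivot tables and slicers are unfamiliar, the idea can be sketched in a few lines of Python: a pivot table counts rows for each combination of two fields, and a slicer is simply a filter applied before counting. This is a rough illustration rather than a description of how Excel does it; the function name `pivot_counts`, the field names and the sample responses are all hypothetical and are not taken from the survey data.

```python
from collections import Counter


def pivot_counts(rows, row_field, col_field, **slicers):
    """Count rows per (row_field, col_field) pair, keeping only rows
    that match every slicer given as field=value."""
    counts = Counter()
    for row in rows:
        if all(row[field] == value for field, value in slicers.items()):
            counts[(row[row_field], row[col_field])] += 1
    return counts


# Invented responses for illustration; NOT real survey results.
responses = [
    {"gender": "Female", "age": "16-24", "answer": "Satisfied"},
    {"gender": "Male", "age": "16-24", "answer": "Dissatisfied"},
    {"gender": "Female", "age": "25-34", "answer": "Satisfied"},
    {"gender": "Female", "age": "16-24", "answer": "Dissatisfied"},
]

# Pivot: answer by gender, across all respondents.
table = pivot_counts(responses, "answer", "gender")

# Same pivot with an age "slicer" applied.
sliced = pivot_counts(responses, "answer", "gender", age="16-24")

print(table[("Satisfied", "Female")])   # count across all age groups
print(sliced[("Satisfied", "Female")])  # count within the 16-24 slice
```

Changing the slicer arguments plays the same role as clicking a different slicer button in the workbook.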
Disclaimer: I had to guess at what some of the questions were based on the given answers. Where the question was not obvious I did not include it.