Jamie Thomson

Thoughts, words and deeds

Greater London Authority 2010 survey summary

The Greater London Authority (GLA) have, over the past few years, shown a commendable willingness to make their various datasets available for mass consumption by the public. Their vehicle for doing this is a new website called London Datastore at http://data.london.gov.uk/. To understand more about what London Datastore is all about, read the paragraphs below, which I have copied from the homepage:

The London Datastore has been created by the Greater London Authority (GLA) as an innovation towards freeing London’s data. We want citizens to be able access the data that the GLA and other public sector organisations hold, and to use that data however they see fit – free of charge. The GLA is committed to influencing and cajoling other public sector organisations into releasing their data here too.

Releasing data though is just half the battle. Raw data often doesn’t tell you anything until it has been presented in a meaningful way. We want to encourage the masses of technical talent that we have in London to transform rows of text and numbers into apps, websites or mobile products which people can actually find useful

http://data.london.gov.uk/

One of the datasets currently available is the Annual London Survey 2010. In their own words this dataset:

…is taken from a face-to-face survey of 1,490 residents of Greater London, undertaken in early 2010 by BMG Research on behalf of the GLA. The questions explore areas of Mayoral policy and priority including policing and safety, the environment, transport, the Olympics and london life.

The data is available by demographic group, including gender, age, ethnicity and social class

The results have been reported on the main London Government website at http://www.london.gov.uk/get-involved/consultations/annual-london-survey/2010, which gives a high-level, sanitised view. The underlying data has, however, been made available in a Microsoft Excel workbook at http://data.london.gov.uk/datastorefiles/datafiles/championing-london/gla-als-2010-responses.xls. This workbook is not easy to comprehend for a number of reasons:

  • These are answers only; the questions that were being answered are not provided.
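  • The data is spread over multiple sheets. The site claims that this is due to Excel’s 255 column limitation, which, by the way, only applies to the old .xls format (where the limit is actually 256 columns); Excel 2007 and later allow 16,384.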
  • Some questions allowed multiple answers, and the structure of these answers in the workbook does not make them easy to consume.
  • There is no analysis or visualisation of the raw data (i.e. no aggregation and no charting)

These are not intended as criticisms per se. Providing data in its rawest form is absolutely the right thing to do as it means folks like me can take it and add value to it. With that in mind I have condensed the contents of that workbook into a new workbook that you can view online (only a web browser required) at http://cid-550f681dad532637.office.live.com/view.aspx/Public/BlogShare/20101217/gla-survey-2010-responses%20-%20rework.xlsx.

You can also download the workbook and view it in your own copy of Microsoft Excel (you will need Microsoft Excel 2010). Downloading has the advantage that you can drag-and-drop data in and out of the pivot tables thus providing true analysis capability.

This new workbook contains two pivot tables that aggregate the data to make it meaningful. Those pivot tables use a series of slicers that let you filter the data by demographic group as you see fit and discover insights not included in the official report.
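
If you would rather script this than use Excel, the kind of aggregation a pivot table with slicers performs can be sketched in a few lines of Python. This is only an illustration, not what the workbook does internally: the rows, the column names (`gender`, `age_band`, `feels_safe`) and the `pivot_counts` helper are all made up for the example and do not come from the GLA files. Real rows could be loaded with `csv.DictReader` instead.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows; the real survey uses different column names and codes.
responses = [
    {"gender": "Female", "age_band": "16-24", "feels_safe": "Yes"},
    {"gender": "Male", "age_band": "16-24", "feels_safe": "No"},
    {"gender": "Female", "age_band": "25-44", "feels_safe": "Yes"},
    {"gender": "Male", "age_band": "25-44", "feels_safe": "Yes"},
    {"gender": "Female", "age_band": "16-24", "feels_safe": "No"},
]


def pivot_counts(rows, row_field, col_field, filters=None):
    """Count rows by (row_field, col_field), optionally filtered like a slicer.

    `filters` maps a field name to the set of values that should be kept.
    """
    filters = filters or {}
    table = defaultdict(Counter)
    for row in rows:
        # Keep the row only if every filtered field holds a selected value.
        if all(row[field] in allowed for field, allowed in filters.items()):
            table[row[row_field]][row[col_field]] += 1
    # Convert to plain dicts so the result is easy to print or compare.
    return {key: dict(counts) for key, counts in table.items()}


# Answers to one question broken down by gender.
by_gender = pivot_counts(responses, "gender", "feels_safe")

# The same breakdown with a "slicer" restricting it to one age band.
young_only = pivot_counts(
    responses, "gender", "feels_safe", filters={"age_band": {"16-24"}}
)

print(by_gender)
print(young_only)
```

Changing the `filters` argument plays the same role as clicking a different slicer button: the counts are recomputed over whichever respondents remain.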

Please let me know if this is at all useful and, if it is, please spread the word.

@Jamiet 

Disclaimer: I had to guess at what some of the questions were based on the given answers. Where the question was not obvious I did not include it.

Written by Jamiet

December 18, 2010 at 1:49 pm

7 Responses

  1. Hi Jamie,
    I am involved in the Datastore team at the GLA and was involved in helping prepare the ALS data to go on the site. It’s great that you have shown an interest in this data and are publicising it to others. Sorry to hear that you have concerns about the data, hopefully I can put you at ease on at least a couple of the issues.

    1. The questions that were asked for the survey are all listed in a separate CSV file on the site http://data.london.gov.uk/datastorefiles/datafiles/championing-london/gla-als-2010-variable-list.csv. As some of the questions are so long it would be tricky to include them as column headings so we have tried to use short column headings which can then be linked back to the full question in the accompanying file.

    2. We aim to try to publish all our datasets in a non-proprietary standard wherever possible and have provided the ALS data in a CSV file containing all the variables in one place http://data.london.gov.uk/datastorefiles/datafiles/championing-london/gla-als-2010-responses.csv. Where we also provide Excel versions of the data we are restricted by the fact that the GLA uses version 2000/2003 of Excel so the 256 column limit in that version is a problem. Hence we thought it would be sensible to group the questions in similar categories.

    3. Your point about the lack of visualisation is true but does miss one of the key points of the London Datastore (and one you referred to at the start of your blog) which is to provide access to the underlying data so that anyone can do their own analysis or visualisation and isn’t restricted to the analysis that the GLA provides. Your work using the online version of Excel is a great example of this and we plan to add it to the site as an inspirational use that will hopefully give other people ideas for doing similar things themselves.

    Please feel free to get in touch with us at datastore@london.gov.uk if you have any further questions about any of our datasets.
    Regards
    Gareth

    Gareth Baker

    December 20, 2010 at 10:34 am

    • Hi Gareth,
      Thanks for coming by here to comment – I really appreciate it.

      I wouldn’t say I have concerns as such; I was only pointing out some issues with the dataset that I thought I could improve upon. Having said that, I was not aware of the list of questions, so I appreciate your providing the link.
      Regarding the use of Excel 2003 – I suggest you guys catch up to the present day :)
      My point about the lack of visualisation was not intended as a criticism and I apologise if it came across as such. On the contrary, I commend you for providing the data in its rawest form if for no other reason than it gives me the opportunity to add value to it.

      I would be extremely grateful if you could add a link from http://data.london.gov.uk/; that is far and away above what I would expect from you. It’s very much appreciated.

      Regards
      Jamie

      Jamiet

      December 20, 2010 at 10:58 am

  2. I would add that even if everyone in the GLA had version 2007 or 2010, unless all potential users in London had the same version, they would still have to provide the spreadsheets with maximum 255 columns, otherwise the data would not be accessible to quite a number of Londoners still using old versions.

    Gareth P

    December 20, 2010 at 2:29 pm

    • Fair point Gareth. That’s why I published it using Office Web Apps – only a browser required!

      Jamiet

      December 20, 2010 at 2:33 pm

  3. hi Jamie, does any dataset report the location of the responders? cheers, daniele

    Daniele

    December 22, 2010 at 4:17 am

    • Hi Daniele,
      Unfortunately not, no. It occurred to me too that that would be useful information. Having said that, with only ~1500 respondents I doubt there would be enough people per region to support any worthwhile analysis of that kind.

      -Jamie

      Jamiet

      December 22, 2010 at 6:57 am

  4. […] online if you wish to see it). As you may have gathered from previous posts on this blog and my less-SQLy-focused WordPress blog I am a big fan of collecting and tracking both personal and public data and session feedback lends […]

