Skip to main content

Exploring City of London Councillors data

I’ve scraped data about all the City of London’s Aldermen and Common Councilmen and put the data up on a browsable and searchable website using Datasette. My code is all on GitHub.

(An aside… the City of London is a small local authority within London as a whole and has its own, er, special system of government. This video is a fun introduction to the City as an entity and the second part is good on its government.)

The City of London has a fair amount of data on its website about its Members (the term that combines Aldermen and Common Councilmen) but the presentation is a bit limited. While you can view a Member’s entries in the Register of Interests (like these), you can’t see all of the interests across all Members, or search them.

A page showing all freemasons

Now you can see, and search, all the registered interests and all the gifts received.

The interface is a little on the geeky side — I guess Datasette is a balance between something that is relatively easy and quick to set up, and potentially powerful, against something that isn’t confusing for the less technically minded.

You can see lists of the Members, committees and electoral wards, but that’s nothing you couldn’t do on the City’s own website. Now, however, you can search the list of gifts for mentions of “dinner”, for example, or the register of interests for all those mentioning “conservative” (the City is generally non-party political, in theory). Change the word in the “Search” boxes on those pages to find other things. Use an asterisk (*) as a wildcard.

Here’s a more complicated query listing all those who mention “freemason” or “masonic” in the register of interests. There are 22 people listing themselves as freemasons, all men. There are 125 Members in total, 26 of whom are women (based on me looking at their photos). Which means that 22% of male Members are freemasons. Is that more or less than you’d expect? Let’s not even get into that gender balance. While we’re at it, I’m estimating 9 of those 125 Members are from BAME backgrounds.

Back to the new website… we can also do more complex queries. By finding the ID of a committee on this page we can use it in this form to restrict our search of interests to the members of specific committees. (Confusingly, on this form % is the wildcard character.)

For example, we can see all the interests of the members of the Planning and Transportation Committee (ID 143). There’s a similar form for searching gifts Members have received, which lets us see the same committee’s gifts.

Whether there’s anything interesting in all that is another matter, but making this information more easily browsable and searchable is the only way to know.

Some other lists, suggested by readers:

Technical notes

For those interested in how this was done…

A page showing gifts to the Planning and Transportation Committtee

The README on my GitHub repository describes how to run the code and set up the site using Datasette and Zeit, a sort of Heroku-like, cloud-based, web hosting platform Glitch.

I first had to get get all the data from the City’s website. Previously I would have used the python module Beautiful Soup to scrape the data from pages but this time I tried requests-html and liked it a lot. It allows for more jQuery-like selecting of HTML elements which made things fairly painless, although I still had to resort to regular expressions a couple of times.

I assume the data is held in a database at the City’s end but some of the data is, I imagine, entered directly from freeform text fields (or paper forms…?). For example, the dates when gifts were received are written in a variety of formats. I used dateparser to turn most of these into a standardised date format, although this wasn’t always possible.

I put all the data into JSON files as a first step because this seems, to me, the most useful general format as a basis for other things. These files are included in the GitHub repository.

I was aiming to use Datasette so I needed to make an SQLite database from this data. I adapted the convert_xml_to_sqlite.py script from Simon Willison’s register-of-members-interests code (Simon created Datasette) which saved me a lot of time and googling. It was quite refreshing to use SQLite as a database; because it’s just a file it’s very easy to throw one database away and create another, back them up, etc.

I then installed the Datasette module and it worked straight away, letting me access the SQLite database in my web browser. It’s a relief when some things Just Work.

But that was only the database sitting on my computer. Getting it online was a bit trickier, because I’d never used Zeit and I’d never used Docker which is the platform I’d have to configure to run my code on Zeit. It took me a frustrating period of cargo-cult copying-and-pasting settings around before I got something that seemed to work, not helped by Zeit’s free plan rate-limiting logging output; I’d often end up with a deployment failing but being unable to see why. I got there in the end.

[Note: Since this was written, I’ve since moved the site from Zeit to Glitch and updated all the URLs to point to the new version.]

It took about 2½ days to get everything working. The first day was spent getting the first round of scraping the data working. The second day was spent re-writing much of that code to organise the JSON files differently, fetching the data about committees, and making the SQLite database. And half a day was trying to deploy the site to Zeit with Docker. At least that bit would be quicker next time.

Many thanks to Simon Willison for Datasette and for some assistance with getting things set up.

UPDATES:

  • 11 May 2018: Added lists of members ordered by number of committees served on, and number of gifts received in the past year.
  • 12 May 2018: Refined the query listing freemasons; added queries listing members by number of committees; added queries listing conservative, labour and liberal interests.
  • 29 May 2019: Changed URLs from Zeit to Glitch.

Mention this post

If you’ve mentioned this post somewhere, enter the URL here to let me know: