In my previous post I wrote about cataloguing 28 years’ worth of my visits to see movies, plays, gigs, exhibitions, etc. In this post I’m writing about some of the code structure, the decisions I made, and the changes I made as I went along. It might be useful or interesting to someone else, or to me to remember why some things are how they are. Maybe the thought processes are interesting to people who don’t write code themselves.
The code is an “app” (a modular chunk of code) for the Django framework, called django-spectator. It’s in two parts: one for recording what books and periodicals I’ve read (used here) and one for recording events I’ve been to (used here).
The reading part was built to replicate some old PHP code I’d been using for years so I had a clear idea of how it should work. The events part was new and I ended up changing my mind about its structure while writing it. And then, once it was “finished” and I started using it, I realised there were still more things I hadn’t done well and needed changing. I hope no one else has been using the code during this time.
The code itself isn’t terribly advanced but I still find some of this interesting — the compromises and balances required in representing real-world objects and events in code, in a way that’s useful and, to the end user (me), usable. The balance would be different in many cases if this code was for a large commercial site, and/or for different users, instead of only being for my personal website.
Rather than litter this write-up with code samples, I’ll link to the code on GitHub, using the current commit at the time of writing, for anyone who wants more detail.
There are some things that remained fairly consistent throughout my changes, so I’ll describe them first. In the Django code these are all models, each one representing a database table. Each model represents a “thing” in the world, often an object, but maybe a single concept.
The Creator model represents an individual or a group, e.g. “William Shakespeare” or “Belle and Sebastian”: someone, or some people, who wrote something, performed in something, directed something, etc. (On GitHub.) Each Creator has a kind field that indicates if it’s an individual or a group.
(A “field” represents a piece of information about a particular thing, e.g. Creators also have a field for name. Each field is a column in a database table, like a column in a spreadsheet.)
If we were getting complicated we could have People and Groups as separate models, with a Group containing zero or more People. This might be useful if you went to see the group Ben Folds Five one year and then the next saw Ben Folds on his own — you might want to represent that they are linked, or that you had now seen Ben Folds twice, once as part of a group. Or you might want to record every member of a group you’d seen.
But this is more complex than is useful to me and so I conflated the two, which is good enough almost all the time. There are some odd things… Is “Nick Cave & The Bad Seeds” a group, or an individual and a group? I probably change my mind over this kind of thing from one day to the next, but it’s not a big deal.
Creators are shared between both the reading and events parts of the code. So I could see David Byrne in concert and read a book by him, and they’re both linked to the same Creator.
Creators have a single name field, which avoids the problems that come with using separate firstname and surname fields, and means it can also be used if the Creator is a group. Because we might want to sort Creators-who-are-people by surname there’s also a separate field (on GitHub) that stores an automatically-generated “surname, firstname” style string in a manner that is good-enough-for-me but would, I imagine, contain a nightmare of edge cases in a real, international, application.
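To give a flavour of that automatically-generated sorting string, here is a minimal sketch in plain Python. It is not the code from django-spectator: the function name make_sort_name and the "last word is the surname" rule are assumptions made for illustration, and it ignores exactly the edge cases mentioned above.

```python
def make_sort_name(name: str, kind: str = "individual") -> str:
    """Return a naive 'Surname, Firstname' style string for sorting.

    Hypothetical helper: it assumes the last word of an individual's
    name is the surname, which is wrong for many real names.
    """
    name = name.strip()
    if kind != "individual":
        # Groups are left exactly as they were entered.
        return name
    parts = name.split()
    if len(parts) < 2:
        # Single-word names have nothing to reorder.
        return name
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


print(make_sort_name("David Byrne"))                  # Byrne, David
print(make_sort_name("Belle and Sebastian", "group"))  # Belle and Sebastian
```

Storing the result in its own field, rather than computing it on every page load, means the database can do the sorting.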
Venues have a country field (because I wanted to see how many countries I’d been to events in), a latitude and a longitude (because I want to put these on maps), and an address (which I lazily populate with some barely-useful information using a Google API and code similar to this). (On GitHub.)
Pretty simple. Too simple, as it turns out below.
The Event model records a visit to see something on a specific date at, optionally, a specific Venue. (On GitHub.) A Venue is optional for an Event because sometimes I knew when I’d been to something but I had no record or memory of where it was.
I can imagine adding an optional time field in future, if only so that multiple Events on the same day can be displayed in the correct order.
An Event can be one of several kinds: Cinema, Classical Concert, Comedy, Dance, Exhibition, Gig, Theatre or Other. This isn’t very flexibly done and, having input all my data, I now want to add “Talk”. Other people will have other requirements. Still, this lets us display Events of a particular kind grouped together, which feels important.
Events vs Works
That’s all good but we’ve yet to link Creators to Events. I realised there are two different ways this can be done.
The simplest is for zero or more Creators to be linked directly to an Event. For example, if you go to see The Mountain Goats play live, you’re actually seeing The Mountain Goats right there, at the event. That’s easy. We can link their Creator object to this particular Event.
The only addition to this is that rather than linking Creators to Events directly, I’m using a “through model” (EventRole) that describes the link itself: how it should be described and what order it should be displayed in (on GitHub). So we can say that at this particular Event, The Mountain Goats were headlining, and so should be listed first, while Emmy the Great should be described as “Support” and appear down the bill.
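For anyone who hasn’t met a “through model” before, here is a rough sketch of the idea using plain Python dataclasses rather than real Django models. The field names role_name and role_order are assumptions chosen for the example, not necessarily what the code on GitHub uses.

```python
from dataclasses import dataclass, field


@dataclass
class Creator:
    name: str
    kind: str = "individual"  # or "group"


@dataclass
class EventRole:
    """The link between one Creator and one Event."""
    creator: Creator
    role_name: str = ""  # how the link is described, e.g. "Support"
    role_order: int = 1  # lower numbers are listed first


@dataclass
class Event:
    title: str
    roles: list = field(default_factory=list)  # EventRole objects

    def billing(self):
        """Creators in display order, with the role in brackets if set."""
        ordered = sorted(self.roles, key=lambda r: r.role_order)
        return [
            f"{r.creator.name} ({r.role_name})" if r.role_name else r.creator.name
            for r in ordered
        ]


gig = Event("The Mountain Goats live")
gig.roles.append(EventRole(Creator("Emmy the Great"), "Support", 2))
gig.roles.append(EventRole(Creator("The Mountain Goats", "group"), "Headliner", 1))

print(gig.billing())
# ['The Mountain Goats (Headliner)', 'Emmy the Great (Support)']
```

The description and the ordering live on the link, not on the Creator, so the same Creator can headline one Event and be the support act at another.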
However… if you go to see the movie Dunkirk you didn’t actually see the director Christopher Nolan or any of the actors, all of whom you might want to record in the event’s data, depending on your diligence/masochism. You could call the Event itself “Dunkirk” and add Christopher Nolan to it, as we did with The Mountain Goats. But what if you go to see Dunkirk again? You could do the same with a second Event but there’s no nice way of grouping both visits together other than by finding both Events with the name “Dunkirk”. Which is OK, so long as you never see a play called “Dunkirk”, or that 1958 film of the same name.
So it seems we need a separate Movie model. We could link Creators (like Christopher Nolan) to this. And then link a Movie to an Event. So when you create a new Event object you can say “it featured this Dunkirk Movie”. We could then see all the Events connected to that one Movie. And we could list all the Christopher Nolan Movies we’ve seen.
We’ll also add a MovieRole “through model”, like we did with Events, to indicate that, in this case, Nolan was the “Director” and should be listed ahead of anyone else. Very good.
Now, what about plays? Maybe we could do similar here and have a Play model, with Creators linked directly to that. This doesn’t quite work though. If you go to see King Lear you’d make a new Play object and assign William Shakespeare to it as “Playwright”. And this was a production by the Royal Shakespeare Company so you could add them as a Creator too. And then credit Antony Sher as playing Lear. It’s all looking good…
…until a few years later you go to see King Lear performed by a local school, so you create a new Event, and add your “King Lear” Play to it… and realise that while it correctly lists Shakespeare as the playwright, it also lists the RSC and Antony Sher, neither of whom were at this performance in a school hall. Ah.
So maybe the Performance of a Play is a separate thing? The Event would have a Performance linked to it (with the RSC and Sher attached) and this would have the “King Lear” Play linked to it (with Shakespeare attached).
A chain of Event → Performance → Play. This seems to work.
But then… what if you thought Sher was so good that you went to see the RSC’s production several times? It doesn’t seem quite accurate enough to list all these Performances separately with no real relationship between them. So maybe we need a Production model. A Production could have several Performances of the same Play. The Production would have the RSC attached while the Performances would have Sher attached, or his understudy if he was ill.
This seems reasonably accurate. It could be “better” though… what if you saw a version of King Lear that was adapted by someone else? Maybe you saw it translated into another language. So we’d need to have a way of having “versions” of the same Play, each with additional roles (e.g. “Translator”).
So this could be improved, and made more detailed, but… No, stop! It’s already too much you fool!
If this was a larger website, with lots more data, maybe this would be required. But for my purposes it’s too complicated, and it gets too fiddly for inputting the data. I did start off with this kind of multi-level Play structure but that was too much for my needs and so I simplified things.
It’s not perfect, but it works OK for my needs. A compromise in favour of simplicity and ease-of-use.
Classical works and dance pieces
I don’t see many of these but I think I started off with a similar structure to Plays, having performances of particular works (like Music for 18 Musicians). While this would be required for some websites (or still be too simple) it was overkill for me. So I simplified it to having a ClassicalWork (e.g. with “Composer” Steve Reich) linked directly to an Event (with the performers attached to that). And similar for DancePieces.
After simplifying plays a bit, this is where I was: An Event could have one or more of these things attached to it, depending on whether it had a kind of “Cinema”, “Theatre”, “Classical Concert” or “Dance”:
- Movie
- Play
- ClassicalWork
- DancePiece
Each of those has its own “through model” linking Creators to it, like the MovieRole described above.
This worked fine, for my needs. But I realised it was still unnecessarily restrictive and fiddly for two reasons.
First, what if I went to an Event that featured a performance of a classical work and then a piece of dance? In my system an Event of a particular kind could only have the matching type of work added to it. It seemed logical at first but it was an unnecessary restriction. So I removed it. Now, the kind is more of a guide as to how we could split Events up when displaying them. We can list “Cinema” Events separately from “Theatre” Events. And a “Cinema” Event could feature a DancePiece as well as a Movie. That’s fine. No one will die.
Second, having simplified how I represented plays, classical works and dance pieces (removing the Performance model between them and Events) they were all pretty similar to each other, and to Movies. It now seemed overly-complicated in the website’s admin pages to list forms for these different models separately. So I eventually ended up conflating them all into a single Work model, which has its own kind field (“Classical work”, “Dance piece”, “Movie” or “Play”). And a single WorkRole “through model” associating Creators with it. (On GitHub.)
This is much simpler. We have an Event which can optionally have Creators associated directly with it: people or groups who were actually there (e.g. musicians at a gig, actors at a play, film directors at an after-movie-discussion, etc.). And then the Event can optionally have a variety of Works linked to it, each of which can optionally have Creators associated with it (e.g. playwright, director, etc.).
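As a rough illustration of that final shape, here is a sketch using plain Python dataclasses in place of the Django models. The single Role class stands in for both the EventRole and WorkRole through models, and the helper functions and event labels are hypothetical, invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    """Stand-in for an EventRole or WorkRole link to a Creator."""
    creator_name: str
    role_name: str = ""
    order: int = 1


@dataclass
class Work:
    title: str
    kind: str  # "Classical work", "Dance piece", "Movie" or "Play"
    roles: list = field(default_factory=list)


@dataclass
class Event:
    label: str
    kind: str
    works: list = field(default_factory=list)
    roles: list = field(default_factory=list)


def events_featuring(work, events):
    """All visits that included this exact Work object."""
    return [e for e in events if any(w is work for w in e.works)]


def credits(event):
    """Names credited on the Event itself, then on each of its Works."""
    names = [r.creator_name for r in sorted(event.roles, key=lambda r: r.order)]
    for work in event.works:
        names += [r.creator_name for r in sorted(work.roles, key=lambda r: r.order)]
    return names


lear = Work("King Lear", "Play", [Role("William Shakespeare", "Playwright")])
rsc_visit = Event(
    "RSC production", "Theatre", works=[lear],
    roles=[Role("Royal Shakespeare Company"), Role("Antony Sher", "Lear", 2)],
)
school_visit = Event("School production", "Theatre", works=[lear])

print(credits(rsc_visit))
# ['Royal Shakespeare Company', 'Antony Sher', 'William Shakespeare']
print(credits(school_visit))
# ['William Shakespeare']
```

Both visits point at the same Work, so events_featuring(lear, [rsc_visit, school_visit]) returns both of them, while the RSC and Sher stay attached only to the Event they were actually at.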
Looking back, it feels silly that I made things so complicated to start off with. This is the kind of rabbit hole you can end up going down when you want to accurately model the world. The real world is complex and often doesn’t map well to the more binary, logical world of code. It’s easy to over-do this process (particularly when working on your own!) and end up with a more “accurate” but unwieldy mess that’s hard to maintain, hard to work with, and unnecessarily fiddly for the end user to use. And it’s never quite right.
The new structure is certainly good enough for my needs, and it’s relatively simple to enter data. Finally.
Oh, but what about if you go to see a movie that’s based on a play? Isn’t that like a “performance” of a play? Or, what about if you see a filmed version of a live performance of a play? What even is that? A play? A movie?
It’s never quite right. Compromise.
I described Venues earlier and they seem pretty simple. A name and a location. A place where Events happen.
However, as soon as I started entering data from 28 years ago into my site I realised the biggest difficulty… Venues change over time. I’m not sure why I didn’t think of this initially.
If you’re entering data about a visit to see a movie at the MGM Trocadero in 1995 you want it to appear on the site as being at the MGM Trocadero. But later, when you enter data about seeing a movie at the same cinema, that’s now known as the UGC Shaftesbury Avenue, you want that visit to say “UGC Shaftesbury Avenue”. And you don’t want the previous visit to change from being at “MGM Trocadero”. So, they have different names, but they feel very much like the same Venue.
There is a slightly philosophical question here… What does it take to change one venue into an entirely new venue? Some options:
- When it changes name and branding. e.g. from “MGM Trocadero” to “Virgin Trocadero”.
- When it keeps the same name but has its interior reconstructed. e.g. it splits one screen into two or three.
- When it changes name and branding and has its interior reconstructed. e.g. Cineworld Shaftesbury Avenue becoming Picturehouse Central.
- When it’s roughly the same location, with the same company, but in an entirely different building. e.g. the RSC performed in the temporary Courtyard Theatre (on the site of The Other Place) for a few years while its nearby theatres were redeveloped. That was then replaced by a new The Other Place on the same spot.
- When it temporarily moves to a new location. e.g. the Almeida theatre “moved” to near King’s Cross when its permanent building was being renovated.
- When it permanently moves to a new location. e.g. The Odeon in Colchester was replaced with a new Odeon Colchester round the corner from the previous version in 2002.
I’ve tended to treat the first three of these cases as being the same venue over time. Even the radical transformation of the Trocadero cinema into Picturehouse Central feels, just about, like going to the same venue. On the other hand options 4-6 feel like they create separate venues.
So, given options 1-3, we need a way to keep track of a Venue’s different names over time. I didn’t think of that when I started.
The “proper” way to do this would be, I think, to have a separate model like VenueName which would have fields for name, start_date, end_date and a link to a Venue object. Any Event that occurred at a Venue would be displayed using the VenueName whose dates cover the date of the Event.
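This isn’t what I built, but a sketch of the date lookup that approach would need might look something like the following. Everything here is hypothetical: the VenueName fields, the name_on function, and the placeholder names and changeover dates, which are made up purely for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class VenueName:
    name: str
    start_date: Optional[date]  # None means "since the venue opened"
    end_date: Optional[date]    # None means "still in use"


def name_on(names: List[VenueName], when: date) -> Optional[str]:
    """Return the name that was in use on the given date, if any."""
    for vn in names:
        started = vn.start_date is None or vn.start_date <= when
        not_ended = vn.end_date is None or when <= vn.end_date
        if started and not_ended:
            return vn.name
    return None


# Placeholder names and dates, invented for the example.
names = [
    VenueName("Old Name Cinema", None, date(2000, 12, 31)),
    VenueName("New Name Cinema", date(2001, 1, 1), None),
]

print(name_on(names, date(1995, 6, 1)))  # Old Name Cinema
print(name_on(names, date(2010, 6, 1)))  # New Name Cinema
```

The lookup itself is short; the cost is in needing a start and end date for every name a venue has ever had.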
However, this requires knowing exactly, or even roughly, when a venue changed its name, and adds to the complexity of data entry. This is one of those things it’d be worth doing for a bigger site, with more users, that had to be more robust. But for me it seemed like overkill.
I ended up with another compromise. Each Event has a field for venue_name. When creating a new Event the current name of the linked Venue is copied to that field. So, assuming Events are entered in chronological order you can change the name of a Venue as you work forward through time, and each Event will bear the “current” name. No matter how often you change a Venue’s name, the venue_name saved with existing Events won’t change.
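The copy-on-creation idea can be sketched like this, again with plain Python dataclasses standing in for the Django models. Using __post_init__ for the copy is an assumption made for the sketch; the real code would do it when the Event is first saved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Venue:
    name: str


@dataclass
class Event:
    title: str
    venue: Optional[Venue] = None
    venue_name: str = ""

    def __post_init__(self):
        # Copy the Venue's current name once, when the Event is created,
        # unless a name was supplied by hand.
        if self.venue is not None and not self.venue_name:
            self.venue_name = self.venue.name


cinema = Venue("MGM Trocadero")
first_visit = Event("A film in 1995", cinema)

# Later the cinema is rebranded, so the Venue is renamed...
cinema.name = "UGC Shaftesbury Avenue"
second_visit = Event("A later film", cinema)

# ...but the earlier Event keeps the name it was created with.
print(first_visit.venue_name)   # MGM Trocadero
print(second_visit.venue_name)  # UGC Shaftesbury Avenue
```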
The only downside is that if you need to change the historical venue name for several Events in the past, it requires manually editing all of their venue_names, rather than only editing (or creating) a single VenueName object with the relevant dates. But, again, it’s Good Enough for my needs, and keeps things fairly simple. You can see the name changing over time on the Picturehouse Central page.
Screens and theatres
There’s yet another question about what a Venue is… is it an entire building? Or a particular screen, theatre, hall, etc. within the building?
For example, is the National Theatre a single venue or are its Olivier, Lyttelton and Dorfman Theatres separate venues? This seems like more of an issue for theatres than cinemas: a performance in the Barbican’s 1,156 seat Theatre is a very different experience to one in its 200 seat Pit. But you could say similar for cinemas, where you may want to record whether you saw a film in a cinema’s giant main screen or its poky 30-seater.
And, close to home for me, is the Barbican’s Cinema One (large, in one building) the same venue as the Barbican’s Cinemas Two and Three, which are in a connected-but-different building up the road? They have the same branding and website but feel quite different. And then are those Cinemas Two and Three a different venue to the Cinemas Two and Three that were in a different Barbican building again until they closed a few years ago?
I don’t have good answers for any of these. It seems even less clear-cut than the “When does a venue become a new venue?” question. Sometimes I’ve made separate Venues for things like this, other times I’ve created one Venue and noted on each Event which theatre it was in.
That’s how I got to where I am with all this. The site works fine and I love having all that historical data in it. The compromises I’ve reached work for me and aren’t too annoying. For other uses there would be different compromises required.
There are still a few things that, inevitably, aren’t quite right, aside from everything mentioned above:
- The “reading” and “events” parts of django-spectator are separate, only sharing the concept of Creators. Which means that if I read the print version of King Lear and then see the play performed, these are separate things in the database: a publication in the reading part and a Work (play) in the events part. It’d be nice for these to be the same thing, or related somehow.
- If I go to see several short plays at one Event, I can only attach Creators to either each individual Play or to the Event as a whole. There’s no way to indicate which Play they were acting in, or directed. I’d need the more complicated Play structure I retreated from above.
- Another part of my site records every time I listen to a piece of music, via Last.fm using my django-ditto code. It would be nice if a Creator I’ve seen live could be linked to the occasions I’ve listened to the music.
- If I go to an art exhibition this is currently an Event with a title and some optional Creators attached. And so if I go to the same exhibition multiple times there’s no direct relation between these visits. Maybe an exhibition should be a new kind of Work with its own Creators (e.g. “Artist”, “Curator”, etc.)?
These are all edge cases — there are always edge cases — and they don’t bother me too much. Yet. But they’re the kinds of things I’d bear in mind if I was making something like this for different people with their own uses and needs.
As it is this is all working, and full of data, and is very pleasing.