Decades-long projects

As I wrote last week, the Diary of Samuel Pepys project has kicked off again for another almost-decade of daily publishing. What’s wrong with me? Or, more practically, what did I think about when starting a ten-year project all over again?

The first time round, the most tedious, soul-destroying, part of the whole enterprise was adding all the HTML links, 4,800 of them, into the diary texts. (People often say I “typed in” the diary entries, which isn’t the case; the text was already scanned into Project Gutenberg. I’d never have started if I had to type it all in as well.) Given that this process was now complete, there wasn’t much reason not to restart the diary from the beginning. Restarting only involves having the site’s front page and RSS feed automatically update daily with “today’s” diary entry.
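
As a rough illustration of that daily update, here is a minimal Python sketch of how "today's" entry could be picked. It is not the site's actual code. The 353-year offset (2013 minus 1660), the name `diary_date_for`, and the fallback for 29 February are all assumptions made for the example.

```python
from datetime import date

# Assumption: the rerun shows the entry written a fixed number of years ago.
# 353 is a hypothetical offset (2013 - 1660), not a value taken from the site.
YEAR_OFFSET = 353


def diary_date_for(today: date, year_offset: int = YEAR_OFFSET) -> date:
    """Return the historical date whose diary entry should appear on `today`."""
    target_year = today.year - year_offset
    try:
        # Same day and month, shifted back by the offset.
        return today.replace(year=target_year)
    except ValueError:
        # `today` is 29 February but the target year is not a leap year.
        # Falling back to 28 February is an arbitrary choice for this sketch.
        return date(target_year, 2, 28)


if __name__ == "__main__":
    print(diary_date_for(date.today()))
```

The front page and the RSS feed could both call a function like this once per request (or once per day) and look up the entry stored under the returned date.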

Of course, I couldn’t let it be that easy.

The original website was made with Movable Type (MT) which, in 2002, was about the only option for making such a website if you didn’t want to code the entire thing from scratch. There were other tools for running weblogs but I needed something that could cope with what I thought would be some light customisation, and MT’s self-hosted code made that possible.

Over time any website built with a blogging tool — MT, WordPress, whatever — will, if it’s anything more than a very simple, single blog, become an unwieldy mess. The Pepys’ Diary site was no exception, growing from two inter-linked weblogs into six weblogs, templates containing Perl and SQL, custom PHP-based admin tools, and a web of connections that made it increasingly complex to add new features. All built on a platform whose future has seemed precarious for several years.

When I first started the site I didn’t really think through what it meant to begin a site with a built-in ten-year lifespan, either in terms of the work involved or the technology’s likely lifetime. I didn’t need to live through much of the project before I realised “starting a website” was exactly what it says. If you start a website you’re only at the very beginning, and most of the work is ahead of you. Celebrating a launch is only the celebration of a birth; the bigger achievement is to have a successful life. This looming thought has, unfortunately, put me off starting many more projects over the past decade, because I’m always considering what’s ahead.

This time round I had a better idea what I was letting myself in for and what issues I should consider. Running the website for another ten years seemed feasible in theory, what with all the ongoing manual labour dealt with. But the practice — living with that clunky, flaky, slow technology for another decade — wasn’t something I looked forward to.

When 2012’s client work finished in December, and I made the decision to kick the project off again, I could see only one course of action: forget taking a much-needed holiday away from computers. I should, instead, re-write the entire site.

So far I’ve spent about ten days writing a new Django site to replace the old one. I was aiming for a minimum viable product that displays all the existing data, exported from the old MT database, while leaving out many of the nicer additions for now. I mostly hit my target, and have probably got the first 80% of the site up and running. There’s another 80% of features still to do which will probably take a similar amount of time. And then there’ll be the inevitable third 80% of bug fixes, enhancements, design improvements, and other features I’ve forgotten.

The code and the data (without users’ comments) are all on GitHub, as is the simple script I use for publishing @samuelpepys’ tweets.

The process has been reasonably pain-free so far. It helps that I’m merely copying the functionality of an existing site — I know exactly what it needs to do and I don’t have to spend any time agonising over how things should work. My Django skills have also improved a huge amount over the past year, and the frequency with which I run into a brick wall of things not working how I expect has drastically decreased.

I also decided to abdicate most responsibility for the design of the site to Twitter Bootstrap. With limited time available it made sense to focus on the content and features rather than prettiness, and Bootstrap has made getting a reasonable-looking site up and running much quicker. It’s not ideal (the site is a pretty characterless home for old Samuel Pepys at the moment) but that can be improved over time. For now, it works.

Although I am, inevitably, slightly regretting the task I’ve set myself — that’s January gone — it’s a relief to be wrestling less with Movable Type. I now have a custom-made site that feels much better. Linked parts of the site are joined in a sensible manner rather than with Perl sticky tape and PHP string. There are downsides: the old website was mainly static, MT’s main selling-point really, which was well suited to a large, rarely-changing website. I’m now suffering the teething problems of a site with several thousand dynamically-generated pages. On the plus side, it doesn’t take me an hour to rebuild the website when I want to make a tiny change to each page.

Looking ahead, how will I feel about this Django backend in ten years’ time? I’ve no idea what the state of the platform will be in a decade. It feels like the database is in a better place now — although MT’s data structure is reasonably sensible, its need to be all things to all people makes it a little harder to extract some custom data than is ideal. It does seem like Django has a reasonable process of gradual improvement rather than sudden world-changing shifts that render all old code obsolete, which is reassuring for a project like this. But, whatever platform a site’s built on, there’s going to be plenty of maintenance required over the course of a decade.

On 25th January I’m going to be speaking at The Design of Understanding. I’ll be talking about the experience of running The Diary of Samuel Pepys, and what it means to think about running a website over decades. I don’t know how much individuals and companies habitually think about this. Is it possible to plan for how your online service will work over the next ten years, never mind longer? If you have thoughts about — or, even better, experience with — this kind of thing, do drop me a line.
