Pepys' Diary data available to play with

This is a bit of a cross-post, but I assume plenty of people who read this site don’t follow The Diary of Samuel Pepys closely and might still be interested. Hello geeks! I’ve just made loads of the data behind the site available in JSON format, in the hope that some people feel like exploring it.

Here’s a list of roughly what’s included:

  • The text of more than 3,000 diary entries, including footnotes and links to relevant Encyclopedia pages.

  • A list of the categories used in the Encyclopedia, with their structure.

  • All the data about the more than 4,000 topics in the Encyclopedia: names, descriptions, Wikipedia links, latitude/longitude, shapes, categories, etc.

  • More than 300 thumbnail images of people included in the Encyclopedia.

There’s a lot more information about the data in the long README file, which is also included in the 6MB zip archive of the data.
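If you want a quick way in, here is a minimal Python sketch that parses every JSON file found inside a zip archive. It makes no claims about how the archive is actually laid out: the helper name `load_json_members` is my own invention and the archive filename in the usage comment is a placeholder, so check the README for the real file names and structure.

```python
import json
import zipfile


def load_json_members(zip_path):
    """Parse every .json member of a zip archive into Python objects.

    Returns a dict mapping each member's name to its parsed contents.
    Non-JSON members (a README, images, and so on) are skipped.
    """
    parsed = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".json"):
                with archive.open(name) as handle:
                    parsed[name] = json.load(handle)
    return parsed


# Hypothetical usage; "pepys-data.zip" is a placeholder filename:
# data = load_json_members("pepys-data.zip")
# for name, contents in data.items():
#     print(name, type(contents).__name__, len(contents))
```

Printing the type and length of each parsed file is a cheap way to see which files hold lists of records and which hold nested structures before reading the README in detail.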

I’d love to see some interesting, beautiful and/or fun things done with this. We already have basic maps of places and graphs of how often each topic appears in the diary. And Matthew Somerville’s recent Pepys’ Shows site used an early release of some of the data.
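As a starting point for that kind of graph, here is a rough Python sketch that counts how many entries mention each topic. The record shape is an assumption for illustration only: I'm pretending each entry is a dict with a `date` string and a `topics` list of topic IDs, and the sample records are made up, so adapt the field names to whatever the README describes.

```python
from collections import Counter


def topic_frequencies(entries):
    """Count the number of entries that reference each topic ID.

    Assumes (hypothetically) that each entry is a dict with a "topics"
    key holding a list of topic IDs. An entry with no such key counts
    for nothing.
    """
    counts = Counter()
    for entry in entries:
        # Use a set so a topic linked twice in one entry is counted once.
        counts.update(set(entry.get("topics", [])))
    return counts


# Made-up sample records, not real diary data.
sample_entries = [
    {"date": "1660-01-01", "topics": [101, 102]},
    {"date": "1660-01-02", "topics": [101, 101, 103]},
    {"date": "1660-01-03"},
]

frequencies = topic_frequencies(sample_entries)
for topic_id, count in frequencies.most_common():
    print(topic_id, count)
```

Counting entries rather than raw links keeps one link-heavy entry from dominating the totals; drop the `set()` call if you would rather count every mention.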

There must be more that can be done with all this historical data, though. If you have any ideas, let me know, or just plough straight in. And if there’s anything more that would be useful, get in touch.
