
Invalidating items from Django’s cache

I spent a day struggling to work out how to invalidate items from Django’s cache. I couldn’t find a lot of detail about how the caching works, so I poked around, got confused, and this is what I found.

If you notice anything that’s wrong here do let me know by email or on Twitter.

Short version

  • Django’s per-site cache uses the values of cookies (among other things) to generate its cache keys.
  • Which means you can only invalidate items from the cache if your invalidation code knows the same cookies.
  • Also, because of the cookies, per-site caching caches pages per-session, which isn’t necessarily what you expect it to do.
  • Also also, Django Debug Toolbar appears to add a cookie in variable positions in the list of cookies, which also changes the cache keys.
  • But Django’s per-view cache doesn’t use the values of cookies to generate its keys, which means it’s easier to invalidate items from the cache.
  • I’ve included the function I’ve ended up with for doing that.
  • And described how I use per-view caching on class-based views, without caching pages viewed by authenticated users.

Long version

On my website I’m using Django’s built-in caching. I wanted to ensure that when I change the contents of a blog post, for example, I could invalidate the cache for related pages, so no one would see the old version.

It feels like there should be a relatively simple way to do this in Django but there isn’t, so it’s a case of piecing together functions from Stack Overflow and gists of various ages to make some code that works. I’ve ended up with a function (shown at the end of this post) that accepts a URL and removes the corresponding page from the cache.

However, getting this to work on my local development server was confusing, and I couldn’t find a lot of detail about the practicalities of how caching worked. In case it’s ever useful to someone googling, here’s what I found.

The set-up

On my development site I’m using Django’s Local-memory caching but I assume this all works similarly with Memcached or other backends.

Django has a few different ways to cache things:

  • The per-site cache
  • The per-view cache
  • Template fragment caching
  • The low-level cache API

I only had the first, the per-site cache, enabled to begin with because it seemed simplest.
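For reference, here’s roughly what enabling the per-site cache with the local-memory backend looks like in settings.py. It’s a minimal sketch based on Django’s documentation, not a copy of my settings: the LOCATION name and the five-minute timeout are placeholders, and a real MIDDLEWARE list will contain more entries. The order matters, because Django’s docs say the “update” middleware must come first and the “fetch” middleware last.

```python
# settings.py (fragment): per-site caching with the local-memory backend.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
        "LOCATION": "unique-snowflake",  # placeholder name for this memory cache
    }
}

MIDDLEWARE = [
    # First, so on the way out it sees every Vary header the others have added:
    "django.middleware.cache.UpdateCacheMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
    "django.middleware.csrf.CsrfViewMiddleware",
    # ... your other middleware ...
    # Last, so it can return a cached page before the view runs:
    "django.middleware.cache.FetchFromCacheMiddleware",
]

CACHE_MIDDLEWARE_ALIAS = "default"
CACHE_MIDDLEWARE_SECONDS = 60 * 5  # placeholder: cache pages for five minutes
CACHE_MIDDLEWARE_KEY_PREFIX = ""  # an empty prefix gives keys like "cache_page..GET"
```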

The problem

No matter how I fiddled with that invalidation function for expiring a page’s cache, it seemed to make no difference; the page in my browser only updated with new content after its cached version expired, not immediately. So I poked around a bit.

Cache keys

Each page in the cache has a key to identify it. The keys for my pages looked something like this:

views.decorators.cache.cache_page..GET.e3b8e0e5aad3ec44d515ffc0ca062701.2e365c6612b1817ad03e9247c24a217e.en.UTC

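(I was puzzled that they start with views.decorators.cache.cache_page when this is the per-site cache, and not a per-view cache that uses the cache_page decorator. As far as I can tell, both kinds of caching build their keys with the same code in django/utils/cache.py, which always uses that prefix.)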

When I compared the key my invalidation function generated with the keys Django created when caching the page, I noticed they were different. Most of each key was identical, but each contains two long hashes, and the second one differed.

My invalidation function created a key that ended like this for a specific page (looking at only the last four characters of the hash for brevity):

...427e.en.UTC

But when I refreshed the page, Django looked for cached versions using two different keys, ending like:

...c284.en.UTC

or:

...5c60.en.UTC

First, it was annoying that I couldn’t work out the correct hash for the page — so I couldn’t invalidate it.

Second, it was odd that the same page appeared to be cached with two different keys, apparently used at random.

Why the keys are different

Poking deeper, I found that the second hash in a key is generated from the values of the request headers named in the response’s Vary header (it happens in _generate_cache_key() in django/utils/cache.py; the current code is on GitHub).

In my case, as I refreshed the page, it was looking at the value of the HTTP_COOKIE header. And this alternated between (truncating the csrftoken value for brevity):

djdt=show; csrftoken=eT7o...TjKL

and

csrftoken=eT7o...TjKL; djdt=show

The same cookies, but in a different order. Which meant that the hash was being calculated differently for each one. The djdt cookie is set by Django Debug Toolbar.

I manually set a few other cookies and they always appeared in the same order, relative to the csrftoken cookie and each other. So maybe the position of that djdt cookie depended on (waves hands vaguely) some JavaScript or something? I don’t know. But that was the only cookie that seemed to change position.
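To see why the order of cookies matters, here’s a rough, standalone imitation of how Django assembles these keys, modelled on _generate_cache_key(). It’s only a sketch: the real function is private, takes a request object, and may differ between Django versions. The function name sketch_cache_key, the example URL and the (truncated) cookie strings are made up for illustration, and it assumes the cookie is the only header being hashed.

```python
import hashlib


def sketch_cache_key(method, absolute_url, header_values, key_prefix="",
                     language="en", tz="UTC"):
    """Hypothetical imitation of the format of Django's page cache keys."""
    # Second hash: the values of the Vary'd request headers, fed in one by one.
    headers_hash = hashlib.md5()
    for value in header_values:
        if value is not None:
            headers_hash.update(value.encode())
    # First hash: the full absolute URL, including scheme, host and port.
    url_hash = hashlib.md5(absolute_url.encode("ascii"))
    key = "views.decorators.cache.cache_page.%s.%s.%s.%s" % (
        key_prefix, method, url_hash.hexdigest(), headers_hash.hexdigest())
    # Django appends the language code and time zone name when those are enabled.
    return "%s.%s.%s" % (key, language, tz)


url = "http://127.0.0.1:8000/blog/posts/1234/"
key_a = sketch_cache_key("GET", url, ["djdt=show; csrftoken=eT7o...TjKL"])
key_b = sketch_cache_key("GET", url, ["csrftoken=eT7o...TjKL; djdt=show"])

print(key_a == key_b)  # False: same cookies, different order
```

The first hash, of the URL, is the same in both keys; only the second one changes.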

With Django Debug Toolbar disabled, and that cookie deleted, the cookies remained in the same order and the page’s cache key now always ended:

...9633.en.UTC

Great! Stability!

Except… the key I was generating to invalidate the page still ended:

...427e.en.UTC

Which is different to the actual key, which uses cookies to generate that second hash.

To generate the same key I’d need to use the cookies. But these are only available from a Request object. And my invalidation function doesn’t have access to that, because it could be used anywhere. For example, it might be run from a Post model’s save() method.
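The underlying problem, as far as I can tell, is that Django looks keys up in two steps. When a page is cached, learn_cache_key() stores the list of request headers named in the response’s Vary header, under a separate key based only on the URL. Later, get_cache_key() fetches that list and hashes the current request’s values for those headers. Here’s a toy model of that flow, with a plain dict standing in for the cache; the function names and the dict are hypothetical simplifications, not Django’s API.

```python
import hashlib

# Hypothetical stand-in for Django's cache backend: a plain dict.
cache = {}


def md5(text):
    """Hex MD5 digest of a string."""
    return hashlib.md5(text.encode()).hexdigest()


def page_key(url, vary_headers, request_headers):
    """Build a page key from the URL and the values of the Vary'd headers."""
    # Missing headers contribute nothing, as in Django, where None values are skipped.
    values = "".join(request_headers.get(h, "") for h in vary_headers)
    return "page.%s.%s" % (md5(url), md5(values))


def learn_key(url, vary_headers, request_headers):
    """When caching: remember which headers matter for this URL."""
    cache["header_list." + md5(url)] = vary_headers
    return page_key(url, vary_headers, request_headers)


def lookup_key(url, request_headers):
    """When reading or invalidating: rebuild the key from the stored header list."""
    vary_headers = cache.get("header_list." + md5(url))
    if vary_headers is None:
        return None  # Nothing learned yet, like get_cache_key() returning None.
    return page_key(url, vary_headers, request_headers)


url = "http://127.0.0.1:8000/blog/posts/1234/"
browser = {"HTTP_COOKIE": "csrftoken=eT7o...TjKL"}

# Per-site cache: by the time the page is stored, middleware has added Vary: Cookie.
stored = learn_key(url, ["HTTP_COOKIE"], browser)

# Invalidation code has no browser, so no cookies, and computes a different key.
print(lookup_key(url, {}) == stored)       # False
print(lookup_key(url, browser) == stored)  # True
```

The cookie-less key’s second hash is the MD5 of an empty string, d41d8cd98f00b204e9800998ecf8427e, which would fit the ...427e ending above.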

Trying the per-view cache

So, my next step was to disable the per-site cache and enable the per-view cache for this particular view. I did this and now the page was cached with a key that ended:

...427e.en.UTC

Success, really! It’s the same as the key I was generating in my invalidation function. The per-view cache keys appear not to include the values of cookies. And, yes, when I called my function when save()ing a Post, the refreshed page was generated afresh with the new content (and then cached again).

Combining per-site and per-view caching

I wondered what would happen if I turned on per-site caching as well as having the per-view cache enabled for this view. Refreshing the page I could see that Django used a cache key that ended like:

...9633.en.UTC

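So it seems like the per-site cache takes precedence, presumably because its FetchFromCacheMiddleware can return a cached page before the view, and its per-view cache, are ever reached. Which means that if you want to be able to invalidate specific pages from the cache like this, you should not use the per-site cache.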

Authenticated users

One other thing to be careful of is using caching when your site has authenticated users — if your publicly-facing content changes for a logged-in user, you don’t want to cache their view of a page and then show it, from the cache, to a standard user. Or vice-versa.

Per-site caching tends not to have this problem because it takes into account the Vary: Cookie header. If you have CSRF protection enabled, which is default Django behaviour, this header will often be set. And because cookies such as the CSRF token (and, if you use something like Google Analytics, its tracking cookies) differ from visitor to visitor, each user (i.e. session) will effectively have their own cache.

Yes, with per-site caching the cache is usually per session. You’ll cache a version of each page visited for each individual visitor. This is mentioned in this old release note.

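But per-view caching doesn’t, by default, take this header into account (although you can add Vary headers yourself). This seems to be because the cache_page decorator stores the view’s response before middleware, such as the CSRF or session middleware, adds Vary: Cookie to it. So all users, authenticated or not, will share the same single cached version of a page. Which is a problem if authenticated users are supposed to see some unique-to-them content.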

The simplest solution is probably to not cache a page if the user is authenticated. If a large amount of your traffic is authenticated users, this won’t be a great solution… I guess that’s something that template fragment caching is good for?

I use class-based views and use this mixin, scavenged from elsewhere (currently on GitHub here):

from django.views.decorators.cache import cache_page


class CacheMixin:
    # How long to cache each page for, in seconds:
    cache_timeout = 60 * 5

    def get_cache_timeout(self):
        return self.cache_timeout

    def dispatch(self, *args, **kwargs):
        if hasattr(self.request, 'user') and self.request.user.is_authenticated:
            # Logged-in users always get a freshly generated page.
            return super().dispatch(*args, **kwargs)
        # Everyone else gets the page via the per-view cache.
        return cache_page(self.get_cache_timeout())(super().dispatch)(*args, **kwargs)

Add it to a view like:

class PostDetailView(CacheMixin, DetailView):
    cache_timeout = 60 * 15  # Optional: override the five-minute default.
    # Your code here.

Conclusion

I’ve stopped using per-site caching now, because it seems problematic for my purposes. I’ve started adding that CacheMixin to views that I want to cache. And I can now invalidate any of those cached pages when I need to.

Expire view cache function

Here’s the expire_view_cache() function I’ve ended up with. Also on GitHub, which might be more recent.

def expire_view_cache(path, key_prefix=None):
    """
    This function allows you to invalidate any item from the per-view cache.
    It probably won't work with things cached using the per-site cache
    middleware (because that takes account of the Vary: Cookie header).
    This assumes you're using the Sites framework.
    Arguments:
        * path: The URL of the view to invalidate, like `/blog/posts/1234/`.
        * key_prefix: The same as that used for the cache_page()
          function/decorator (if any).

    """
    from django.conf import settings
    from django.contrib.sites.models import Site
    from django.core.cache import cache
    from django.http import HttpRequest
    from django.utils.cache import get_cache_key

    # Prepare metadata for our fake request.
    # I'm not sure how 'real' this data needs to be, but still:

    domain_parts = Site.objects.get_current().domain.split(':')
    request_meta = {'SERVER_NAME': domain_parts[0],}
    if len(domain_parts) > 1:
        request_meta['SERVER_PORT'] = domain_parts[1]
    else:
        request_meta['SERVER_PORT'] = '80'

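    # Create a fake request object.
    # Note: Django hashes the full absolute URL, including the scheme, and this
    # fake request will usually report 'http', so pages served over HTTPS may
    # end up with a different cache key.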

    request = HttpRequest()
    request.method = 'GET'
    request.META = request_meta
    request.path = path

    if settings.USE_I18N:
        request.LANGUAGE_CODE = settings.LANGUAGE_CODE

    # If this key is in the cache, delete it:

    try:
        cache_key = get_cache_key(request, key_prefix=key_prefix)
        if cache_key:
            if cache.has_key(cache_key):
                cache.delete(cache_key)
                return (True, 'Successfully invalidated')
            else:
                return (False, 'Cache_key does not exist in cache')
        else:
            raise ValueError('Failed to create cache_key')
    except Exception as e:  # Also catches the ValueError raised above.
        return (False, e)