Storing private files on Amazon S3 with Django and giving authorised users temporary access

If you’re using Amazon S3 to store your website’s files with Django, and you want some of them to be private, accessible only to certain users, here’s how.

It took me the best part of a day to piece this together, so I thought it should all be in one place for the next person. I can’t believe I found this so hard. I may have made mistakes, but it seems to work.

I’m going to assume you’ve already got your site set up to store static files on S3 with django-storages and boto (which is particularly useful if, say, your site is hosted on Heroku).

Your media and static folders

This isn’t essential to the private-files-on-S3 gist of this post, but getting your media and static files to end up in separate folders on S3 is a little non-obvious, and very useful, so here’s a little aside to cover it. I’ve got this working nicely using something like this Stackoverflow answer:

In your app create an s3utils.py file and put this in there:

from storages.backends.s3boto import S3BotoStorage

StaticS3BotoStorage = lambda: S3BotoStorage(location='static')
MediaS3BotoStorage = lambda: S3BotoStorage(location='media')

And then, in your settings.py, you’ll need something like this:

DEFAULT_FILE_STORAGE = 'yourproject.yourapp.s3utils.MediaS3BotoStorage' 
STATICFILES_STORAGE = 'yourproject.yourapp.s3utils.StaticS3BotoStorage'

AWS_ACCESS_KEY_ID = 'YOURACCESSKEY'
AWS_SECRET_ACCESS_KEY = 'YOURSECRETACCESSKEY'
AWS_STORAGE_BUCKET_NAME = 'your-bucket-name'

S3_URL = 'http://%s.s3.amazonaws.com' % AWS_STORAGE_BUCKET_NAME
STATIC_DIRECTORY = '/static/'
MEDIA_DIRECTORY = '/media/'
STATIC_URL = S3_URL + STATIC_DIRECTORY
MEDIA_URL = S3_URL + MEDIA_DIRECTORY

Those STATIC_DIRECTORY and MEDIA_DIRECTORY settings aren’t standard Django settings, but we need the MEDIA_DIRECTORY value when setting permissions on our private files a little later.

I think you’ll need to manually create the /media/ and /static/ directories in your S3 bucket. Then, if you run the collectstatic Django management command, your static files should end up in http://your-bucket-name.s3.amazonaws.com/static/, and any files uploaded through FileField or ImageField attributes on your models should end up in http://your-bucket-name.s3.amazonaws.com/media/. If those model attributes specify upload_to paths, they will be relative to /media/.
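To make the path logic concrete, here’s a rough sketch of how those settings combine into final URLs (the bucket name and filenames here are just examples):

```python
# Illustration only: how the settings above combine into S3 URLs.
# The bucket name and filenames are made up.
AWS_STORAGE_BUCKET_NAME = 'your-bucket-name'
S3_URL = 'http://%s.s3.amazonaws.com' % AWS_STORAGE_BUCKET_NAME
STATIC_URL = S3_URL + '/static/'
MEDIA_URL = S3_URL + '/media/'

# A static file collected by collectstatic:
print(STATIC_URL + 'css/site.css')
# -> http://your-bucket-name.s3.amazonaws.com/static/css/site.css

# An uploaded file whose FileField has upload_to='open/':
print(MEDIA_URL + 'open/report.pdf')
# -> http://your-bucket-name.s3.amazonaws.com/media/open/report.pdf
```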

Making files private

By default, those media files are public — if you enter an uploaded file’s URL in your browser, you should be able to access it just fine.

Let’s assume that our model has two kinds of file, one public and one private. So in our models.py we have this:

from django.db import models

class MyModel(models.Model):
    ...
    public_file = models.FileField(blank=True, null=True, upload_to='open/')
    private_file = models.FileField(blank=True, null=True, upload_to='seekrit/')
    ...

Assuming you create the /media/open/ and /media/seekrit/ directories, the files should get uploaded there fine, but they’re all still publicly accessible.

For our private files we need to set their permissions after upload to be private. To do this, I’ve ended up with a custom save() method on the model:

from django.conf import settings
from django.db import models

import boto.s3.connection
import boto.s3.key

class MyModel(models.Model):
    ...
    public_file = models.FileField(blank=True, null=True, upload_to='open/')
    private_file = models.FileField(blank=True, null=True, upload_to='seekrit/')
    ...

    def save(self, *args, **kwargs):
        super(MyModel, self).save(*args, **kwargs)
        if self.private_file:
            conn = boto.s3.connection.S3Connection(
                                settings.AWS_ACCESS_KEY_ID,
                                settings.AWS_SECRET_ACCESS_KEY)
            # If the bucket already exists, this finds that, rather than creating.
            bucket = conn.create_bucket(settings.AWS_STORAGE_BUCKET_NAME)
            k = boto.s3.key.Key(bucket)
            # private_file is a FieldFile; use its .name, the path relative
            # to MEDIA_DIRECTORY, e.g. 'seekrit/test_file.pdf'.
            k.key = settings.MEDIA_DIRECTORY + self.private_file.name
            k.set_acl('private')

If you upload a private file with that in place, then you should no longer be able to access it directly. e.g., upload a file called test_file.pdf and visiting http://your-bucket-name.s3.amazonaws.com/media/seekrit/test_file.pdf should get you an XML file containing an AccessDenied error.
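That error response is a small XML document. If you want to check for it programmatically, say in a test, something like this works (the XML here is a typical S3 error body, written out by hand rather than captured from a real request):

```python
import xml.etree.ElementTree as ET

# A typical S3 error body for a forbidden request (illustrative).
response_body = b"""<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>AccessDenied</Code>
  <Message>Access Denied</Message>
</Error>"""

root = ET.fromstring(response_body)
code = root.findtext('Code')
print(code)  # -> AccessDenied
```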

UPDATE: (12 Feb 2015) Shamim Hasnath suggests creating a new field class to use for the private file, and (3 Oct 2017) Robert Rollins has kindly added an improvement:

from django.core.files.storage import get_storage_class
from django.db import models

class S3PrivateFileField(models.FileField):
    """
    A FileField that gives the 'private' ACL to the files it uploads to S3, instead of the default ACL.
    """
    def __init__(self, verbose_name=None, name=None, upload_to='', storage=None, **kwargs):
        if storage is None:
            storage = get_storage_class()(acl='private')
        super(S3PrivateFileField, self).__init__(verbose_name=verbose_name,
                name=name, upload_to=upload_to, storage=storage, **kwargs)

You can then use this for the private_file on your model instead of the FileField() we used above:

private_file = S3PrivateFileField(blank=True, null=True, upload_to='seekrit/')

Shamim suggests that once you’ve done this you no longer need the set_acl() call in the save() method to make the file private, because the field sets the default_acl parameter.

Allowing access to certain users

Now that we can upload private files, how do we allow certain users to access them? We need to create temporary signed URLs that let users access the file.

First, in your urls.py add a URL for linking to the files:

from django.conf.urls.defaults import *

from yourproject.yourapp import views

urlpatterns = patterns('',
    ...
    url(r'^(?P<pk>[\d]+)/secretfile/$', views.SecretFileView.as_view(), name='secret_file'),
    ...
)

This will let us link to files at URLs like http://yourdomain.com/42/secretfile/, referring to the secret file on the object with a pk of 42. This is what you should use when linking to the file, e.g., in a template:

<a href="{% url secret_file pk=object.pk %}">Download</a>
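As a quick sanity check, the regex in that URL pattern matches paths like 42/secretfile/ and captures the pk:

```python
import re

# The same pattern used in the urlconf above.
pattern = re.compile(r'^(?P<pk>[\d]+)/secretfile/$')

match = pattern.match('42/secretfile/')
print(match.group('pk'))  # -> 42

# Non-numeric pks don't match, so the view is never called for them.
print(pattern.match('abc/secretfile/'))  # -> None
```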

Then, in your app’s views.py create the SecretFileView:

from logging import getLogger

from django import http
from django.conf import settings
from django.shortcuts import get_object_or_404
from django.views.generic import RedirectView

from boto.s3.connection import S3Connection

from yourproject.yourapp.models import MyModel

logger = getLogger('django.request')

class SecretFileView(RedirectView):
    permanent = False

    def get_redirect_url(self, **kwargs):
        s3 = S3Connection(settings.AWS_ACCESS_KEY_ID,
                            settings.AWS_SECRET_ACCESS_KEY,
                            is_secure=True)
        # Create a URL valid for 60 seconds.
        return s3.generate_url(60, 'GET',
                            bucket=settings.AWS_STORAGE_BUCKET_NAME,
                            key=kwargs['filepath'],
                            force_http=True)

    def get(self, request, *args, **kwargs):
        m = get_object_or_404(MyModel, pk=kwargs['pk'])
        u = request.user

        if u.is_authenticated() and (u.get_profile().is_very_special() or u.is_staff):
            if m.private_file:
                filepath = settings.MEDIA_DIRECTORY + m.private_file.name
                url = self.get_redirect_url(filepath=filepath)
                # The below is taken straight from RedirectView.
                if url:
                    if self.permanent:
                        return http.HttpResponsePermanentRedirect(url)
                    else:
                        return http.HttpResponseRedirect(url)
                else:
                    logger.warning('Gone: %s', self.request.path,
                                extra={
                                    'status_code': 410,
                                    'request': self.request
                                })
                    return http.HttpResponseGone()
            else:
                raise http.Http404
        else:
            raise http.Http404

What does this all do? First we fetch the object matching the pk in the URL (or 404 if there isn’t one). Then we make sure this user can access the file. The conditions are up to you. Here we’re making sure the user is logged in, and either satisfies some condition set in an is_very_special() method on the user’s UserProfile, or is a staff member.

If that’s OK, and our object actually has a private_file uploaded, then we set the full filepath — this is why we had to create that MEDIA_DIRECTORY setting earlier on, because it needs to be an absolute path, not relative to /media/.

We then generate the signed, temporary URL. Here the URL will be valid for 60 seconds — after that it will no longer function, so it can’t be passed on to anyone else. Once we’ve got this URL, we redirect to it, and the user should be able to access the file — it should appear in the browser, or start downloading, depending on the file type.
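If you’re curious what generate_url() does under the hood, it builds a query-string signature: an HMAC-SHA1 of the request details, signed with your secret key and base64-encoded. A minimal sketch of the scheme (Signature Version 2, the one boto used), in modern Python and with made-up credentials:

```python
import base64
import hmac
import time
import urllib.parse
from hashlib import sha1

# Made-up credentials and object path, for illustration only.
access_key = 'YOURACCESSKEY'
secret_key = 'YOURSECRETACCESSKEY'
bucket = 'your-bucket-name'
key = 'media/seekrit/test_file.pdf'

# The URL will be valid for 60 seconds from now.
expires = int(time.time()) + 60

# The exact string S3 expects to be signed for a presigned GET request.
string_to_sign = 'GET\n\n\n%d\n/%s/%s' % (expires, bucket, key)

# HMAC-SHA1 with the secret key, base64-encoded.
signature = base64.b64encode(
    hmac.new(secret_key.encode(), string_to_sign.encode(), sha1).digest()
).decode()

query = urllib.parse.urlencode({
    'AWSAccessKeyId': access_key,
    'Expires': expires,
    'Signature': signature,
})
url = 'http://%s.s3.amazonaws.com/%s?%s' % (bucket, key, query)
```

Anyone holding this URL can fetch the object until the Expires time passes; after that S3 rejects it, which is what makes the short window safe to hand out.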

Except, that’s not quite all…

Setting the Bucket Policy

Just because we’ve got a signed URL, this doesn’t yet mean the file will download. The private permissions we set on it earlier still apply. We need to specify a policy that will let us bypass this with the signed URLs.

Go to your S3 console, select your Bucket, and right-click to show its Properties. You should see a link saying “Add bucket policy” (or “Edit bucket policy” if you already have one). A window should open, into which you should put something like this:

{
    "Version": "2008-10-17",
    "Id": "My Special Bucket Policy",
    "Statement": [
        {
            "Sid": "Allow Signed Downloads for Private Files",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::12345678901:root"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::your-bucket-name/media/seekrit/*"
        }
    ]
}

A few points on this…

  • No, don’t change the Version date from 2008-10-17.
  • I don’t know what the significance of the Id or Sid is.
  • In the Principal value, the 12345678901 shown here should be replaced with your AWS Account Number. This should be visible on the AWS Manage Your Account page, currently shown at the top-right. Remove the hyphens and put it in here.
  • In Resource put your actual Bucket Name in place of your-bucket-name and set the path to point to the folder your private files are in. The asterisk on the end means this policy applies to all the files in that folder.

This seems to work for me, although I can’t claim to understand it in great depth.
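Since the policy is just JSON, you can also build it programmatically and round-trip it through a JSON parser to catch typos before pasting it into the console. A sketch, with a made-up account number:

```python
import json

# Made-up account number and bucket name, for illustration.
account_number = '123456789012'
bucket_name = 'your-bucket-name'

policy = {
    "Version": "2008-10-17",
    "Id": "My Special Bucket Policy",
    "Statement": [
        {
            "Sid": "Allow Signed Downloads for Private Files",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::%s:root" % account_number},
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::%s/media/seekrit/*" % bucket_name,
        }
    ],
}

# If this succeeds, the JSON is at least syntactically valid.
print(json.dumps(policy, indent=4))
```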

That’s it

And there we go. I found most of that via Googling, but none of it was all in one place and it took way too long to piece together. Hopefully it’ll be useful to others.

Bear in mind that I DON’T REALLY KNOW WHAT I’M DOING and may have got things wrong. If you spot anything that could be improved, please do let me know (email or Twitter).
