April 7, 2006, 9:27 AM ET
How I'm using Amazon S3 to serve media files
As traffic to chicagocrime.org has steadily increased, I've been looking for ways to tweak the site's performance. The site runs on a rented dedicated server with Apache/mod_python, PostgreSQL and Django. (I'd love to bite the bullet and buy proper servers but haven't done so yet. Donations are welcome!)
One thing that's always bugged me is that chicagocrime.org's Apache instance serves both the dynamic, Django-powered pages and static media files such as the CSS and images. It's inefficient for a single Apache instance to act as both an application server (mod_python) and a media server. A bunch of Apache configuration tweaks can improve performance of one aspect of serving but are somewhat detrimental to the other aspect. For example, using the KeepAlive directive improves Apache's media-serving capabilities, but KeepAlive is detrimental in a server arrangement that mainly churns out dynamic pages. So if a single Apache instance does both media serving and dynamic page creation, you can't optimize for both cases.
(When I worked at LJWorld.com, we had the luxury of separate application, media and database servers, and we have a similar setup where I work now, but I can't afford separate servers for my little side projects.)
The solution hit me the other day -- I can just use Amazon's new S3 data-storage service to host chicagocrime.org's media files, so my own Apache server can focus on serving dynamic pages. S3 is very cheap -- 15 cents a month for each gig of storage (and I have only 936 K of media files) and 20 cents per gig of bandwidth. That's peanuts.
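To make the pricing concrete, here is a rough back-of-the-envelope sketch using the two rates quoted above. The function name and the sample traffic figure are made up for illustration, and real bills may include charges not covered here.

```python
# Rates quoted in the post (US dollars).
STORAGE_USD_PER_GB_MONTH = 0.15
TRANSFER_USD_PER_GB = 0.20


def monthly_cost(storage_gb, transfer_gb):
    """Estimate one month's cost from stored gigabytes and transferred gigabytes."""
    return storage_gb * STORAGE_USD_PER_GB_MONTH + transfer_gb * TRANSFER_USD_PER_GB


# 936 K of media files, expressed in gigabytes.
media_gb = 936 / (1024.0 * 1024.0)

# Hypothetical traffic figure, purely for illustration.
example_transfer_gb = 1.0

print("Storage only: $%.6f" % monthly_cost(media_gb, 0))
print("With 1 GB transfer: $%.4f" % monthly_cost(media_gb, example_transfer_gb))
```

At this size the storage charge is a small fraction of a cent, so bandwidth is what actually drives the bill.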
It was easy to get this working; took less than an hour total. Here's what I did:
First, I signed up for an Amazon S3 account. Do that by clicking "Sign Up For Web Service" on the main S3 page. When you sign up, you get two codes: an access key ID and secret access key.
Next, I created an S3 "bucket" for my chicagocrime.org media files. An account can have multiple buckets. As far as I can tell, a bucket is just a way of keeping your S3 stuff in separate containers. I created mine using the free S3 Python bindings. Just download the file, unzip it and put the file S3.py somewhere on your Python path. To create a bucket named 'mybucketname', do this:
import S3
conn = S3.AWSAuthConnection('your access key', 'your secret key')
conn.create_bucket('mybucketname')
Next, I wrote a Python script that uploaded my media files to this bucket and made them publicly readable. S3 has a bunch of complex authentication stuff, but all I wanted to do was use S3, essentially, as a Web hosting service. Here's the script I used, and here's how to use it:
$ cd /directory/with/media/files/
$ find | python /path/to/update_s3.py
The script is kind of cool because it uses Python's mimetypes to determine the content type of each file in order to pass that to the S3 API. Otherwise it's pretty straightforward.
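For readers who want the general shape of such a script, here is a minimal sketch. It is not the script linked above. It assumes a connection object with a put(bucket, key, object, headers) method in the style of the S3.py bindings. The names guess_content_type, iter_files, upload_all and make_object are hypothetical, as is the text/plain fallback.

```python
import mimetypes
import os.path

BUCKET_NAME = 'mybucketname'  # placeholder bucket name


def guess_content_type(filename, default='text/plain'):
    """Return a Content-Type for filename, falling back to `default`."""
    content_type, _encoding = mimetypes.guess_type(filename)
    return content_type or default


def iter_files(lines):
    """Yield normalized paths from `find` output, skipping directories."""
    for line in lines:
        path = os.path.normpath(line.rstrip('\n'))
        if os.path.isfile(path):
            yield path


def upload_all(conn, lines, bucket=BUCKET_NAME, make_object=lambda data: data):
    """Upload each listed file as a publicly readable object.

    `conn` is assumed to expose put(bucket, key, object, headers).
    `make_object` wraps the raw bytes if the bindings need a wrapper type.
    """
    uploaded = []
    for path in iter_files(lines):
        with open(path, 'rb') as f:
            data = f.read()
        headers = {
            'x-amz-acl': 'public-read',  # make the object world-readable
            'Content-Type': guess_content_type(path),
        }
        key = path.replace(os.sep, '/')  # S3 keys use forward slashes
        conn.put(bucket, key, make_object(data), headers)
        uploaded.append(key)
    return uploaded
```

To use it with the pipe shown above, you would call upload_all(conn, sys.stdin) with a connection created as in the bucket example. With the S3.py bindings you would probably also need to pass a wrapper such as S3.S3Object as make_object, though that detail is an assumption here.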
Finally, it was just a matter of changing my chicagocrime.org templates to point to S3's URLs rather than my own URLs. That was a snap, thanks to Django's template inheritance and includes.
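As an illustration of the URL change, here is a small helper that builds a public URL for an uploaded file. It assumes the path-style address http://s3.amazonaws.com/bucket/key; the helper name media_url and the sample keys are hypothetical.

```python
from urllib.parse import quote

# Assumption: objects are addressed path-style, as <base>/<bucket>/<key>.
S3_BASE_URL = 'http://s3.amazonaws.com'


def media_url(bucket, key):
    """Build the public URL for `key` in `bucket`, escaping unsafe characters."""
    return '%s/%s/%s' % (S3_BASE_URL, bucket, quote(key))


print(media_url('mybucketname', 'css/main.css'))
```

Keeping the base URL in one place means the templates only need a single value changed if the media ever moves again.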
Now chicagocrime.org's media files are served directly off of S3, at a cost of 35 cents a month, and my Apache is happier.
