Bill Shupp Software engineer, photographer, musician, space geek

21 Mar 2011

Preparing your application for serving static content efficiently

(Reposted from the Empower Campaigns blog)

Most web applications serve static content such as CSS, JS, and images.  To save on bandwidth costs and improve load times for your users, it's a good idea to tell the client to cache these items for a long period of time.  But even though your static content changes infrequently, when it does change, the cached copies need to be invalidated.  In addition, high-traffic sites often want to go a step further and use a CDN to offload static content delivery.  Below I'll share the method we used here at Empower Campaigns to accomplish both of these things.

Version your content

The first step to getting control of cache invalidation is to manage the version of your content, usually by including it in the URL.  In the past I've seen approaches like incrementing a single counter shared by all JS and CSS files, which changes the URL of those files and so invalidates the cache.  But this changes the URL for ALL of the files that use that counter, even the ones that didn't change, which is not very efficient.  A better approach is to version each file (or bundle, if you deliver files in groups) so that you control the cache invalidation of each item individually.

One way to do this is with a hash like CRC32.  If the content is stored on a file system, as JS, CSS, and image files often are, you could hash the file on first read and store that hash in a cache that gets purged during a production code push.  For PHP users, APC is a great tool for this.  This reduces the hashing overhead to once per web node in your cluster per code release.

If your content is stored somewhere outside of your code base, you might consider storing the hash of the content at write time.  For example, when I worked at Digg, we stored user profile images in MogileFS.  When a user added or changed their profile image, we would hash the image at that time and store the hash with that user's profile data in Cassandra.  This was even better because the content was hashed only once.  The same could be done with locally stored content; it just depends how fancy you'd like to get.

Note that when constructing your versioned URL, you should avoid putting the version in the query string, as some CDNs strip query strings.  For example, use media.yourdomain.com/v/8bba8b90/css/960.min.css rather than media.yourdomain.com/css/960.min.css?v=8bba8b90.

Minify/Compress your content

In a production environment, you want to deliver static content as quickly as possible.  Two ways to help with this are minification and compression.  For minification, we use Google's Closure Compiler for JS and YUI Compressor for CSS, both of which are run during the deployment process.  You'll also want to enable compression in your web server, such as mod_deflate in Apache, to make the files as small as possible in transit.
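
To get a feel for what compression buys you, the following Python sketch gzips a chunk of repetitive CSS and compares sizes.  It only illustrates the effect; in practice the web server (mod_deflate, for example) does this for you.  The function name gzip_savings and the sample CSS are made up for the example:

```python
import gzip


def gzip_savings(content: bytes) -> tuple[int, int]:
    """Return (original_size, gzipped_size) for a static asset body."""
    return len(content), len(gzip.compress(content))


# Made-up, highly repetitive CSS; real files compress less dramatically.
css = b".grid_1 { float: left; margin: 0 10px; }\n" * 200
original, compressed = gzip_savings(css)
print(f"{original} bytes -> {compressed} bytes after gzip")
```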

Increase cache time

Once you have your content versioning in place, you'll want to increase the cache time for those files.  We use a cache time of one year, and set that at the web server level like so using mod_rewrite:
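
The Apache configuration itself is not included in this copy of the post.  To show what a one-year cache lifetime means in terms of response headers, here is a rough Python sketch; it is not the mod_rewrite setup mentioned above, and the function name far_future_headers is hypothetical:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # 31,536,000


def far_future_headers(now: datetime) -> dict[str, str]:
    """Build caching headers that let clients keep a versioned file for a year.

    `now` must be a timezone-aware UTC datetime.
    """
    expires = now + timedelta(seconds=ONE_YEAR_SECONDS)
    return {
        "Cache-Control": f"public, max-age={ONE_YEAR_SECONDS}",
        "Expires": format_datetime(expires, usegmt=True),
    }


headers = far_future_headers(datetime.now(timezone.utc))
```

A lifetime this long is only safe because the URL changes whenever the content does.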

Use a dedicated domain name

If you want to be ready to move to a CDN at any time, consider putting your static content on a dedicated virtual host, such as media.yourdomain.com.  This way, you can simply point that domain to a CDN whenever you like, and you'll be up and running.  This has the added benefit of allowing the client's browser to make more parallel requests for your content, as browsers often limit the number of parallel requests they'll make to any one domain.  By spreading your content over more than one domain, the client can make more requests in parallel, reducing the time for the document to be ready.  One other benefit is that by using a separate domain for your media, you can reduce the number of cookies sent with media requests, which reduces the size of each request. Our rewrite rules for ignoring version numbers in our media host ended up looking like this:
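
The rewrite rules are not reproduced in this copy of the post.  The mapping they perform can be illustrated with a short Python sketch, assuming the /v/<hash>/ URL layout from the example earlier (an 8-digit hex CRC32); the names VERSION_PREFIX and strip_version are hypothetical:

```python
import re

# Assumed URL layout: /v/<8 hex digits>/<real path under the document root>.
VERSION_PREFIX = re.compile(r"^/v/[0-9a-f]{8}/")


def strip_version(url_path: str) -> str:
    """Map a versioned URL path to the file's real path; leave others alone."""
    return VERSION_PREFIX.sub("/", url_path, count=1)
```

The version segment exists only to change the URL; the server ignores it and serves the same file from disk.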

With the above steps in mind, we wrote a view helper called mediaURL() that handles hashing and URL writing.  It can also look at our config and determine whether SSL URLs should be used, if a given page in our application requires them.  The view helper looks like this:

And the call to the helper from our view looks like this:
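
Neither listing is included in this copy of the post, and the original helper was PHP.  As a stand-in, here is a Python sketch of the same idea.  MediaConfig, its fields, and media_url are hypothetical names, and caching of the hash is left out for brevity:

```python
import zlib
from dataclasses import dataclass
from pathlib import Path


@dataclass
class MediaConfig:
    """Hypothetical settings the helper reads; not the original config format."""
    host: str              # dedicated media host, e.g. "media.yourdomain.com"
    doc_root: str          # directory the static files live in
    use_ssl: bool = False  # whether the current page requires SSL URLs


def media_url(config: MediaConfig, relative_path: str) -> str:
    """Return a versioned, absolute URL for a static file."""
    data = (Path(config.doc_root) / relative_path).read_bytes()
    version = format(zlib.crc32(data) & 0xFFFFFFFF, "08x")
    scheme = "https" if config.use_ssl else "http"
    return f"{scheme}://{config.host}/v/{version}/{relative_path}"
```

A view would then call something like media_url(config, "css/960.min.css") wherever it needs the address of a stylesheet, script, or image.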

Conclusion

By following the steps above, you'll have static content that is dynamically versioned, compressed, and cacheable by the client.  And if you're serving it on a dedicated domain, you'll speed up overall page load times and be ready to provision a CDN if you ever need it.

Cheers,

Bill Shupp, Lead Developer (twitter: http://twitter.com/shupp)
