Distributing Static Content From CDN

why

  • Distribute content to end users with low latency and faster response times.
  • Ability to deploy new static content without having to re-deploy the entire application.

what

  • Identify a Content Distribution Network (CDN) solution that meets our needs
  • Setup origin server(s) to store the content that will be distributed via CDN.
  • Setup a distribution to register the origin server(s).
  • Reference all static content in the application with the distribution’s url.
  • Version static content to bust the cache both in the browser and CDN.

how

Before we get into implementation details, what do we mean by “static content”?  Here, we define static content as all the web resources that are not specific to a particular user’s session.  In our case, this includes css files, javascript files and all media files.

Identify CDN solution

Since most of our infrastructure is hosted in Amazon Web Services, the AWS CloudFront CDN was a logical default choice to try.  Like most of the AWS application-level services, CloudFront tries to hit a reasonable balance of the 80/20 rule:  it is low-cost, easy to operate, and covers most basic functionality that you want in a application service, but does not offer much in the way of high-end differentiating features.

Setup origin server(s) to store the content that will be distributed via CDN.

An origin server in CloudFront can be either a web server, or, as in our case, an AWS S3 bucket. We have setup an origin for each of our environments (dev, qa, prod). We then have a Hudson job (build) for each of our environments. The Hudson job checks out the corresponding branch from our git repo for the static content; processes it as mentioned in this post; updates a version.xml file with the Hudson build number; zips the file; and copies it to a S3 bucket. This bucket, however, is not the CloudFront origin. It is like a docking area to store all zipped build files from our various Hudson jobs. We have another Hudson job (deploy) that copies a given zipped build file to an origin bucket. More on that in a minute.

We have a different Hudson job for each environment because the static content could be processed differently based on an environment. For example: in our dev environment, we do not combine or minify our js files. In our qa environment, we combine but do not minify our js files. In prod, we combine and minify our js files.

Back to the Hudson deploy job mentioned above. This job takes two input params: the name of the zipped build file to be deployed and the environment to be deployed. It simply unzips the build file into a temporary directory and uses s3sync to upload the content to the appropriate S3 origin bucket for the given environment.  And from there, it is available for distribution via CloudFront.

In addition, in our dev environments, we use a Continuous Integration (CI) process, where our Hudson jobs send SNS messages to our the web server when a build is available.  The web servers pulls the static content build from the S3 staging bucket and then serves it directly from Apache.  This allows for a more targeted integration test of the static content without bringing the CDN mechanisms into the mix.

Setup a CloudFront distribution to register the origin server(s).

We have a CloudFront distribution for each of our origins.  A CloudFront distribution associates a URL with one (or more) origin servers.  Creating the distribution is easy via the AWS Console.  In our distribution, we force HTTPS access – since our containing pages are served via https, we want to ensure the embedded static content is as well, to avoid browser security warnings.

CloudFront does allow you to associate a DNS CNAME with your distribution, so that you can associate your own subdomain (like static.personalcapital.com) with your CloudFront distribution.  This is more user-friendly than the default generated CloudFront domain names, which are like d1q4amq3lgzrzf.cloudfront.net. However, one gotcha is that CloudFront does not allow you to upload your own SSL certificate.   So, you have to choose between either having your own subdomain or having https – you can’t have both (some other CDN services do allow you to load your own SSL certs).   In our case, we chose https, and use the default cloudfront.net URL.

Reference all static content in the application with the distribution’s URL.

We have an environment config property that stores the corresponding CloudFront distribution’s URL. This property is available for all server-side webpages where it is referenced as follows:

 <link rel="stylesheet" type="text/css" href="<%=staticUrl%>/static/styles/css/main.css">

We then needed to make sure that all our static content references its resources via relative URLs to keep them independent of the distribution URL. For example, an image reference in main.css would be as follows:

 background: url('/static/img/dashboard/zeroState.png')

which would resolve to the root URL of main.css which is the distribution URL. I would like to know if there is a better way to solve this.

All our javascript uses relative paths anyway because of “requirejs” so we did not have to make any changes there.

All other references to static resources were on the server side where they had the config property to correctly reference the static resource.

Version static content to bust the cache both in the browser and CDN.

All our static content have aggressive cache headers. Once a static resource is fetched by the browser from the server, all future requests to that resource will be fetched from browser’s cache. This is great but when a newer version of this resource is available in server, the browser won’t no know about it, until its cache entry for that resource expires.

To prevent this, we use a common technique called URL fingerprinting wherein we add a unique fingerprint to the filename of that resource and change all its references in the webpages (JSPs) with the new filename. The browser, as it renders the updated webpage, will now request the resource from the server since it treats it as an entirely a new resource because of its new name.

The Hudson build job mentioned above processes our static resources, versions them with the build number and also stores the build number in version.xml. The version.xml file is then used by the application to retrieve the version number and pass it onto web pages at run-time. This helps us achieve our second goal that of keeping our static (front-end) development independent from our sever (back-end) development. This is very powerful as it gives us the ability to change our static content any time, have it deployed to production and not worry about updating server webpages with the latest version number. Pretty neat ah !!

Versioning of the resources also helped us out a great deal with our CloudFront distribution. CloudFront distribution behaves very similar to how browser handles resource caching. It does not fetch a newer version of the resource from the origin server unless one invalidates the current resource in the distribution. This has to be done per resource and it has a cost too.  The CloudFront documentation offers more considerations regarding invalidation vs versioning.

There is one other workaround you could use to force CloudFront distribution to fetch the content from its origin server. Set the distribution to consider query strings in the resource urls. And then pass a unique querystring along with the resource url and it will force the distribution to fetch the resource from the origin server.

That is it !!

reads

2 comments

  1. Thank you for your info, though it has passed a lot of time since the article was published. But nowadays CDN is even more worth to use than previously.
    Personally i’d definitely recommend Edgecast CDN services for serious websites. This network is one of the leaders of the industry and even such a big names like WordPress and Twitter have chosen them. Though for a regular website they are also ready to offer their services and the website undoubtedly will work as fast as possible. And also will be secured and well-optimized. There are a number of their resellers that offer no annual contracts and very affordable prices for those services. Check out Jodihost, for example. Read more about the services offered at https://www.jodihost.com/services_analytics.php

Leave a Reply

Your email address will not be published. Required fields are marked *

*

code

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>