Accelerating your Drupal Content with Amazon CloudFront
Introduction
Since 2001, the Drupal code for content collaboration and communication has powered more than a million websites (1,378,067 as of March 2018). On the Quantcast Top 10k list (the top 10,000 US sites by traffic), 9 percent of sites use the Drupal content management system (CMS) at their core, making it the second-largest web CMS platform behind WordPress. AWS has already published an excellent blog post on how to accelerate your WordPress content with Amazon CloudFront, written by my colleague Ronan Guilfoyle.
So let’s get down to business: how can we do the same for Drupal? In this blog post, we look at the settings we need in Amazon CloudFront to accelerate and offload our Drupal content to a globally distributed set of CloudFront Edge Locations.
What is Amazon CloudFront?
Amazon CloudFront is a web service that speeds up distribution of your static and dynamic web content, such as .html, .css, .js, and image files, to your users. The Amazon CloudFront content delivery network is built on top of the expanding AWS infrastructure and delivers your content through a worldwide network of data centers called Edge Locations. When a user requests content that you’re serving with CloudFront, the user is routed to the Edge Location that provides the lowest latency (time delay), so that content is delivered with the best possible performance. If the content is already in the Edge Location’s cache, then CloudFront will deliver it immediately. If the content is not in that Edge Location’s cache, CloudFront will retrieve it from a CloudFront Regional Edge Cache location or your origin.
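The lookup order described above (Edge Location, then Regional Edge Cache, then origin) can be modelled as a small Python sketch. It is a conceptual model only, not how CloudFront is implemented: plain dictionaries stand in for the caches and ignore TTLs, eviction, and request collapsing, and the names `fetch` and `drupal_origin` are hypothetical.

```python
from typing import Callable, Dict, Tuple

Cache = Dict[str, bytes]


def fetch(path: str, edge: Cache, regional: Cache,
          origin: Callable[[str], bytes]) -> Tuple[bytes, str]:
    """Serve `path` from the nearest tier that has it; return (body, tier)."""
    if path in edge:
        return edge[path], "edge"            # cache hit at the Edge Location
    if path in regional:
        edge[path] = regional[path]          # fill the edge on the way back
        return regional[path], "regional"
    body = origin(path)                      # miss everywhere: go to the origin
    regional[path] = body                    # populate both tiers
    edge[path] = body
    return body, "origin"


if __name__ == "__main__":
    edge_cache: Cache = {}
    regional_cache: Cache = {}

    def drupal_origin(path: str) -> bytes:
        return f"<html>{path}</html>".encode()

    print(fetch("/node/2", edge_cache, regional_cache, drupal_origin)[1])  # origin
    print(fetch("/node/2", edge_cache, regional_cache, drupal_origin)[1])  # edge
```

Only the first request for a path reaches the origin; that offload effect is what the rest of this post tries to maximise.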
Lambda@Edge extends serverless compute to the CloudFront network and enables developers to implement a wide variety of use cases for web CMS sites like Drupal. Use cases such as personalization and URL rewriting can be deployed easily, allowing your code to respond to your end users at the lowest latency. Your code can be triggered by Amazon CloudFront events, such as requests from viewers, requests to your origin servers, and the responses to each. Upload your Node.js code to AWS Lambda, and AWS Lambda takes care of everything required to replicate, route, and scale your code with high availability at an AWS location closer to your end user. You pay only for the compute time you consume; there is no charge when your code is not running.
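As an illustration of URL rewriting, here is a minimal viewer-request sketch. The post mentions Node.js; Lambda@Edge also supports Python runtimes, which this sketch uses. It reads the request from `event["Records"][0]["cf"]["request"]`, the Lambda@Edge event layout, and strips a trailing slash so that `/about/` and `/about` share one cache entry. The handler name `lambda_handler` and the trailing-slash rule are assumptions for illustration; check that your Drupal site serves both forms identically before using anything like this.

```python
from typing import Any, Dict


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Viewer-request trigger: collapse '/about/' and '/about' into one URI."""
    request = event["Records"][0]["cf"]["request"]
    uri = request["uri"]
    # Leave the site root alone; strip trailing slashes everywhere else.
    if len(uri) > 1 and uri.endswith("/"):
        request["uri"] = uri.rstrip("/") or "/"
    # Returning the request tells CloudFront to continue with the modified request.
    return request
```

Because the rewrite happens before the cache lookup, it reduces the number of distinct objects CloudFront has to fetch from Drupal.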
Getting Started with Drupal on AWS and Amazon CloudFront
Each CloudFront distribution contains one or more origins. An origin is where your Drupal content resides, whether that is static HTML in an Amazon S3 bucket, Drupal running on Amazon EC2 in your chosen Region, AWS Elastic Beanstalk, or Amazon Lightsail instances, or servers in your own data centers. You must create at least one origin. In my case, the origin is the Application Load Balancer sitting in front of my EC2 instances.
To build my Drupal environment, I followed our Reference Architecture for Drupal, found in the AWS Architecture section of our website. Running the supplied AWS CloudFormation stacks gives me a deployment of Drupal 8 that uses Amazon EC2, Amazon EFS, and Amazon RDS with Amazon Aurora. This is all wrapped up in a highly available design that spans multiple Availability Zones and is configured to scale automatically using Amazon EC2 Auto Scaling groups.
I’m using version 8 of Drupal, which is the current production version. It is a stable release that includes a number of core modules that can help us with our goal of accelerating Drupal content with Amazon CloudFront. Let’s get started.
- The Path module allows us to create URL aliases for our content. By default, Drupal creates URL patterns like https://www.mydrupalwebsite.com/node/2. This isn’t great for human readability, and it isn’t ideal for search engine optimisation either. When we add a content delivery network, URL aliases also let us define paths within our website that reflect the type of content they hold. The Path module lets us take the URL we just discussed and alias it to something like https://www.mydrupalwebsite.com/images
- In the Drupal 8 administration pages (Administration >> Configuration >> Development >> Performance), you will find that “Aggregate CSS files” and “Aggregate JavaScript files” are enabled by default. Aggregation combines many small files into fewer, larger ones, which reduces the number of requests, and the bandwidth, between our origin AWS infrastructure and the CloudFront Edge Locations.
- On the same page, you can see that the “Browser and proxy cache maximum age” setting defaults to no caching. This setting controls the maximum time a page can be cached by browsers and proxies, through the max-age value in the Cache-Control header. Even if we define a value here, we can override it with the controls offered in Amazon CloudFront, which gives us a single interface for controlling these TTL values. I still recommend that you set a non-zero value here; I set mine to 15 minutes.
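To make the 15-minute setting concrete: 15 minutes is 900 seconds, which ends up as the max-age directive of the Cache-Control header. The sketch below shows that arithmetic and how a cache would read the value back. The exact header Drupal emits may differ in directive order or extra directives, and both function names are hypothetical.

```python
import re
from typing import Optional


def cache_control_for(minutes: int) -> str:
    """Cache-Control value for a page that browsers and proxies may cache for `minutes`."""
    return f"max-age={minutes * 60}, public"


def parse_max_age(header: str) -> Optional[int]:
    """Extract the max-age directive (in seconds), or None if it is absent."""
    match = re.search(r"(?:^|,)\s*max-age=(\d+)", header)
    return int(match.group(1)) if match else None


print(cache_control_for(15))                 # max-age=900, public
print(parse_max_age(cache_control_for(15)))  # 900
```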
There is one module, not included in Drupal core, that I would consider: the CDN module, produced by Wim Leers. It has been actively developed since its initial release in 2008 and has been rewritten for Drupal 8. This module changes file URLs so that CSS, JavaScript, images, audio, and video files can be cached by CloudFront more easily. It also changes a file’s URL when a user changes that file in Drupal. This allows content to be cached for long periods without having to think about expiring it from the cache. To install the module, go to your Drupal administration site and click the Extend tab. From there, click the blue “+ Install new module” button. I looked up the latest version supporting Drupal 8 and provided the URL to its .tar.gz file. In my case, this was https://ftp.drupal.org/files/projects/cdn-8.x-3.2.tar.gz.
With the module installed, I scrolled down to the bottom of the Extend tab and enabled both the CDN and CDN UI modules. From there, I followed the configuration steps outlined in the module’s documentation.
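The idea behind changing a file’s URL whenever the file changes can be sketched as follows. This is not the CDN module’s actual URL scheme: the `/v/<hash>` prefix, the function name `cdn_file_url`, and the distribution domain are hypothetical placeholders. It only shows why content-derived URLs let you cache files for a long time without ever invalidating them.

```python
import hashlib


def cdn_file_url(path: str, content: bytes, cdn_domain: str) -> str:
    """Build a CDN URL whose path changes whenever the file's content changes."""
    # A short content hash acts as a version token: new content, new URL,
    # so the old cached copy never needs to be invalidated.
    token = hashlib.sha256(content).hexdigest()[:12]
    return f"https://{cdn_domain}/v/{token}{path}"


if __name__ == "__main__":
    domain = "d111111abcdef8.cloudfront.net"  # placeholder distribution domain
    print(cdn_file_url("/themes/custom/site.css", b"body{color:#333}", domain))
    print(cdn_file_url("/themes/custom/site.css", b"body{color:#000}", domain))
```

The two printed URLs differ only in the token, so browsers and CloudFront treat the edited stylesheet as a brand-new object.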
Now let’s head over to the AWS Management Console, and use the CloudFront console to create our CloudFront distribution. We will select Web as the Distribution type. From there, you can see the following sections, which need to be filled in. Following each screenshot, I have called out any fields that I modified. Those fields that I have not called out have been left at their default settings.
Section One :: Origin Settings
Origin Domain Name – CloudFront is integrated into many AWS services, so it knows about the AWS infrastructure deployed in my AWS account. I simply scroll down through the list until I find the Application Load Balancer sitting in front of my Drupal EC2 instances.
Origin SSL Protocols – I have chosen to remove support for TLSv1 (TLS 1.0). This is entirely your choice and depends on the security or risk position you want to take for your own application. On June 30, 2018, PCI DSS requirements change, and TLS 1.0 will no longer be an acceptable protocol for protecting payment card data. I’ve decided to get ahead of that change now.
Origin Protocol Policy – I want all communication from CloudFront to my origin to use HTTPS. I have a certificate installed on my Application Load Balancer (supplied by AWS Certificate Manager), so I have selected the HTTPS Only setting.
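If you later script your distribution instead of using the console, the two origin choices above map onto the CloudFront API’s CustomOriginConfig element. The fragment below is a sketch, not a complete distribution config: the origin ID and load balancer domain name are placeholders, and the helper `quantities_consistent` is a hypothetical check for the API’s convention of pairing every Items list with a matching Quantity.

```python
from typing import Any

# Placeholder values; shaped like an Origin element of a CloudFront DistributionConfig.
drupal_origin = {
    "Id": "drupal-alb",  # hypothetical origin ID
    "DomainName": "my-drupal-alb-1234567890.eu-west-1.elb.amazonaws.com",  # placeholder
    "CustomOriginConfig": {
        "HTTPPort": 80,
        "HTTPSPort": 443,
        "OriginProtocolPolicy": "https-only",  # console: HTTPS Only
        "OriginSslProtocols": {  # TLSv1 deliberately left out
            "Quantity": 2,
            "Items": ["TLSv1.1", "TLSv1.2"],
        },
    },
}


def quantities_consistent(node: Any) -> bool:
    """CloudFront list elements pair Quantity with Items; check that they agree."""
    if isinstance(node, dict):
        if "Quantity" in node and node["Quantity"] != len(node.get("Items", [])):
            return False
        return all(quantities_consistent(value) for value in node.values())
    if isinstance(node, list):
        return all(quantities_consistent(item) for item in node)
    return True


print(quantities_consistent(drupal_origin))  # True
```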
Origin Custom Headers – In some application deployments, you will want to ensure that all visitors reach your website through CloudFront, and stop people from going around it to communicate directly with your origin endpoint (in this case, my Application Load Balancer). Using Origin Custom Headers, I can create a header name and value and add them to my CloudFront distribution, which then includes them in every request it sends to the origin. You can use any values you want; just make sure you take note of them.
With that done, I can create an AWS WAF web ACL for my Application Load Balancer, with a string match condition that looks for an exact match on the header name and value I have just created, so that only requests carrying that header are allowed through.
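AWS WAF performs this check for you; the sketch below only spells out the exact-match logic the rule applies, which can be handy if you also want a defensive check inside your own application. The header name and secret are hypothetical placeholders, so substitute the values you configured on the distribution.

```python
import hmac
from typing import Mapping

ORIGIN_HEADER = "x-origin-verify"                   # hypothetical header name
ORIGIN_SECRET = "replace-with-a-long-random-value"  # hypothetical value


def came_through_cloudfront(headers: Mapping[str, str]) -> bool:
    """Exact-match check on the custom header CloudFront adds to origin requests."""
    # HTTP header names are case-insensitive, so compare them in lower case.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get(ORIGIN_HEADER)
    if value is None:
        return False
    # Constant-time comparison avoids leaking the secret through timing.
    return hmac.compare_digest(value.encode(), ORIGIN_SECRET.encode())
```

A request sent straight to the load balancer lacks the header (or carries a wrong value) and is rejected, while requests forwarded by CloudFront pass.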
Section Two :: Default Cache Behavior Settings
Viewer Protocol Policy – I want all communication between CloudFront and the viewers of my website to be encrypted using HTTPS, so I changed this setting to Redirect HTTP to HTTPS. This means visitors to my website are automatically redirected if they don’t specify a secure connection.
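The behaviour of Redirect HTTP to HTTPS can be summarised in a few lines: a plain-HTTP request gets an HTTP 301 whose Location is the same URL with the https scheme, and an HTTPS request is served normally. This is a simplified model of what CloudFront does at the edge, not its implementation; the function name is hypothetical.

```python
from typing import Optional, Tuple


def redirect_to_https(scheme: str, host: str,
                      path_and_query: str) -> Optional[Tuple[int, str]]:
    """Return (status, Location) for an HTTP request, or None if it is already HTTPS."""
    if scheme.lower() == "https":
        return None  # already secure: serve normally
    # Plain-HTTP requests are answered with a permanent redirect to the HTTPS URL.
    return 301, f"https://{host}{path_and_query}"
```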
Allowed HTTP Methods – As with WordPress, my Drupal website uses forms, so I want CloudFront to support the additional HTTP methods, including POST, that form submissions need. I therefore selected GET, HEAD, OPTIONS, PUT, POST, PATCH, DELETE.
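Allowing a method is not the same as caching it. The sketch below mirrors the shape of the AllowedMethods element in a CloudFront cache behavior: all seven methods are forwarded, but only GET and HEAD responses are cached, so form submissions always reach Drupal. The helper names `forwarded` and `cached` are hypothetical.

```python
# Shaped like the AllowedMethods element of a CloudFront cache behavior.
allowed_methods = {
    "Quantity": 7,
    "Items": ["GET", "HEAD", "OPTIONS", "PUT", "POST", "PATCH", "DELETE"],
    # Only responses to the methods listed here are cached; POST responses never are.
    "CachedMethods": {"Quantity": 2, "Items": ["GET", "HEAD"]},
}


def forwarded(method: str) -> bool:
    """True if CloudFront accepts the method and forwards it to the origin."""
    return method in allowed_methods["Items"]


def cached(method: str) -> bool:
    """True if CloudFront may cache responses to the method."""
    return method in allowed_methods["CachedMethods"]["Items"]


print(forwarded("POST"), cached("POST"))  # True False
```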
Whitelist Headers – I want to control how CloudFront caches content. Since Drupal supports multi-site configurations, I have chosen to cache based on the Host header, along with the Origin header and the forwarded protocol headers. As with WordPress, treat this as a minimum, and adjust it based on the profile of your own Drupal website.
Note: if you choose “None”, CloudFront could serve the wrong content in certain circumstances, such as when you host multiple Drupal sites on the same Drupal installation. On the other hand, if you choose “All”, CloudFront will not cache objects and will send every request to your origin for processing, reducing the cache hit ratio and adding load to your origin.
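Why does “None” risk serving the wrong site? A conceptual sketch of the cache key helps: if no headers are part of the key, two Drupal sites that share a path collide on one cached object. Real CloudFront cache keys include more than this (for example, forwarded cookies and query strings), and the function name and example hostnames are hypothetical.

```python
from typing import Iterable, Mapping, Tuple


def cache_key(path: str, headers: Mapping[str, str],
              whitelist: Iterable[str]) -> Tuple[str, Tuple[Tuple[str, str], ...]]:
    """Conceptual cache key: the path plus the values of whitelisted headers only."""
    lowered = {name.lower(): value for name, value in headers.items()}
    selected = tuple(sorted((name.lower(), lowered.get(name.lower(), ""))
                            for name in whitelist))
    return path, selected


site_a = {"Host": "site-a.example.com"}
site_b = {"Host": "site-b.example.com"}
# With no whitelisted headers, two different Drupal sites collide on one key.
print(cache_key("/", site_a, []) == cache_key("/", site_b, []))              # True
# Whitelisting Host separates them.
print(cache_key("/", site_a, ["Host"]) == cache_key("/", site_b, ["Host"]))  # False
```

Each header you add to the whitelist splits the cache further, which is why “All” effectively disables caching.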
Object Caching – Earlier in this post I referred to Drupal’s native caching settings, which let you declare a max-age for pages served from your Drupal origin. Here you can either use the values coming from Drupal, by selecting Use Origin Cache Headers, or, if you want to override these values and control them within CloudFront, select Customize. With Customize, you set a Minimum TTL, a Maximum TTL, and a Default TTL; the Default TTL is the caching period CloudFront applies when the origin does not send its own caching headers.
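The interaction between Drupal’s header and the CloudFront TTL settings can be sketched as a simplified function: an s-maxage or max-age value from the origin is clamped between the Minimum and Maximum TTL, and the Default TTL applies only when no such value is present. This ignores Expires, no-cache, no-store, and private, so treat it as an approximation of the documented rules rather than a complete model; the function names are hypothetical.

```python
import re
from typing import Optional


def _seconds(header: str, directive: str) -> Optional[int]:
    """Return the integer value of a Cache-Control directive, if present."""
    match = re.search(rf"(?:^|,)\s*{re.escape(directive)}=(\d+)", header)
    return int(match.group(1)) if match else None


def effective_ttl(cache_control: Optional[str], min_ttl: int,
                  default_ttl: int, max_ttl: int) -> int:
    """Seconds an edge cache keeps an object (simplified CloudFront rules)."""
    if cache_control:
        # s-maxage targets shared caches such as CloudFront, so it wins over max-age.
        origin_ttl = _seconds(cache_control, "s-maxage")
        if origin_ttl is None:
            origin_ttl = _seconds(cache_control, "max-age")
        if origin_ttl is not None:
            # The origin's value is clamped between the Minimum and Maximum TTL.
            return max(min_ttl, min(origin_ttl, max_ttl))
    # No usable caching header from the origin: fall back to the Default TTL.
    return default_ttl


print(effective_ttl("max-age=900, public", 0, 86400, 31536000))  # 900
print(effective_ttl(None, 0, 86400, 31536000))                   # 86400
```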
Forward Cookies – I recommend that you select Whitelist and add the cookies that you want forwarded to your origin. Drupal supports a variety of cookie-based session controls, both natively and through contributed modules. This is specific to your configuration, so experiment with this setting.
In a default Drupal configuration, Drupal stores the PHP session ID in a cookie whose name starts with “SESS”, followed by a hash of the session name. With that knowledge, I can set my CloudFront distribution to whitelist cookies whose names start with SESS.
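CloudFront cookie whitelists accept the * and ? wildcards, so a pattern such as SESS* covers the hashed session cookie names. The sketch below models that filtering with Python’s `fnmatchcase`, which also understands [seq] patterns that CloudFront does not, so keep to * and ?. One caveat worth checking on your own site: Drupal 8 uses an SSESS prefix instead of SESS for requests it treats as HTTPS, so behind an HTTPS-only setup you may need to whitelist that pattern too. The cookie names and values below are made up for illustration.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable


def forwarded_cookies(cookies: Dict[str, str], whitelist: Iterable[str]) -> Dict[str, str]:
    """Keep only cookies whose names match a whitelist pattern (* and ? wildcards)."""
    patterns = list(whitelist)
    return {name: value for name, value in cookies.items()
            if any(fnmatchcase(name, pattern) for pattern in patterns)}


# Hypothetical cookie names and values, for illustration only.
request_cookies = {"SESSa1b2c3": "abc", "SSESSd4e5f6": "def", "_ga": "GA1.2.3"}
print(forwarded_cookies(request_cookies, ["SESS*"]))
# {'SESSa1b2c3': 'abc'}
print(forwarded_cookies(request_cookies, ["SESS*", "SSESS*"]))
# {'SESSa1b2c3': 'abc', 'SSESSd4e5f6': 'def'}
```

Analytics cookies such as `_ga` are dropped, which keeps them out of the cache key and out of your origin’s request load.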
Query String Forwarding and Caching – Drupal uses query strings in its URLs, so I have set this parameter to forward all query strings to my origin and cache based on all of them. If you can narrow this down to a whitelist based on your Drupal application configuration, you will improve your cache hit ratio. Try to ensure that static content is served through static paths, using the Path module where appropriate.
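Here is a sketch of why a query string whitelist improves the cache hit ratio: parameters that do not change the page (tracking tags, for example) are dropped before the cache key is formed, so requests that differ only in those parameters share one cached object. The whitelist contents are hypothetical. The order of the kept parameters is preserved, because CloudFront treats differently ordered parameters as different requests.

```python
from typing import Set
from urllib.parse import parse_qsl, urlencode


def forwarded_query(query: str, whitelist: Set[str]) -> str:
    """Keep only whitelisted query parameters, preserving their original order."""
    kept = [(key, value) for key, value in parse_qsl(query, keep_blank_values=True)
            if key in whitelist]
    return urlencode(kept)


allowed = {"page", "search"}  # hypothetical parameters a Drupal view might use
print(forwarded_query("page=2&utm_source=newsletter", allowed))  # page=2
print(forwarded_query("page=2&utm_source=twitter", allowed))     # page=2
```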
Compress Objects Automatically – If objects that CloudFront receives from the origin are not compressed but could be, I want CloudFront to compress them for me. Setting this option to Yes means that, where possible, CloudFront compresses these objects.
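“Where possible” has concrete conditions. The sketch below models the main ones for gzip: the viewer must accept gzip, the object must be between 1,000 and 10,000,000 bytes, its content type must be on CloudFront’s list of compressible types, and the origin must not have compressed it already. The type set here is a small illustrative subset of that list, and the function name is hypothetical.

```python
from typing import Optional

# Illustrative subset of the content types CloudFront compresses;
# the CloudFront documentation has the full list.
COMPRESSIBLE = {"text/html", "text/css", "application/javascript",
                "application/json", "image/svg+xml"}


def will_compress(size: int, content_type: str, accept_encoding: str,
                  content_encoding: Optional[str] = None) -> bool:
    """Simplified model of when CloudFront gzips an object for a viewer."""
    if content_encoding:                  # origin already compressed it
        return False
    if not 1_000 <= size <= 10_000_000:   # outside CloudFront's size range
        return False
    if content_type.split(";")[0].strip().lower() not in COMPRESSIBLE:
        return False
    # The viewer must advertise gzip in its Accept-Encoding header.
    offered = {part.split(";")[0].strip().lower() for part in accept_encoding.split(",")}
    return "gzip" in offered
```

Already-compressed formats such as JPEG images gain little from gzip, which is why they are not on the list.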
Section Three :: Distribution Settings
AWS WAF Web ACL – Even if you don’t plan to use AWS WAF straight away (though I recommend that you do!), I recommend that, at the very least, you associate an empty web ACL with each of your CloudFront distributions. Adding a rule to an existing, empty web ACL takes far less time than updating your CloudFront distribution settings to attach a web ACL, a change that must be deployed globally. If you are ever under attack, response time will matter to you, and this approach lets you make effective changes as quickly as possible.
Alternate Domain Names (CNAMEs) – Here I have added the real fully qualified domain name(s) for my website, allowing this CloudFront distribution to respond to requests for my domain.
SSL Certificate – When you use your own certificate, CloudFront offers two options for serving HTTPS requests. The default option is to serve only clients that support Server Name Indication (SNI). With this option, CloudFront associates your certificate with an IP address that is not dedicated to your distribution. SNI is an extension to the TLS protocol, supported by most modern browsers, in which the client includes the requested domain name in the TLS handshake. This allows CloudFront to determine which certificate to present when negotiating an encrypted channel. If a client browser does not support SNI, the connection can’t be secured and is dropped.
The alternative is to have CloudFront serve your certificate from dedicated IP addresses in each Edge Location that hosts your distribution. In this case, a browser that doesn’t support SNI can still negotiate a secure connection to your website. Note: this option incurs an additional monthly charge. Wikipedia maintains a list of browsers that support SNI.
I recommend the default option of using Server Name Indication.
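To see where SNI carries the domain name, the sketch below starts a TLS handshake entirely in memory with Python’s `ssl` module and returns the ClientHello the client would send; no network connection is made. The `server_hostname` argument populates the SNI extension, and the hostname appears in that first message in plain text, which is how CloudFront can pick the right certificate on a shared IP address. The function name is hypothetical.

```python
import ssl


def client_hello(hostname: str) -> bytes:
    """Return the first TLS handshake message a client sends for `hostname`."""
    context = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    # server_hostname is what populates the SNI extension.
    tls = context.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: no server is attached, so the handshake pauses here
    return outgoing.read()


hello = client_hello("www.mydrupalwebsite.com")
print(b"www.mydrupalwebsite.com" in hello)  # True
```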
Logging – I have chosen to configure CloudFront to log all viewer requests for files in my distribution, including the cookies sent with each request. This is optional and should be enabled only if you need to collect this data. The logs are stored in Amazon S3, and I have also configured them to be stored under a prefix (similar to a folder) for this distribution.
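CloudFront standard log files are tab-separated text with a #Fields line naming the columns, so a small parser can turn each request into a dictionary keyed by those names. The sketch below reads the field names from the file rather than hard-coding them. The sample lines are made up and use only a few of the real columns; the function name is hypothetical. The log files in S3 are gzip-compressed, so with a downloaded file you would pass `gzip.open(path, "rt")` as `lines`.

```python
from typing import Dict, Iterable, Iterator, List


def parse_cloudfront_log(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per request, keyed by the names on the #Fields line."""
    fields: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#"):
            # Values are tab-separated, in the same order as the field names.
            yield dict(zip(fields, line.split("\t")))


if __name__ == "__main__":
    sample = [  # made-up, shortened example for illustration
        "#Version: 1.0",
        "#Fields: date time cs-method cs-uri-stem sc-status",
        "2018-03-01\t12:00:00\tGET\t/node/2\t200",
    ]
    for record in parse_cloudfront_log(sample):
        print(record["cs-uri-stem"], record["sc-status"])  # /node/2 200
```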
Conclusion
I hope you have found this blog post helpful. Using the steps outlined here you are able to deploy CloudFront to cache and accelerate your Drupal content using a globally distributed set of CloudFront nodes. To learn more about the full range of CloudFront features, use cases, and announcements, I encourage you to visit our Amazon CloudFront getting started webpage.