AWS Storage Blog
Secure data in a multi-tenant environment by automatically enforcing prefix-level encryption keys in Amazon S3
Many organizations need to store and process data that belongs to multiple entities, a setup commonly referred to as multi-tenancy. In these situations, it is important to secure every tenant’s data and ensure that a consumer can access only the data they require for their responsibilities and nothing more. In particular, if a user or application needs access to User A’s data, then granting that access should not unintentionally give the user or application access to other users’ data. Although restricting direct access to the data is one way to adhere to the principle of least privilege, another option that provides two layers of protection is to encrypt the data and make sure that access to the keys used to encrypt it is also strictly controlled. When using encryption keys, it is important to use different keys for data that belongs to different entities to keep access to the data isolated.
A typical pattern when storing data belonging to multiple entities is to use one Amazon Simple Storage Service (Amazon S3) bucket per entity to keep the data isolated. If the number of entities is large, then you can instead store the data in the same bucket under different prefixes. S3 provides many options to keep the data encrypted. By default, S3 applies server-side encryption with Amazon S3 managed keys (SSE-S3) as the base level of encryption for every bucket. When working with Amazon S3 general purpose buckets, you can also set a default bucket encryption configuration to automatically encrypt new objects with AWS Key Management Service (AWS KMS) keys (SSE-KMS). You can also use different customer managed AWS KMS keys for different S3 objects by passing the AWS KMS key to use in your S3 PUT requests.
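For example, an application that wants to control the key on a per-object basis can pass it directly in the PUT request. The following is a minimal sketch using the AWS SDK for Python (Boto3); the bucket name, object key, and KMS key ARN are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Upload an object encrypted with a specific customer managed KMS key.
# Bucket name, object key, and key ARN below are placeholders.
s3.put_object(
    Bucket="amzn-s3-demo-bucket",
    Key="tenant-a/report.csv",
    Body=b"example payload",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab",
)
```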
In this post, I walk through a solution for general purpose S3 buckets that uses AWS Lambda and S3 Event Notifications to monitor every object uploaded to an S3 bucket and make sure it is encrypted with the correct customer managed AWS KMS key for its prefix. This allows customers who are already using a single S3 bucket to store data from multiple entities to add entity-specific AWS KMS keys to protect their data, without having to depend on the applications or users that write objects to the bucket to use the right AWS KMS keys. This enables you to meet isolation requirements in multi-tenant S3 buckets.
Solution overview
This solution monitors S3 for new object uploads and, for every upload, checks if the object is encrypted with the correct customer managed AWS KMS key. If not, it initiates a copy of the S3 object with the right AWS KMS key.
While it is possible to use the AWS managed KMS key in SSE-KMS to protect S3 buckets, there is only one AWS managed KMS key for S3 per account per Region. So if the bucket has data belonging to different entities that need to be isolated, objects under different prefixes have to be encrypted with different customer managed AWS KMS keys. In the rest of this post, any reference to an AWS KMS key means a customer managed AWS KMS key. To learn more about encryption options in S3, refer to the data encryption section of the Amazon S3 documentation. To understand the different levers available for managing access to Amazon S3 buckets, refer to the Access Management section of the Amazon S3 documentation.
Similarly, as this solution applies only to general purpose S3 buckets, any reference to an S3 bucket in the rest of this post means a general purpose S3 bucket.
The solution has the following flow:
1. An object is uploaded to the S3 bucket.
2. An S3 Event Notification is set up on the bucket to monitor for object uploads. Once an object is uploaded, that information is sent to an Amazon Simple Queue Service (Amazon SQS) queue.
3. The Lambda service polls messages (each message representing an S3 object) from the SQS queue and invokes the Lambda function with the messages received.
4. The Lambda function reads data from an Amazon DynamoDB mapping table. This table contains the mappings between the S3 prefix and the AWS KMS key that should be used to encrypt objects with that prefix.
(4.1) If the AWS KMS key of the object does not match what is in the mapping table, then the Lambda function calls the S3 CopyObject operation to re-encrypt the object with the right key.
(4.2) If the key is correct, nothing more is done.
5. If the bucket uses S3 Versioning, then the Lambda function deletes the previous version of the object from S3.
6. The Lambda function logs the encryption activity, any object version deletion activity, and any Lambda execution errors into a DynamoDB log table.
7. If there are errors in running the Lambda function, the affected messages are natively moved by SQS to a dead-letter queue.
(7.1) An Amazon CloudWatch alarm monitors the dead-letter queue and sends out a notification when there are messages in the queue, indicating errors that need to be investigated.
The following diagram shows the flow of the solution as just described; a minimal sketch of the Lambda function’s core logic follows the diagram.
Figure 1: Securely store data in a multi-tenant environment by enforcing prefix-level AWS KMS keys in Amazon S3
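To make the flow concrete, here is a minimal sketch of steps 3 through 4.1, not the actual code from the solution’s repository. It assumes a hypothetical DynamoDB table (PrefixKeyMapping, read from a MAPPING_TABLE environment variable) whose items pair a prefix partition key with a kms_key_arn attribute; the versioning cleanup and logging from steps 5 and 6, as well as error handling, are omitted:

```python
import json
import os
from urllib.parse import unquote_plus

import boto3

s3 = boto3.client("s3")
dynamodb = boto3.resource("dynamodb")

# Hypothetical table name for illustration.
mapping_table = dynamodb.Table(os.environ.get("MAPPING_TABLE", "PrefixKeyMapping"))


def handler(event, context):
    # Each SQS message body carries an S3 event notification (step 3).
    for sqs_record in event["Records"]:
        s3_event = json.loads(sqs_record["body"])
        for record in s3_event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = unquote_plus(record["s3"]["object"]["key"])
            prefix = key.split("/", 1)[0] + "/"

            # Step 4: look up the KMS key required for this prefix.
            item = mapping_table.get_item(Key={"prefix": prefix}).get("Item")
            if item is None:
                continue  # no mapping configured for this prefix
            expected_key_arn = item["kms_key_arn"]

            # Check which KMS key (if any) currently encrypts the object.
            head = s3.head_object(Bucket=bucket, Key=key)
            if head.get("SSEKMSKeyId") == expected_key_arn:
                continue  # step 4.2: key is already correct, nothing to do

            # Step 4.1: copy the object onto itself with the right key.
            s3.copy_object(
                Bucket=bucket,
                Key=key,
                CopySource={"Bucket": bucket, "Key": key},
                ServerSideEncryption="aws:kms",
                SSEKMSKeyId=expected_key_arn,
                MetadataDirective="COPY",
            )
```

A mapping item in this hypothetical schema would look like {"prefix": "prefix1/", "kms_key_arn": "arn:aws:kms:..."}; the actual table layout in the repository may differ.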
Prerequisites
To test this solution, you need an AWS account. As the solution uses the AWS Cloud Development Kit (AWS CDK) to deploy the necessary resources as AWS CloudFormation stacks, you also need AWS CDK installed on your machine. To install AWS CDK, follow the steps in the documentation.
Solution deployment
The solution is available for you to demo at this AWS Samples GitHub page. The Installation section of the README file has detailed information on how to deploy the solution. The solution uses the AWS CDK, which in turn deploys the solution as CloudFormation stacks.
The solution has three stacks:
- Core stack: This stack contains the components that are necessary to enforce prefix-level keys. There is only one instance of this stack in a deployment. The following resources are deployed as part of this stack:
- Lambda functions
- DynamoDB tables
- SQS queue
- Amazon Simple Notification Service (Amazon SNS) topic
- Amazon CloudWatch alarm
- Integration stack: This is a nested stack within the core stack. An integration stack contains the configuration and permissions needed for the core stack to enforce prefix-level keys on a single S3 bucket. For example, to enforce the AWS KMS keys, the core stack’s DynamoDB table needs to have the mapping between each prefix and the corresponding AWS KMS key ARN. The integration stack also sets up the necessary permissions to read and write to the S3 bucket and to encrypt and decrypt using the AWS KMS keys in place. There is one instance of this stack for every S3 bucket covered by this solution. For every covered bucket, the following permissions are needed for the solution to enforce the correct encryption (a sample policy sketch follows this list).
- The following permissions are needed on the S3 bucket and its objects:
- s3:PutObject
- s3:GetObject
- s3:GetObjectVersion
- s3:ListBucket
- s3:DeleteObject
- s3:DeleteObjectVersion
- The following permissions are needed on the AWS KMS keys:
- kms:Encrypt
- kms:Decrypt
- kms:GenerateDataKey
- Demo stack: You do not need this stack when you implement the solution for your own buckets, but it is available for demo and testing purposes to show how the core stack and integration stack work. The demo stack creates an S3 bucket and three AWS KMS keys for the prefixes “prefix1”, “prefix2”, and “prefix3”. When you want to use the core stack to enforce prefix-level AWS KMS keys on one of your own buckets, only the core stack (and the nested integration stack) is needed, as the buckets and AWS KMS keys are already available from one of your existing stacks.
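As a rough illustration of the permissions listed above, the following sketch prints an IAM policy document of the shape the integration stack needs. Every ARN is a placeholder, and the actual policy in the repository may be scoped differently:

```python
import json

# Placeholder ARNs; substitute your bucket and customer managed key.
bucket_arn = "arn:aws:s3:::amzn-s3-demo-bucket"
key_arn = "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "BucketAndObjectAccess",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:ListBucket",
                "s3:DeleteObject",
                "s3:DeleteObjectVersion",
            ],
            "Resource": [bucket_arn, f"{bucket_arn}/*"],
        },
        {
            "Sid": "KeyAccess",
            "Effect": "Allow",
            "Action": ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey"],
            "Resource": [key_arn],
        },
    ],
}

print(json.dumps(policy, indent=2))
```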
Solution demo
Once you deploy the demo stack, you have a bucket and three AWS KMS keys ready for testing. If you navigate to the bucket’s Properties tab in the AWS Management Console, then you see that the default encryption for objects in the bucket is server-side encryption with Amazon S3 managed keys (SSE-S3), as shown in the following image.
Figure 2: Server-side encryption with Amazon S3 managed keys is enabled by default
Try uploading a file to different prefixes within the bucket, as shown in the following image:
Figure 3: Amazon S3 bucket with four different prefixes
Once you upload a file, you can navigate to the object’s Properties tab and see that it has SSE-S3 encryption, as that is enabled by default on the bucket.
Figure 4: By default, the object is encrypted with Server-side encryption with Amazon S3 managed keys (SSE-S3)
However, the upload triggers a Lambda function, which in turn copies the object again but with the correct AWS KMS key. Therefore, within a short time, you can see the object’s encryption change to the AWS KMS key, as shown in the following image.
Figure 5: After a short time, the object is encrypted with Server-side encryption with AWS KMS keys (SSE-KMS)
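If you prefer to check from code rather than the console, a HeadObject call returns the object’s current encryption settings. The following is a minimal sketch; the bucket and object names are placeholders matching the demo setup:

```python
import boto3

s3 = boto3.client("s3")

# Placeholder bucket and object names from the demo.
response = s3.head_object(Bucket="amzn-s3-demo-bucket", Key="prefix1/test-file.txt")

# "AES256" (SSE-S3) right after upload; "aws:kms" once the object is re-encrypted.
print(response.get("ServerSideEncryption"))

# The ARN of the prefix's customer managed key, once the copy has completed.
print(response.get("SSEKMSKeyId"))
```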
Additional considerations
This solution can help you with data isolation when storing multi-tenant data in S3 buckets. However, there are situations where it might not meet your needs or might not be the best option for you. Read the Caveats section of the README file before you deploy this solution. The following are some key considerations:
- If the number of tenants is small, then it is simpler to have separate buckets for each tenant.
- In this solution, the file is re-encrypted with your preferred key for the specific prefix only after the object has been uploaded to the bucket. So there is a short window during which the object remains encrypted with the default KMS key defined on the S3 bucket or, if no default KMS key is configured, with the Amazon S3 managed key (SSE-S3) that Amazon S3 uses as the base level of encryption. If your requirement is that the object should never be in the bucket without being encrypted by the tenant-specific AWS KMS key, then this solution might not meet that need.
- For buckets not using S3 Versioning, the solution can handle S3 objects of up to 100 GB (this is not a fixed limit; the constraining factor is the 15-minute execution duration limit of Lambda functions). However, for buckets using S3 Versioning, the maximum object size is 5 GB, which is the largest source object a single CopyObject request can copy. If your files are larger than this, then this solution might not be the right one for you.
- S3 access control lists (ACLs) are lost when an object is copied to “correct” the AWS KMS key. Although most modern use cases of Amazon S3 no longer need ACLs, if you are using them, then this solution does not carry them over. Also, copying an object resets system-controlled metadata such as the creation date and last-modified date, while retaining user-defined and user-controlled metadata such as the storage class, as described in the CopyObject documentation.
- For buckets using S3 Versioning, race conditions can occur under concurrent or high-frequency writes of objects with the same name to the S3 bucket. This could lead to the latest version of an object being overwritten by an older version. This is due to the event-driven architecture of the solution, wherein the Lambda function could be copying an older version of an S3 object to apply the appropriate encryption key even as users of the S3 bucket are uploading a newer version. These situations are discussed in the last two points of the Caveats section of the README file. Be sure to assess the applicability of the solution to your use case.
Cleaning up
Once you have completed the demo, remember to clean up the resources created to avoid incurring future costs. You can use the cdk destroy command to delete the resources, as described at the end of the Testing section in the README file. You can also delete the resources by deleting the Demo stack and the Core stack from the AWS CloudFormation console page.
Conclusion
In this post, I discussed a solution that uses S3 Event Notifications and a Lambda function to securely isolate data belonging to multiple tenants in a single Amazon S3 bucket by using different customer managed AWS KMS keys based on the S3 prefix. The solution uses the AWS CDK and can work with multiple S3 buckets, including ones that use S3 Versioning. The solution makes it easy to modify the configuration to protect new S3 buckets or prefixes, so you can start with a single S3 bucket and gradually expand the solution’s coverage.
Data security is extremely important for organizations. This is particularly true when data from multiple tenants has to be stored in the same S3 bucket. It is important that this security is maintained using both encryption and access policies to adhere to the principle of least privilege. While this helps meet security requirements, it also improves data owners’ confidence in how their data is being handled. Even in cases where new access policies might not be feasible because they could disrupt users or applications already using the bucket, this solution makes it possible to use tenant-specific AWS KMS keys without affecting any existing usage.
Thank you for reading this post. Try this solution and let us know what you think on the GitHub page.