Integrating Amazon S3 Malware Scanning into Your Application Workflow with Cloud Storage Security

By Gokhul Srinivasan, Sr. Partner Solutions Architect, ISV Startups – AWS
By Aron Eidelman, Contributing Author – AWS
By Ed Casmer, CTO – Cloud Storage Security


Amazon Simple Storage Service (Amazon S3) is a highly scalable object storage service that allows organizations to store and process data. Because of its flexibility and ease of use, it has become the “center pin” of many applications hosted on Amazon Web Services (AWS).

A wide range of solutions ingest data, store it in Amazon S3 buckets, and then share it with downstream users. Oftentimes, the ingested data is coming from third-party sources, which opens the door to potentially malicious files—objects that may be infected with malware, viruses, ransomware, trojan horses, and more.

Not only is the organization exposed to the risk of malicious files, the application’s downstream users are also exposed to potential malware infection on their local devices.

If a customer, partner, or internal user downloads and opens an infected file, it can cause harm to the recipient’s system, as well as reputational risk, lost customers, and potential lost revenue for the organization that allowed it to be shared.

To minimize negative upstream and downstream risk, organizations should directly scan all objects in their buckets.

In this post, we discuss how to fit malware scanning into any workflow. Cloud Storage Security, an AWS Partner with the Security Software Competency, provides multiple scan techniques through its security solution, Antivirus for Amazon S3, which is available in AWS Marketplace. We also review how organizations can use each model to integrate malware and virus scanning into their workflow, and how each model can affect application performance and user experience when infected files are discovered.

Additionally, we’ll cover the optional implementation and use of AWS Lambda, along with stub files, which help prevent disruption to your application workflow and databases when an infected file has been identified and quarantined.

Antivirus for Amazon S3

Antivirus for Amazon S3 enables users to detect files infected with malware and viruses using a variety of scanning models. In this post, we review API, event, and retro scanning. Choosing the correct scanning model for your application workflow can help ensure no disruption in service for your end-user when an infected file is identified.

Currently, two scanning engines are available out of the box: Sophos and ClamAV. When a file is uploaded to Amazon S3, Antivirus for Amazon S3 scans the file using the scanning engine of your choice to detect malicious content. Both engines can be used together for even higher efficacy.

With event-based and retro-driven scanning, the default behavior is to quarantine infected files for further review to protect against the spread of malware. However, this policy can be modified.

API scanning allows you to programmatically scan files before they are written to an Amazon S3 bucket and determine how the files are handled by your application based on the scan results.

The solution is deployed within your AWS account through the use of an AWS CloudFormation template in a matter of minutes. The resulting infrastructure is an AWS Fargate serverless solution.

Once deployed, Amazon S3 buckets will be auto-discovered and cataloged for any connected AWS accounts. Antivirus for Amazon S3 can easily baseline existing data, as well as all new data, including files as large as 5 TB in size (the maximum individual object size allowed by Amazon S3).

Available Scanning Models

There are several ways in which objects are placed into buckets: direct upload, CLI, and more. No matter how the objects arrive, Cloud Storage Security sees three main interaction mechanisms with those objects: API-driven, event-driven, and retro-driven (looking back upon).

Antivirus for Amazon S3 delivers all three mechanisms in a way that is simple to configure and consume. In the following sections, we discuss each of those scanning models, providing implementation guidance and workflow optimization examples.

API-Driven Scanning

Antivirus for Amazon S3 provides an API endpoint that your code can use to submit files for scanning in real time. Handling files in real time within your application allows the scan to occur before the file enters your perimeter.

The available APIs make it simple to leverage scanning within your code. Whether you want to scan as you stream the file in or scan an existing file, the following APIs make it easy to accomplish:

Authentication.
Scan file, return verdict.
Scan file, upload to bucket.
Scan file by S3 path.
Scan file by URL (including pre-signed).

Most Antivirus for Amazon S3 customers will implement API-driven scanning within a web form that requires an end-user to upload a file, such as a PDF or image. Once the form is submitted, the process begins by authenticating with the Cloud Storage Security API and scanning the uploaded file.

Once a file is scanned and a verdict is returned, the application workflow can respond accordingly. If the file is found to be clean, you can write the file to the destination of choice.

If the file is found to be infected, you can notify the user immediately that their file was rejected because of malware infection. The steps for making an API call are as follows:

Step 1: Request Authentication Token

To request an Authentication Token, you will need to:

Specify content type of JSON in headers.
Capture username and password in JSON.
HTTP POST the data block and headers to <baseURL> + /api/Token.

import requests

session = requests.Session()
headers = {"Content-Type": "application/json"}
credentials = {"username": "<username here>", "password": "<pw here>"}
r = session.post("https://<baseURL to load balancer or friendly URL>/api/Token", json=credentials, headers=headers)

This will return the following response:

{
    "accessToken": "eyJraWQiOiI0Qk41QU1yVXdhWUUrZlBUZ0dhQTZWQUNXUmREMmh2dlMxWFgrUmNmTzd3PSIsImFsZyI6IlJTMjU2In0.eyJzdWIiOiIyMDYyZDQxMC1kMGE0LTRiNTItYjc2Yi03M2FiNWQ5Njk4YWQiLCJjb2duaXRvOmdyb3VwcyI6WyJVc2VycyIsIlByaW1hcnkiXSwiZW1haWxfdmVyaWZpZWQiOnRydWUsImlzcyI6Imh0dHBzOlwvXC9jb2duaXRvLWlkcC51cy1lYXN0LTEuYW1hem9uYXdzLmNvbVwvdXMtZWFzdC0xX1haNWpVNXcwWSIsImN1c3RvbTpoaWRlX3RyaWFsX21zZyI6IjAiLCJjb2duaXRvOnVzZXJuYW1lIjoiZWRjIiwiY3VzdG9tOnVzZXJfZGlzYWJsZWQiOiIwIiwiY3VzdG9tOmF3c19hY2NvdW50X2lkIjoiNzMwMDc...",
    "tokenType": "Bearer",
    "expiresIn": 3600
}
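
The response includes an expiresIn value, so a client can reuse one token across several scans rather than authenticating before each request. The following is a minimal sketch under the assumption that expiresIn is expressed in seconds and that the response has the shape shown above. The CachedToken class and the 60-second safety margin are illustrative choices, not part of the Cloud Storage Security API.

```python
import time
from dataclasses import dataclass


@dataclass
class CachedToken:
    """Holds an access token and the moment it should stop being used."""

    access_token: str
    token_type: str
    expires_at: float  # seconds since the epoch

    @classmethod
    def from_response(cls, payload, now=None, margin=60):
        # `payload` is the parsed JSON body returned by /api/Token.
        # `margin` is an assumed safety buffer that retires the token early.
        now = time.time() if now is None else now
        return cls(
            access_token=payload["accessToken"],
            token_type=payload["tokenType"],
            expires_at=now + payload["expiresIn"] - margin,
        )

    def is_valid(self, now=None):
        # True while the token is still inside its usable lifetime.
        now = time.time() if now is None else now
        return now < self.expires_at

    def authorization_header(self):
        # Builds the Authorization header used by the scan request.
        return {"Authorization": f"{self.token_type} {self.access_token}"}
```

An application could keep one CachedToken per process and request a new token only when is_valid() returns False.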

Step 2: Send the File for Scanning

To send the file for scanning, you will need to:

Specify the headers. This should include content type and the authentication token. Content types can be multipart/form-data or a binary stream.
Get the file as your language dictates.
HTTP POST the file and headers to <baseURL> + /api/Scan.

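# Assumption: `form` is a multipart encoder object that exposes `content_type` (for example, requests_toolbelt's MultipartEncoder), and `accessToken` is the value returned in Step 1.
headers = {"Prefer": "respond-async", "Content-Type": form.content_type, "Authorization": "Bearer " + accessToken}
r = session.post("https://<baseURL to load balancer or friendly URL>/api/Scan", headers=headers, data=form, timeout=4000)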

This will return the following response:

{
    "dateScanned": "2021-07-02T07:04:18.8896831Z",
    "detectedInfections": [],
    "errorMessage": null,
    "result": "Clean"
}
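
How the application reacts to the verdict is up to you. The following is a minimal sketch of one way to map the scan response to an action. Only the "Clean" result appears in the sample response, so this sketch treats every other value as a rejection. The function name and the action labels are hypothetical, and the element type of detectedInfections is not specified in this post, so each entry is simply converted to a string.

```python
def handle_scan_response(response):
    """Decide what the application should do with an uploaded file.

    `response` is the parsed JSON body returned by /api/Scan.
    Anything other than an explicit "Clean" result is rejected (fail closed).
    """
    if response.get("errorMessage"):
        # The scan itself failed, so the file's state is unknown.
        return {"action": "retry_or_reject", "reason": response["errorMessage"]}
    if response.get("result") == "Clean":
        return {"action": "store", "reason": None}
    infections = response.get("detectedInfections") or []
    reason = ", ".join(str(item) for item in infections) or "file did not pass scanning"
    return {"action": "reject", "reason": reason}
```

With this shape, a web form handler can write the file to its destination when the action is "store" and show the reason to the user otherwise.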

For more information and examples on setting up API-driven scanning, review the Cloud Storage Security Help Docs.

Event-Driven Scanning

Event-driven scanning is the easiest and fastest route to scan files in Amazon S3 buckets when your scanning requirement allows for near real-time scanning of files after they are written to S3. Amazon S3 buckets can be configured to raise an event any time an object is stored or modified within the bucket.

Antivirus for Amazon S3 leverages a direct integration with this model to listen for those events and push them to an Amazon Simple Notification Service (Amazon SNS) topic with a subscribed Amazon Simple Queue Service (Amazon SQS) queue.

This fanout approach allows your internal workflows, as well as the scanning workflow, to properly operate without impacting one another. Event-driven scanning means that when a bucket is protected by Antivirus for Amazon S3, any object stored or modified in that bucket will automatically be scanned in near real-time.
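
To illustrate the fanout, the following sketch shows how one of your own consumers, subscribed to the same topic through its own SQS queue, might read the bucket and key from each event. It relies on the standard Amazon S3 event notification format and the standard SNS envelope. The function name is hypothetical, and this is not the product's internal code.

```python
import json
from urllib.parse import unquote_plus


def objects_from_sqs_body(body):
    """Extract (bucket, key) pairs from an SQS message body that carries
    an S3 event notification, optionally wrapped in an SNS envelope."""
    envelope = json.loads(body)
    # Without raw message delivery, SNS places the original event in "Message".
    s3_event = json.loads(envelope["Message"]) if "Message" in envelope else envelope
    found = []
    for record in s3_event.get("Records", []):
        # Only object-creation events are relevant for scanning new content.
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 events are URL-encoded (spaces arrive as "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        found.append((bucket, key))
    return found
```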

To enable event-driven protection for a set of buckets, select your buckets in the Bucket Protection section of the Antivirus for Amazon S3 management console (Figure 1) in one of the following three ways:

To select one bucket at a time, select one checkbox at a time.
To select multiple buckets at once, select multiple checkboxes.
To select all buckets, choose Select Visible.

The scanning function runs in parallel to your existing application workflow and requires a limited amount of additional programming effort. Organizations have the flexibility to create and leverage any type of document flow. Two common flows are the standard flow and the two bucket system flow.

Figure 1 – Bucket protection dashboard in Antivirus for Amazon S3 management console.

Standard Flow

The standard flow is the simplest and quickest way to configure the system to scan objects. Leverage the management console to protect a selection of buckets and Antivirus for Amazon S3 scans the objects as they are written to or modified within the protected buckets.

If a file is found to be infected, it will be quarantined for further review by you or your team with the option to allow the file or destroy it. It is as easy as selecting the buckets and turning on protection.

Figure 2 – Standard document flow.

Two Bucket System Flow

The two bucket system flow allows you to create a physical separation between the ingestion of files and your production bucket(s). This allows you to separate incoming objects from your production buckets until they are scanned; this way they cannot be accessed by your end-users until deemed safe.

This approach requires you to create a staging/dirty bucket as the landing area for all uploads. To leverage the real-time scan result notifications, subscribe to the Notifications SNS topic that Cloud Storage Security publishes to. This will allow you to take real-time action (copy/move) on each file as it is scanned based on the scan results as seen below in Figure 3. For detailed steps on how to set this up, read the Cloud Storage Security Help Docs.
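
As a rough illustration of the copy/move step, the following sketch shows an AWS Lambda function subscribed to the notifications topic that promotes clean objects from the staging bucket to a production bucket. The notification field names (bucketName, key, scanResult) and the production bucket name are assumptions made for this example. The real message schema is described in the Help Docs.

```python
import json

PRODUCTION_BUCKET = "my-production-bucket"  # hypothetical bucket name


def plan_promotions(event, production_bucket=PRODUCTION_BUCKET):
    """Return one copy instruction per clean object in an SNS-triggered event.

    ASSUMPTION: `bucketName`, `key`, and `scanResult` are placeholder field
    names, not the documented notification schema.
    """
    plans = []
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        if message.get("scanResult") != "Clean":
            continue  # anything not explicitly clean stays in staging
        plans.append({
            "CopySource": {"Bucket": message["bucketName"], "Key": message["key"]},
            "Bucket": production_bucket,
            "Key": message["key"],
        })
    return plans


def lambda_handler(event, context):
    import boto3  # imported here so plan_promotions can be tested offline

    s3 = boto3.client("s3")
    for plan in plan_promotions(event):
        s3.copy_object(**plan)
        # Remove the staged copy once it has been promoted.
        source = plan["CopySource"]
        s3.delete_object(Bucket=source["Bucket"], Key=source["Key"])
```

Note that copy_object handles objects up to 5 GB in a single call, so larger objects would need a multipart copy instead.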

Figure 3 – Two bucket system document flow.

Using Stub Files with AWS Lambda (Optional)

The standard flow and two bucket system flow allow you to quarantine your files until further review is rendered by your team. While the quarantining of files is recommended, it can break your application workflow and impact user experience, because the file is diverted to a different bucket and no longer available to the application or users.

To avoid breaking your application workflow and prevent database issues, use stub files as part of your flow to create a temporary placeholder file in place of the original infected file.

This can be accomplished by using the same subscribe-to-results technique as the two bucket system. In this case, your AWS Lambda would subscribe to the infected results and respond accordingly by writing the stub file. This process ensures there are no broken links within your application as a result of quarantined files.
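
The following sketch shows what such a Lambda function might look like. As before, the notification field names (bucketName, key, scanResult) and the "Infected" result value are assumptions made for illustration, as are the stub text and the metadata flag. The sketch also assumes the infected object has already been moved to quarantine by the time the stub is written.

```python
import json

INFECTED_RESULT = "Infected"  # assumed value; confirm against the Help Docs
STUB_TEXT = "This file was removed because it failed a malware scan."


def build_stub_put(message, note=STUB_TEXT):
    """Build put_object arguments that write a placeholder at the original key.

    ASSUMPTION: `bucketName`, `key`, and `scanResult` are placeholder field
    names. Returns None for any message that is not an infected result.
    """
    if message.get("scanResult") != INFECTED_RESULT:
        return None
    return {
        "Bucket": message["bucketName"],
        "Key": message["key"],
        "Body": note.encode("utf-8"),
        "ContentType": "text/plain",
        # Marks the object so the application can recognize it as a stub.
        "Metadata": {"stub": "true"},
    }


def lambda_handler(event, context):
    import boto3  # imported here so build_stub_put can be tested offline

    s3 = boto3.client("s3")
    for record in event.get("Records", []):
        args = build_stub_put(json.loads(record["Sns"]["Message"]))
        if args is not None:
            s3.put_object(**args)
```

Because the stub keeps the original key, database records and links that reference the object continue to resolve while the real file sits in quarantine.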

Retro-Driven Scanning

Retro scanning is the scanning of existing objects within an Amazon S3 bucket, providing a baseline to help ensure existing files are safe. This scan type crawls the selected S3 buckets to determine which objects to scan.

With this option, you have the flexibility to specify all existing files or a subset of files in a specific time window. Antivirus for Amazon S3 customers often require an initial scan of all data or of data up to a certain age (for example, files from the last six months).

There are two ways to trigger retro scanning: on-demand or through a schedule. Both are simple and easy to manage through the management console. Retro scanning is perfect for that initial baseline, as well as to meet compliance requirements that dictate the regular (monthly/quarterly/yearly) scanning of your data. Retro scanning also allows for the rescan of data with the latest engines and signatures.

Whether you are ingesting a new bucket or need to rescan objects on a regular basis, retro scanning allows you to consistently rescan files to meet your data requirements.
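
The time-window idea can be sketched in a few lines. This is an illustration of the selection logic only, not the product's crawler. It assumes object listings shaped like the Contents entries returned by the S3 ListObjectsV2 API, and the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def select_for_rescan(objects, now, max_age_days=None):
    """Pick object keys whose LastModified falls inside the look-back window.

    `objects` mirrors the `Contents` entries from ListObjectsV2: dicts with
    `Key` and a timezone-aware `LastModified`.
    `max_age_days=None` selects everything, which corresponds to a full baseline.
    """
    if max_age_days is None:
        return [obj["Key"] for obj in objects]
    cutoff = now - timedelta(days=max_age_days)
    return [obj["Key"] for obj in objects if obj["LastModified"] >= cutoff]


# Example: keep only objects modified in roughly the last six months.
listing = [
    {"Key": "old.bin", "LastModified": datetime(2023, 1, 1, tzinfo=timezone.utc)},
    {"Key": "new.bin", "LastModified": datetime(2024, 6, 1, tzinfo=timezone.utc)},
]
recent = select_for_rescan(listing, datetime(2024, 7, 1, tzinfo=timezone.utc), max_age_days=180)
```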

Figure 4 – Scheduled scans dashboard.

Deploying Antivirus for Amazon S3

Antivirus for Amazon S3 is self-hosted and available in AWS Marketplace with a 30-day free trial to deploy and test out the application’s functionality. Pricing is determined by the number of gigabytes scanned within your environment and available on a pay-as-you-go basis.

You also have the option to purchase a custom license through AWS Marketplace private offers or Cloud Storage Security directly. This section provides the steps needed to get Antivirus for Amazon S3 up and running.

Step 1: Subscribe and Deploy Antivirus for Amazon S3 through AWS Marketplace

To subscribe, go to the Cloud Storage Security Antivirus for Amazon S3 listing on AWS Marketplace. After selecting the configuration, you can start deploying Antivirus for Amazon S3.

Figure 5 – AWS Marketplace listing.

Step 2: Deploy Antivirus for Amazon S3 Using an AWS CloudFormation Template

Deployment of the app takes minutes and is accomplished by using a CloudFormation template that installs all necessary infrastructure and software components, as well as all required permissions and roles. Review steps to set up the CloudFormation template in the How to Deploy section of the Cloud Storage Security Help Docs.

The CloudFormation template creates the following resources:

AWS Fargate with Amazon Elastic Container Service (Amazon ECS) cluster with one service and task.
Amazon DynamoDB and AWS AppConfig.
AWS Identity and Access Management (IAM) roles and policies.
Amazon Cognito User Pools.
SNS topic and CloudWatch log groups.
Elastic Load Balancing (ELB) – optional.

There are five pieces of information you will need to provide while executing the CloudFormation template. First, select the Amazon Virtual Private Cloud (Amazon VPC) and two subnets (in different Availability Zones for high availability) for the management console to run in, along with specifying a security group CIDR block for allowed network access. Then, set a valid email address for your account login.

There are many other configuration aspects to specify, but they are optional and dependent on your deployment needs. For more information, review the Deployment Details section of the Cloud Storage Security Help Docs.
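
If you script the deployment instead of using the console, the five required inputs can be collected and checked before the stack is created. The following is a minimal sketch in which the ParameterKey names are placeholders invented for this example. Use the parameter names defined in the actual template.

```python
import ipaddress


def build_stack_parameters(vpc_id, subnet_a, subnet_b, allowed_cidr, email):
    """Format the five required inputs as a CloudFormation Parameters list.

    ASSUMPTION: the ParameterKey names below are hypothetical placeholders,
    not the names used by the Antivirus for Amazon S3 template.
    """
    if subnet_a == subnet_b:
        raise ValueError("choose two different subnets in different Availability Zones")
    # Raises ValueError if the CIDR block is malformed.
    ipaddress.ip_network(allowed_cidr)
    values = {
        "VpcId": vpc_id,
        "SubnetA": subnet_a,
        "SubnetB": subnet_b,
        "AllowedCidr": allowed_cidr,
        "Email": email,
    }
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]
```

The resulting list has the shape expected by the Parameters argument of the boto3 CloudFormation client's create_stack call.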

Step 3: Launch Antivirus for Amazon S3 and Enable Bucket Protection

Once the solution has been deployed, buckets can be protected in under five minutes by simply activating bucket protection on any available Amazon S3 buckets.

You will receive an email invite with login credentials to access your console.

Figure 6 – Sample email with login credentials.

Once you’ve accessed your console, review all discovered buckets, connect additional AWS accounts (if any), and enable any/all of the scan models introduced in this post.

Figure 7 – Bucket protection dashboard.

Once buckets are discovered, you can choose to turn on protection for them. Enabling bucket protection allows the application to scan all incoming files in near real time.

Alternatively, you have the option to enable scanning on a scheduled basis, which will scan new and/or existing files within a bucket on a schedule of your choosing.

Figure 8 – Summary dashboard.

Scanning Engines Available

Antivirus for Amazon S3 currently comes with two scanning engines out of the box. The Sophos engine provides fast and powerful scanning, including much larger file size support. It is also updated more frequently.

Antivirus for Amazon S3 also supports the widely known open source ClamAV engine. Additionally, you have the option to enable scanning using multiple engines at once.

Figure 9 – Available scan engines.

Summary

In this post, we described the importance of setting up an anti-malware solution to protect your organization, employees, customers, and partners from the negative upstream and downstream consequences of infected files within your Amazon S3 buckets.

With Antivirus for Amazon S3, you can quickly and easily deploy a multi-engine anti-malware scanning solution to manage file protection and malware findings.

Because Amazon S3 is tightly integrated into application workflows, Antivirus for Amazon S3 offers multiple scanning models. Scan files in near real time once they are written to your Amazon S3 buckets, scan pre-existing files to baseline your data, and scan files before they're even uploaded to Amazon S3 (or any other storage).

For deployment support, review the Cloud Storage Security Help Docs for Antivirus for Amazon S3, or reach out to Cloud Storage Security at support@cloudstoragesec.com for help deciding which scanning model is a fit for your organization.


Cloud Storage Security – AWS Partner Spotlight

Cloud Storage Security, an AWS Partner with Security Software Competency, is an automated security solution that easily discovers and scans objects and files in Amazon S3 buckets for malware and threats.


AWS Partner Network (APN) Blog