AWS Public Sector Blog

Move file data in and out of AWS GovCloud (US) with AWS DataSync


Amazon Web Services (AWS) customers who need to comply with the most stringent US government security and compliance requirements operate their workloads in AWS GovCloud (US), which is architected as a separate partition providing network and identity isolation. As public sector customers find increasing need to move data between the AWS GovCloud (US) and the standard partition, they need tools to help them lower their operational burden.

In part one of this two-part blog series, I shared how the Amazon Simple Storage Service (Amazon S3) Data Transfer Hub Solution from AWS Labs helps customers move data in Amazon S3 between the AWS GovCloud (US) partition and the standard partition. In this blog post, I walk through how to use AWS DataSync to move data on network file system (NFS) shares between the two partitions and reduce operational burden.

Solution overview

AWS DataSync is an online data transfer service that simplifies, automates, and accelerates moving data between on-premises storage systems and AWS storage services, as well as between AWS storage services. You can use AWS DataSync to migrate active datasets to AWS, archive data to free up on-premises storage capacity, replicate data to AWS for business continuity, or transfer data to the cloud for analysis and processing. When transferring files between self-managed storage and AWS services, AWS DataSync uses an agent that customers deploy into their environments as a virtual machine or from an Amazon Machine Image (AMI). When transferring files between AWS storage services, AWS DataSync is agentless.

By default, DataSync does not transfer data across the standard and AWS GovCloud (US) partitions. This post walks through deploying an AWS DataSync agent into the standard partition, activating it against the DataSync service in the AWS GovCloud (US) partition, and then using it to transfer files from an NFS share in the standard partition to Amazon S3 in the AWS GovCloud (US) partition.

Figure 1. DataSync agent setup between AWS Standard and AWS GovCloud (US) partitions.


Prerequisites

This walkthrough deploys an AWS DataSync Agent into an AWS standard account, and activates it to the AWS DataSync service in the AWS GovCloud (US) account. Using this setup, you transfer files from the standard partition to the AWS GovCloud (US) partition.

To complete this walkthrough, you need:

  • An AWS standard account
  • An AWS GovCloud (US) account
  • A destination Amazon S3 bucket in AWS GovCloud (US)

Note that for this walkthrough I use the following regions:

  • Standard account: us-west-2
  • AWS GovCloud (US) account: us-gov-west-1

Procedure

Step 1: Prepare AWS DataSync Agent for use in the standard partition

AWS DataSync provides a Region-specific image of the agent for deployment onto Amazon Elastic Compute Cloud (Amazon EC2). However, this default image does not activate across partitions. In this step, you prepare an AWS DataSync agent image in the AWS GovCloud (US) partition that you then export for use in the standard partition.

For this step, I utilize the AWS Command Line Interface (CLI) using a terminal window. In the AWS GovCloud (US) account, run the following commands:

# Obtain the AMI Id of the DataSync Agent AMI via SSM Parameter Store
# See https://docs.thinkwithwp.com/datasync/latest/userguide/deploy-agents.html
DATASYNC_AMI=$(aws ssm get-parameter --name /aws/service/datasync/ami --query 'Parameter.Value' --output text)

# Create an instance of the DataSync Agent
aws ec2 run-instances --image-id=$DATASYNC_AMI --instance-type=m5.large --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=datasync-instance}]"

# Obtain the Instance Id
DATASYNC_INSTANCE=$(aws ec2 describe-instances --filters "Name=tag:Name,Values=datasync-instance" --query "Reservations[0].Instances[0].InstanceId" --output text)

# Create an AMI using the create-image command - Will take a few minutes
aws ec2 create-image --instance-id $DATASYNC_INSTANCE --name "DataSync Agent - GovCloud" --description "DataSync Agent - GovCloud" --tag-specifications "ResourceType=image,Tags=[{Key=Name,Value=datasync-image}]"

# Obtain the newly created AMI
DATASYNC_EXPORT_AMI=$(aws ec2 describe-images --filters "Name=name,Values='DataSync Agent - GovCloud'" --owners self --query "Images[0].ImageId" --output text)

Step 2: Export DataSync agent from AWS GovCloud (US) to standard partition

To deploy the agent into the standard partition, you must export the image. To do this, use the Amazon EC2 create-store-image-task API, which exports the image into an Amazon S3 bucket. Then use the Amazon S3 Data Transfer Hub solution, as described in part one of this blog series, to move the image from the AWS GovCloud (US) partition to the standard partition.

The next set of steps assumes you have completed the walkthrough in part one and have the solution deployed. If you have not, read Move data in and out of AWS GovCloud (US) using Amazon S3 and complete it, or use another mechanism, such as manual Amazon S3 copy commands, to move the image object from the AWS GovCloud (US) bucket to the standard bucket.

The image creation command from step one should complete in approximately five minutes. Once complete, run the following commands in the AWS GovCloud (US) account:

# Set a convenience variable for your AWS GovCloud (US) S3 bucket. 
S3_EXPORT_BUCKET=<your bucket name>

# Export AMI to S3 bucket
aws ec2 create-store-image-task --image-id $DATASYNC_EXPORT_AMI --bucket $S3_EXPORT_BUCKET

# Validate completion of export task, takes approximately 2-3 minutes
aws ec2 describe-store-image-tasks

# Validate object in S3 bucket
aws s3 ls $S3_EXPORT_BUCKET | grep ami

Take note of the object key from the S3 validation command. You need it for the next step; it is in the format “ami-xxxxxxxxxxxxxxx.bin”.

The S3 Data Transfer Hub solution automatically transfers this file to the standard partition if you followed the steps from part one of the blog series. By default, the transfer hub transfers every hour. You can shorten that time by adjusting the Amazon EventBridge rule to fire on a shorter interval. I have changed mine to two minutes. Complete the transfer prior to moving to the next step.
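If you prefer to adjust the interval from the CLI, you can update the rule's schedule expression. This is a sketch; the rule name below is a placeholder, so first list the rules to find the one the Data Transfer Hub deployed in your account:

```shell
# List EventBridge rules to find the one created by the Data Transfer Hub
aws events list-rules --query "Rules[].Name" --output text

# Shorten the schedule on that rule (replace the placeholder with the actual rule name)
aws events put-rule --name <your-transfer-hub-rule-name> --schedule-expression "rate(2 minutes)"
```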

Step 3: Import DataSync agent into standard partition

Once the exported image is in the standard Amazon S3 bucket, run the following commands in the standard account. Use the create-restore-image-task to import the AMI:

# Define AMI to import
DATASYNC_IMPORT_AMI=<your object key from step 2>

# Define AMI bucket source
IMPORT_AMI_BUCKET=<your bucket name>

# Import Image into Standard partition
aws ec2 create-restore-image-task --object-key $DATASYNC_IMPORT_AMI --bucket $IMPORT_AMI_BUCKET --name "DataSync Agent - GovCloud" --tag-specifications "ResourceType=image,Tags=[{Key=Name,Value=datasync-image}]"

# Define an SSM parameter with the new AMI ID for use later in the walkthrough
DATASYNC_IMPORTED_AMI=$(aws ec2 describe-images --filters "Name=name,Values='DataSync Agent - GovCloud'" --owners self --query "Images[0].ImageId" --output text)
aws ssm put-parameter --name datasync-imported-ami --value $DATASYNC_IMPORTED_AMI --type String --description "AMI ID for the imported DataSync AMI"

Step 4: Deploy the demo infrastructure into the standard account

In this step, I deploy all the infrastructure needed in the standard account using AWS CloudFormation. The stack spins up a new Amazon Virtual Private Cloud (Amazon VPC), an Amazon Elastic File System (Amazon EFS) file system for NFS storage, an Amazon EC2 instance to help with mounting the Amazon EFS file system, and the DataSync agent. In the standard account, navigate to AWS CloudFormation in the AWS Management Console and select Create stack, then With new resources (standard). Select Template is ready, supply the Amazon S3 URL below, and then select Next.

https://solution-references.s3.amazonaws.com/datasync/datasync_blog_cft_template.yaml


Enter the stack name as DatasyncBlogDemoStack. Leave the other defaults and select Next, then Next again. Select the boxes under Capabilities and select Create stack.
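Alternatively, the same stack can be created from the CLI. This is a sketch to run in the standard account; the capability flags acknowledge that the template creates IAM resources:

```shell
# Create the demo stack from the hosted template
aws cloudformation create-stack \
  --stack-name DatasyncBlogDemoStack \
  --template-url https://solution-references.s3.amazonaws.com/datasync/datasync_blog_cft_template.yaml \
  --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM

# Block until stack creation completes
aws cloudformation wait stack-create-complete --stack-name DatasyncBlogDemoStack
```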

Step 5: Create sample data in the NFS

Next, create some sample data in the NFS to transfer to AWS GovCloud (US). Do this using the Mount helper Amazon EC2 instance provisioned in Step 4, and the AWS Systems Manager (SSM) Session Manager.

First, in the standard account, navigate to CloudFormation and copy the file system URL from the Outputs tab of the DatasyncBlogDemoStack. The key is “efsNameoutput” and the value is formatted like “fs-xxxxxx.efs.region.amazonaws.com”. You need this to mount the Amazon EFS file system on the mount helper EC2 instance.

Still in the standard account, navigate to the Amazon EC2 console and find the running instance named “mount-instance.” Select the row and select Connect, choose the Session Manager tab and select Connect again. This opens up a new tab with a console shell.

Please note: For this demonstration, you are only able to access the mount instance using SSM Session Manager, as you did not provide any SSH keys or open the SSH ports on the instance.

In the Session Manager tab, enter the following commands, making sure to enter your information as needed:

# Switch to bash shell for cleaner UI
/bin/bash 

# Create Mount Point Location
cd ~
mkdir efs-mount-point

# Mount EFS as NFS
EFS_URL=<your filesystem URL>
sudo mount -t nfs -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport $EFS_URL:/ ~/efs-mount-point

# Give ourselves permissions to write to EFS
# See https://docs.thinkwithwp.com/efs/latest/ug/accessing-fs-nfs-permissions.html
cd efs-mount-point
sudo chmod go+rw .

# Generate some test data
echo "NFS Test File 1" >> nfs-test-file-1.txt
echo "NFS Test File 2" >> nfs-test-file-2.txt
echo "NFS Test File 3" >> nfs-test-file-3.txt
echo "NFS Test File 4" >> nfs-test-file-4.txt

Step 6: Set up DataSync agent in AWS GovCloud (US)

Now you are ready to set up the DataSync agent in AWS GovCloud (US). For this step, you need the public DNS name of the DataSync agent in the standard account. Find it in the Amazon EC2 console on the running instance named “datasync-agent-instance.”

In the AWS GovCloud (US) account, navigate to AWS DataSync in the AWS Management Console and select Getting started. Then set up the agent:

  • Select Amazon EC2 as the hypervisor.
  • Select service endpoint type of Public service endpoints in AWS GovCloud (US-Gov-West).
  • Leave Activation key as Automatically get the activation key from your agent.
  • Enter the public DNS of the Agent in your Standard Account and then select Get key.

Figure 2. Activating the DataSync agent.


Upon success, you see a green box in the Activation Key section. Go ahead and name your agent GovCloud-Standard-Sync-Agent and select Create agent.
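If you would rather script the activation, the agent can also hand out its activation key over HTTP, which you then pass to the create-agent command in the AWS GovCloud (US) account. This is a sketch under the assumption that the host you run curl from can reach the agent's public DNS name:

```shell
# Request an activation key directly from the agent, targeting the AWS GovCloud (US) Region
AGENT_DNS=<public DNS of the agent in the standard account>
ACTIVATION_KEY=$(curl -s "http://$AGENT_DNS/?gatewayType=SYNC&activationRegion=us-gov-west-1&no_redirect")

# Activate the agent in the AWS GovCloud (US) account
aws datasync create-agent --agent-name GovCloud-Standard-Sync-Agent --activation-key $ACTIVATION_KEY
```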

Figure 3. Activating the DataSync agent.

Step 7: Create DataSync locations in AWS GovCloud (US)

DataSync uses discrete locations in its transfer tasks, giving you fine-grained control over the transfer. You create two locations: one for the Amazon S3 destination in the AWS GovCloud (US) account, and one for the NFS source in the standard account. Continuing in the AWS GovCloud (US) DataSync console, select Locations in the left navigation bar, and then select Create location.

First, create the Amazon S3 destination in AWS GovCloud (US):

  • Select Amazon S3 as the Location type.
  • Select your S3 bucket:
    • Leave the S3 storage class as Standard.
    • Under folder enter “datasync” to ensure that we segregate our demonstration files accordingly.
    • Select Autogenerate for the IAM role. Doing this creates a least-privilege role for this specific bucket that DataSync assumes to perform the transfer operations.
  • Select Create location.

Figure 4. Creating a DataSync location.

Second, create the NFS location by returning to the locations panel and selecting Create location.

  • Select Network File System (NFS) as the Location type.
  • Select GovCloud-Standard-Sync-Agent as the agent.
  • Enter the EFS URL as the NFS server (fs-xxxxxxx.efs.region.amazonaws.com).
  • Enter ‘/’ as the mount path.
  • Select Create location.
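The two console flows above can also be sketched with the CLI, run in the AWS GovCloud (US) account. All bracketed values are placeholders for your own resources; note the aws-us-gov ARN partition for the S3 bucket:

```shell
# Amazon S3 destination location in AWS GovCloud (US)
aws datasync create-location-s3 \
  --s3-bucket-arn arn:aws-us-gov:s3:::<your bucket name> \
  --subdirectory /datasync \
  --s3-config BucketAccessRoleArn=<ARN of the bucket access role DataSync assumes>

# NFS source location, reached through the agent running in the standard account
aws datasync create-location-nfs \
  --server-hostname <your filesystem URL> \
  --subdirectory / \
  --on-prem-config AgentArns=<your agent ARN>
```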

Step 8: Create the transfer task

Now create the transfer task, which tells AWS DataSync what to transfer. Continuing in the AWS GovCloud (US) account, navigate to Tasks in the left navigation and select Create task.

  • Configure Source Location — Select the NFS location corresponding to your filesystem, then select Next.
  • Configure Destination Location — Select the Amazon S3 location corresponding to your Amazon S3 bucket, then select Next.
  • Set the task name as GovCloud-Standard-Demo-DataSync-Task.
  • Leave the remaining options as default, and under Task logging make sure you select Autogenerate for the CloudWatch log group.
  • Select Next.
  • Review the details and select Create task.
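The equivalent task creation from the CLI looks like the following sketch, run in the AWS GovCloud (US) account; the location and log group ARNs are placeholders for the resources you created above:

```shell
aws datasync create-task \
  --name GovCloud-Standard-Demo-DataSync-Task \
  --source-location-arn <your NFS location ARN> \
  --destination-location-arn <your S3 location ARN> \
  --cloud-watch-log-group-arn <your CloudWatch log group ARN>
```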

Figure 5. Creating a DataSync task.


Figure 6. Creating a DataSync task.

Step 9: Execute the transfer task and validate

Finally, kick off the transfer and validate the results. Continuing in the AWS GovCloud (US) account, go to the task definition screen that opened when you created the task. Select Start, then Start with defaults.

Figure 7. Starting a DataSync task.

The task begins execution and the transfer takes a few minutes to complete.
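You can also start and monitor the execution from the CLI in the AWS GovCloud (US) account; this is a sketch with your task ARN as a placeholder:

```shell
# Start the task and capture the execution ARN
EXECUTION_ARN=$(aws datasync start-task-execution --task-arn <your task ARN> \
  --query TaskExecutionArn --output text)

# Check the execution status; rerun until it reports SUCCESS
aws datasync describe-task-execution --task-execution-arn $EXECUTION_ARN \
  --query Status --output text
```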

See Figure 8 for a successful execution. You can see the details by viewing the History tab, then selecting the Execution ID.

Figure 8. Successful DataSync task execution.

Our files are also present, as expected, in Amazon S3.

Figure 9. Successful DataSync task execution. Objects are listed in Amazon S3.

Troubleshooting

If something has gone awry, go back and review the steps. The AWS DataSync CloudWatch logs can point you in the right direction. Typical issues involve incorrect permissions on the Amazon S3 location, or network connectivity for the agent, either during setup or during execution.

Cleanup

To clean up this solution, delete the following resources:

  • AWS GovCloud (US) account
    • AWS DataSync
      • Task
      • Locations
      • Agent
    • Amazon S3
      • Transferred objects in S3
      • Image export
    • Amazon EC2
      • AMI and associated snapshot
      • Terminate the DataSync Agent instance
  • AWS Standard account
    • Delete DatasyncBlogDemoStack in AWS CloudFormation
    • S3 – Delete the imported image
    • EC2 – Delete the AMI and associated snapshot
    • SSM Parameter Store – Delete the parameter
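The cleanup above can be scripted as a sketch like the following; every bracketed value is a placeholder for your own resource, and the shell variables assume the earlier steps in this walkthrough were run in the same session:

```shell
# In the AWS GovCloud (US) account
aws datasync delete-task --task-arn <your task ARN>
aws datasync delete-location --location-arn <your S3 location ARN>
aws datasync delete-location --location-arn <your NFS location ARN>
aws datasync delete-agent --agent-arn <your agent ARN>
aws ec2 terminate-instances --instance-ids $DATASYNC_INSTANCE
aws ec2 deregister-image --image-id $DATASYNC_EXPORT_AMI
aws ec2 delete-snapshot --snapshot-id <snapshot ID backing the AMI>
aws s3 rm s3://$S3_EXPORT_BUCKET/<image object key>   # also remove transferred objects as needed

# In the standard account
aws cloudformation delete-stack --stack-name DatasyncBlogDemoStack
aws ec2 deregister-image --image-id $DATASYNC_IMPORTED_AMI
aws ec2 delete-snapshot --snapshot-id <snapshot ID backing the imported AMI>
aws ssm delete-parameter --name datasync-imported-ami
```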

If you used the Amazon S3 Data Transfer Hub solution in this walkthrough, refer to the steps in part one to clean up those resources.

Conclusion

In this post, you learned how to use AWS DataSync to transfer files on self-managed NFS storage between the standard AWS partition and the AWS GovCloud (US) partition. While we transferred in one direction, the techniques can be applied for transfers the other way. Along with part one of this series, I hope you can use this approach to ease your operational burden when transferring data across partitions.

Subscribe to the AWS Public Sector Blog newsletter to get the latest in AWS tools, solutions, and innovations from the public sector delivered to your inbox, or contact us.

Please take a few minutes to share insights regarding your experience with the AWS Public Sector Blog in this survey, and we’ll use feedback from the survey to create more content aligned with the preferences of our readers.

Brett Eisen


Brett is a solutions architect for the Department of Defense (DoD) at Amazon Web Services (AWS). Brett has over 15 years of IT experience supporting the DoD, Intelligence Community, and the Department of Homeland Security. His areas of interest are containers, serverless, and DevSecOps, and he enjoys helping customers solve their most complex challenges.