AWS Storage Blog
Spend less while increasing performance with Amazon FSx for Lustre data compression
Many customers associate a performance cost with data compression, but that’s not the case with Amazon FSx for Lustre. With FSx for Lustre, data compression reduces storage costs and increases aggregate file system throughput. As organizations build applications faster than ever, the amount of data they must store grows rapidly, and they need tools to help reduce what they spend storing it. Data compression is an easy way to do just that. With the new data compression feature of Amazon FSx for Lustre, you can improve performance when reading from and writing to your file system while reducing your cost by using a smaller file system. This is a win-win for most workloads. In this post, I show how you can increase both read and write throughput to your Amazon FSx for Lustre file system while using a smaller-sized file system.
Amazon FSx for Lustre is a fully managed service that provides cost-effective, high-performance, scalable storage for compute workloads. It is powered by Lustre, the world’s most popular high-performance file system, and delivers sub-millisecond latencies, up to hundreds of gigabytes per second of throughput and millions of IOPS. It is designed to serve many types of workloads, including machine learning, high performance computing (HPC), video rendering, and financial simulations; in essence, any Linux-based, compute-heavy workload that needs high-performance shared storage.
As of May 2021, you can enable data compression on your Amazon FSx for Lustre file systems to reduce the storage consumption of both your file system storage and your file system backups. This feature is designed to deliver high levels of compression, and based on my testing I’ve found that it does this while delivering higher levels of throughput for read and write operations. This is achieved by automatically compressing the data before writing to disk and automatically uncompressing data after reading from disk. Due to this design, we’re able to read and write more data to disk in the same amount of time, thus increasing throughput and IOPS.
In this blog post, I walk you through the new Amazon FSx for Lustre data compression feature, share the results of a storage consumption and throughput test comparing compressed and uncompressed file systems, and list compression ratios for some common data types.
How data compression works
Each Amazon FSx for Lustre file system consists of Lustre file servers that Lustre clients communicate with, and disk storage attached to each file server. Each file server employs a high-performance network, fast in-memory cache storage, and either HDD-based or SSD-based disk storage. With the new optional data compression feature, Amazon FSx for Lustre file servers are now capable of compression using the community-trusted and performance-oriented LZ4 algorithm, designed to deliver high levels of compression without impacting performance. Compression sits between the in-memory cache storage and disk storage. Data is compressed before writing to disk and uncompressed after reading from disk. These components are illustrated in the following diagram (Figure 1).
Figure 1: Performance components of an Amazon FSx for Lustre file system
A key attribute of Amazon FSx for Lustre is that the throughput of a file system is proportional to its storage capacity: the larger the storage capacity, the higher the throughput. This is because the number of Lustre file servers, as illustrated in Figure 1, is also proportional to the storage capacity. The more storage capacity or disk storage you allocate to your file system, either at creation or during a storage capacity scaling event (for example, increasing the storage capacity), the more high-performance network, in-memory cache storage, compression engines, and disk storage are associated with your file system. To better understand how these components are sized based on the storage capacity of the file system, refer to the aggregate file system performance tables in the Amazon FSx for Lustre user guide. What isn’t depicted in these tables is the impact compression will have on your performance.
Data compression ratios measure the reduction in size, typically expressed as the uncompressed size divided by the compressed size (for example, 2:1). Space savings is another common measure: the reduction in size relative to the uncompressed size, that is, 1 – (compressed size ÷ uncompressed size), typically expressed as a percentage (for example, 50%). The amount of compression achieved is highly dependent on the type of data being compressed.
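As a quick illustration of these two measures, the following shell snippet computes the compression ratio and space savings from an uncompressed and a compressed size (the sizes here are made-up example values):
# Example sizes in GiB (hypothetical values chosen for illustration)
uncompressed=10
compressed=5
# Compression ratio = uncompressed / compressed; space savings = 1 - (compressed / uncompressed)
awk -v u="${uncompressed}" -v c="${compressed}" \
  'BEGIN { printf "compression ratio: %.2f:1\nspace savings: %.2f%%\n", u/c, (1 - c/u) * 100 }'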
How I tested
First, I create two Amazon FSx for Lustre file systems: one with compression enabled, named LZ4, and one without compression enabled, named NONE. Each file system is configured with persistent SSD storage with a throughput per unit of storage of 200 MB/s per TiB (up to 1.3 GB/s per TiB burst) and a storage capacity of 14.4 TiB. Based on the storage capacity and throughput per unit of storage, each file system is capable of achieving 2880 MB/s of baseline throughput and 18,720 MB/s of burst throughput.
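For reference, here is roughly how a file system like the compression-enabled one could be created with the AWS CLI; the subnet, security group, and tag values below are placeholders, and the NONE file system is created the same way with DataCompressionType=NONE:
# Create a 14.4-TiB (14,400 GiB) persistent SSD file system with LZ4 data compression enabled
# (subnet, security group, and tag values below are placeholders)
aws fsx create-file-system \
  --file-system-type LUSTRE \
  --storage-capacity 14400 \
  --subnet-ids subnet-0123456789abcdef0 \
  --security-group-ids sg-0123456789abcdef0 \
  --lustre-configuration DeploymentType=PERSISTENT_1,PerUnitStorageThroughput=200,DataCompressionType=LZ4 \
  --tags Key=Name,Value=LZ4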
Second, I launch eight c5n.9xlarge Amazon EC2 instances using the latest Amazon Linux 2 Amazon Machine Image (AMI). I purposely selected this instance type because of its non-variable network performance so I could ensure that EC2 performance isn’t a limiting factor of my test. I need consistent network performance from Amazon EC2 to the file system.
The user data script for all Amazon EC2 instances installs the latest versions of Open Message Passing Interface (OpenMPI) and IOR, along with AWS CLI v2 and two monitoring tools, nload and ncdu. The following script is an example of the user data script being used to launch these instances.
#cloud-config
repo_update: true
repo_upgrade: all
runcmd:
- amazon-linux-extras install -y epel lustre2.10
- yum install -y nload ncdu gcc gcc-c++
# install and configure aws cli v2
- cd /home/ec2-user
- curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
- unzip awscliv2.zip
- ./aws/install -i /usr/local/aws-cli -b /usr/local/bin
- export PATH=/usr/local/bin:$PATH
- availability_zone=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
- region=${availability_zone:0:-1}
- echo -e "\n\n${region}\ntext\n" | aws configure
- cd
# install openmpi
- cd /home/ec2-user
- wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.1.tar.gz
- tar xvzf openmpi-4.1.1.tar.gz
- cd openmpi-4.1.1
- ./configure
- make
- make install
- cd
# install ior
- cd /home/ec2-user
- wget https://github.com/hpc/ior/releases/download/3.3.0/ior-3.3.0.tar.gz
- tar xvzf ior-3.3.0.tar.gz
- cd ior-3.3.0
- ./configure
- make
- make install
- cd
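With that user data saved locally (for example, as user-data.yaml, a hypothetical filename), the eight instances could be launched with an AWS CLI command along these lines; the AMI, key pair, subnet, and security group IDs below are placeholders:
# Launch eight c5n.9xlarge instances with the user data above
# (use the latest Amazon Linux 2 AMI ID for your Region; all IDs below are placeholders)
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type c5n.9xlarge \
  --count 8 \
  --key-name my-key-pair \
  --subnet-id subnet-0123456789abcdef0 \
  --security-group-ids sg-0123456789abcdef0 \
  --user-data file://user-data.yaml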
Third, I make the directory /fsx on all instances and mount the file system with no data compression (NONE) on the first four instances, and the file system with data compression (LZ4) on the last four instances. This allows me to run the tests in parallel, with the first four instances mounting and writing to the NONE file system and the last four mounting and writing to the LZ4 file system. The IOR write operation uses multiple threads to generate and write 12-GB files, so I configure Lustre to stripe each file across all Object Storage Targets (OSTs). This optimizes file system access for my test, as each Amazon EC2 instance can access and write to all OSTs in parallel and the data is evenly distributed across all OSTs. The following is an example of the script used to complete this step.
sudo mkdir -p /fsx/
sudo mount -t lustre -o noatime,flock <fs-id>.fsx.<region>.amazonaws.com@tcp:/<mountname> /fsx
sudo chown ec2-user:ec2-user /fsx
lfs setstripe --stripe-count -1 /fsx
Fourth, using OpenMPI and IOR, I continuously write to the Amazon FSx for Lustre file systems for 12 hours. Each instance creates 144 files and writes 12 GB of data to each file, which works out to 576 files and 6912 GB of data written continuously to each file system over the 12 hours. The following is an example of the script used to complete this step.
instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
mpirun --npernode 144 --oversubscribe ior --posix.odirect -t 1m -b 1m -s 12000 -g -v -w -i 1000 -F -k -D 0 -T 720 -o /fsx/${instance_id}-ior.bin
Results
The data compression results are impressive, both in terms of storage savings (compression ratio) and aggregate throughput. Each 12 GB of data written to the uncompressed file system (NONE) consumes 12 GB of storage. I verify this by running the following commands on an instance that has the uncompressed file system (NONE) mounted. The command du -sh <file> returns the actual disk usage (compressed) in a human-readable format, while du --apparent-size -sh <file> returns the apparent disk usage (uncompressed) in a human-readable format.
instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
_file=/fsx/${instance_id}-ior.bin.00000000
du -sh ${_file}
du --apparent-size -sh ${_file}
This returns the disk usage and apparent disk usage of the file created and written to by thread 0.
[ec2-user@ip-172-31-14-163 fsx]$ instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
[ec2-user@ip-172-31-14-163 fsx]$ _file=/fsx/${instance_id}-ior.bin.00000000
[ec2-user@ip-172-31-14-163 fsx]$ du -sh ${_file}
12G /fsx/i-0ecbd30817e5c8e26-ior.bin.00000000
[ec2-user@ip-172-31-14-163 fsx]$ du --apparent-size -sh ${_file}
12G /fsx/i-0ecbd30817e5c8e26-ior.bin.00000000
Because data compression is not enabled on this file system, it makes sense that the disk usage and apparent disk usage for all these files are the same: 12 GB each.
Now I run the same commands from an instance that has the file system with data compression enabled (LZ4) mounted.
[ec2-user@ip-172-31-6-87 fsx]$ instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
[ec2-user@ip-172-31-6-87 fsx]$ _file=/fsx/${instance_id}-ior.bin.00000000
[ec2-user@ip-172-31-6-87 fsx]$ du -sh ${_file}
3.2G /fsx/i-0183fe888b20813ba-ior.bin.00000000
[ec2-user@ip-172-31-6-87 fsx]$ du --apparent-size -sh ${_file}
12G /fsx/i-0183fe888b20813ba-ior.bin.00000000
The apparent disk usage (uncompressed size) for this file is still 12 GB, but the actual disk usage (compressed size) is only 3.2 GB. That’s a space savings of 73.33%, or a data compression ratio of 3.75:1. The entire test dataset of 576 files, with 12 GB of data written to each file, has a total apparent disk usage of 6912 GB. However, with data compression enabled, it consumes only 1843.2 GB of disk space. This translates into significant cost savings, as it allows you to create a file system with less storage capacity.
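To see the aggregate effect across the whole test dataset rather than a single file, you can sum the actual and apparent disk usage of all the IOR test files. The following is a minimal sketch that assumes the IOR output files are still present under /fsx:
# Sum the compressed (actual) and uncompressed (apparent) sizes of every IOR test file,
# then report the aggregate compression ratio and space savings
compressed_kb=$(du -ck /fsx/*-ior.bin.* | tail -1 | awk '{print $1}')
apparent_kb=$(du -ck --apparent-size /fsx/*-ior.bin.* | tail -1 | awk '{print $1}')
awk -v c="${compressed_kb}" -v a="${apparent_kb}" \
  'BEGIN { printf "compression ratio: %.2f:1\nspace savings: %.2f%%\n", a/c, (1 - c/a) * 100 }'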
While the space savings alone are a huge win, how is file system throughput affected when data compression is enabled? In my test, where the data being written was compressible, the aggregate throughput of the file system with data compression enabled (LZ4) was significantly higher than that of the file system with no data compression (NONE).
The throughput of the NONE file system was consistent at 3560 MB/s for the entire 12-hour test, which is considerably more than the documented baseline disk throughput of 2880 MB/s for a file system of that size. The throughput of the LZ4 file system, however, blew the NONE file system out of the water. For the first eight hours, the LZ4 file system achieved 12,460 MB/s of aggregate throughput. At this level it was consuming the file system’s burst network throughput, which has a capacity of 1300 MB/s per TiB of storage, or 18,720 MB/s for this 14.4-TiB file system. After the burst network throughput capacity was fully consumed, throughput decreased to roughly the baseline network throughput of 750 MB/s per TiB of storage, or 10,800 MB/s for this 14.4-TiB file system. During this part of the test, the LZ4 file system achieved 11,428 MB/s, slightly higher than the documented baseline network throughput. Comparing the NONE and LZ4 file systems, I achieved a ~71.43% throughput improvement and an effective compression ratio of ~3.50:1 while consuming burst network throughput, and a ~68.85% throughput improvement and an effective ratio of ~3.21:1 for the remainder of the test.
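For clarity, these improvement percentages are calculated relative to the LZ4 throughput, and the ratio is simply the LZ4 throughput divided by the NONE throughput. For example, for the burst portion of the test:
# Throughput improvement and effective ratio for the burst portion of the test
awk -v lz4=12460 -v none=3560 \
  'BEGIN { printf "improvement: %.2f%%  ratio: %.2f:1\n", (lz4 - none) / lz4 * 100, lz4 / none }'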
The following table (Table 1) shares the different performance characteristics of the test file systems and extrapolates out the potential disk throughput increase based on the data compression ratio of the IOR test files.
File systems: 14.4 TiB Persistent_1 SSD, 200 MB/s per TiB
Compression | Network throughput, baseline (MB/s)* | Network throughput, burst (MB/s)* | Disk throughput, baseline (MB/s)* | Disk throughput, burst (MB/s)* | Data compression ratio (n:1) | Calculated disk throughput based on compression ratio, baseline (MB/s) | Calculated disk throughput based on compression ratio, burst (MB/s) | Actual disk throughput, baseline (MB/s) | Actual disk throughput, burst (MB/s)
NONE | 10800 | 18720 | 2880 | 3456 | N/A | 2880 | 3456 | 3560 | 3560
LZ4 | 10800 | 18720 | 2880 | 3456 | 3.75 | 10800 | 12960 | 11428 | 12460
*Amazon FSx for Lustre user guide aggregate file system performance section
Table 1: Throughput comparison between the non-compressed (NONE) and LZ4-compressed Amazon FSx for Lustre file systems
The following figure (Figure 2) visually compares the total aggregate throughput of the NONE and LZ4 file systems during the test.
Figure 2: Throughput comparison between the non-compressed (NONE) and LZ4-compressed Amazon FSx for Lustre file systems
Disk latencies captured during the IOR write operations also show no measurable difference between the NONE and LZ4 file systems.
To better understand the documented burst and baseline network throughput characteristics of Amazon FSx, please refer to the aggregate file system performance tables in the Amazon FSx for Lustre user guide.
The amount of storage savings and throughput increase you see during read and write operations depends on the attainable compression ratio for your data type. So as the saying goes, “not all data types are created equal.” Some data types have higher compression ratios, resulting in more storage savings and a greater increase in throughput, while other data types have a low compression ratio or none at all, resulting in little or no storage savings or throughput increase. The following tables (Table 2 and Table 3) show compression ratios and space savings of a few sample files from the Open Data on AWS registry and other open-source files. Each sample file has a link to the source.
Sample files (compressible data types) | Data type | Uncompressed size (MiB) | Compressed size (MiB) | Compression ratio (n:1) | Space savings
agent_df_base_com_al_revised.pkl | .pkl | 21.10 | 1.19 | 17.68 | 94.34%
dgen_db.sql | .sql | 2,583.63 | 305.21 | 8.47 | 88.19%
1000genomes-dragen__dragen-3.5.7b__hg38_altaware_nohla__germline.json | .json | 37.60 | 4.85 | 7.75 | 87.09%
SARS2.peptides.faa | .faa | 4.58 | 0.69 | 6.68 | 85.03%
ERR4082713.realign | .realign | 6.50 | 1.19 | 5.48 | 81.77%
1979.csv | .csv | 109.97 | 24.13 | 4.56 | 78.06%
abetow-ERD2018-EBIRD_SCIENCE-20191109-a5cf4cb2_test-data.csv | .csv | 770.18 | 172.76 | 4.46 | 77.57%
geo10apr15a.n18-VI3g | .n18-VI3g | 17.80 | 4.13 | 4.31 | 76.81%
AVHRRBUVI01.1981auga.abf | .abf | 8.90 | 2.08 | 4.27 | 76.58%
geo81aug15a.n07-VI3g | .n07-VI3g | 17.80 | 4.54 | 3.92 | 74.46%
nex-gddp-s3-files.json | .json | 4.15 | 1.25 | 3.31 | 69.80%
wtk_conus_2014_0m.h5 | .h5 | 1,014,508.26 | 313,181.87 | 3.24 | 69.13%
ladi_machine_labels.pgsql | .pgsql | 4,656.76 | 1,455.39 | 3.20 | 68.75%
ladi_machine_labels.csv | .csv | 4,656.76 | 1,455.42 | 3.20 | 68.75%
SARS2.contigs.fna | .fna | 7.12 | 2.31 | 3.09 | 67.59%
A_Synthetic_Building_Operation_Dataset.h5 | .h5 | 1,239,249.29 | 434,270.93 | 2.85 | 64.96%
abetow-ERD2018-EBIRD_SCIENCE-20191109-a5cf4cb2_srd_raster_template.tif | .tif | 7.28 | 2.83 | 2.58 | 61.18%
train_AOI_4_Shanghai_geojson_roads_speed_wkt_weighted_raw.csv | .csv | 10.47 | 4.35 | 2.41 | 58.49%
HG00096.tn.tsv | .tsv | 175.17 | 79.37 | 2.21 | 54.69%
summary.tsv | .tsv | 55.97 | 26.43 | 2.12 | 52.78%
1120.las | .las | 1,179.03 | 695.84 | 1.69 | 40.98%
M_R2_TGACCA_L006_R1_001.fastq.1 | .fastq | 10,047.73 | 5,980.53 | 1.68 | 40.48%
Poseidon_i1000-3600_x900-3200.sgy | .sgy | 31,643.19 | 18,967.74 | 1.67 | 40.06%
G26243.HT-1197.2.bam.bai | .bai | 5.25 | 3.19 | 1.64 | 39.19%
videos_content_type_mp4.txt | .txt | 34.62 | 23.98 | 1.44 | 30.73%
00README_V01.pdf | .pdf | 0.33 | 0.23 | 1.43 | 29.99%
pi.txt | .txt | 16.04 | 11.79 | 1.36 | 26.54%
GBQYVRF01.sff | .sff | 1,496.16 | 1,157.40 | 1.29 | 22.64%
HG00371.bam.bai | .bai | 9.07 | 7.42 | 1.22 | 18.19%
dgen_db.sql.zip | .zip | 13,851.45 | 12,649.00 | 1.10 | 8.68%
RS1_A0631053_SCWA_20130301_093649_HH_SCW01F.tif | .tif | 204.15 | 193.84 | 1.05 | 5.05%
batch0.fast5 | .fast5 | 921.93 | 898.15 | 1.03 | 2.58%
Table 2: Sample files with compressible data types
Sample files (uncompressible data types) | Data type | Uncompressed size (MiB) | Compressed size (MiB) | Compression ratio (n:1) | Space savings
LE70010122011224ASN00.tar.gz | .tar.gz | 115.93 | 115.89 | 1.00 | 0.03%
part-00039-f9394a8f-504e-4ee2-bff7-80ca622ce471.c000.snappy.parquet | .snappy.parquet | 407.09 | 406.99 | 1.00 | 0.02%
NA12878.final.cram | .cram | 15,065.37 | 15,073.26 | 1.00 | -0.05%
HG00096.bam | .bam | 46.53 | 46.56 | 1.00 | -0.07%
ERR194147_1.fastq.gz | .fastq.gz | 49,020.85 | 49,053.95 | 1.00 | -0.07%
fold0_best.pth | .pth | 93.64 | 93.80 | 1.00 | -0.18%
regridded_1deg_pr_amon_access1-0_historical_r1i1p1_195001-200512.nc | .nc | 128.12 | 128.35 | 1.00 | -0.18%
HG00371.hard-filtered.baf.bw | .bw | 46.35 | 46.45 | 1.00 | -0.22%
part-00000.bz2 | .bz2 | 53.32 | 53.52 | 1.00 | -0.37%
0019_20180815_212501_KM_EM122.wcd | .wcd | 40.37 | 40.57 | 1.00 | -0.49%
013b2841e741e739f58f1050af19fe.mp4 | .mp4 | 5.20 | 5.28 | 0.98 | -1.53%
_DSC0015_c9d7331a-f2f6-4efb-a279-f7981b42fd4a.jpg | .jpg | 7.64 | 7.77 | 0.98 | -1.73%
USGS_Nepal_05272015-A-s0088.png | .png | 4.60 | 4.69 | 0.98 | -1.92%
L0001-D20030818-T195819-EK60.raw | .raw | 4.01 | 4.09 | 0.98 | -1.94%
Table 3: Sample files with uncompressible data types
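If you want a rough idea of how compressible your own data is before enabling the feature, you can compress a representative file locally with the lz4 command-line tool (available in most Linux package repositories). This is only an estimate, since FSx for Lustre applies LZ4 to data in chunks on its file servers, but it is a quick sanity check; sample.dat below is a placeholder for one of your own files:
# Compress a representative sample file with LZ4 and compare sizes
# (sample.dat is a placeholder for a file from your own dataset)
lz4 -1 sample.dat sample.dat.lz4
awk -v u="$(stat -c %s sample.dat)" -v c="$(stat -c %s sample.dat.lz4)" \
  'BEGIN { printf "estimated compression ratio: %.2f:1\n", u/c }'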
Summary
I highly recommend enabling data compression for new and existing FSx for Lustre file systems whenever the type of data you access from FSx for Lustre benefits from LZ4 data compression. Doing so allows you to create FSx for Lustre file systems with less storage capacity (saving you money) and a lower throughput per unit of storage (also saving you money). It also allows you to drive more throughput against the file system, reducing the time it takes to process data and the amount of compute resources you consume processing it. Enabling data compression helps you spend less while increasing performance with your FSx for Lustre file systems.
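Enabling compression on an existing file system is a single update call; here is a sketch using the AWS CLI (the file system ID below is a placeholder). Keep in mind that only data written after the change is compressed; files written before the change remain uncompressed on disk unless they are rewritten.
# Enable LZ4 data compression on an existing FSx for Lustre file system
# (the file system ID below is a placeholder)
aws fsx update-file-system \
  --file-system-id fs-0123456789abcdef0 \
  --lustre-configuration DataCompressionType=LZ4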
To learn more about Amazon FSx for Lustre data compression, visit the Amazon FSx for Lustre user guide.
Thanks for reading this blog post and please leave any questions or comments in the comments section!