AWS Storage Blog
Tag: Amazon Simple Storage Service (Amazon S3)
Using AWS DataSync to move data from Hadoop to Amazon S3
You want to leverage cloud scalability, increase cost efficiency by paying only for utilized storage, decouple big data storage from processing, and increase capabilities for data analytics and machine learning using AWS. But how do you move your Hadoop cluster? To accelerate this transition, AWS DataSync recently launched support for moving data between Hadoop Distributed […]
Automate and centrally manage data protection for Amazon S3 with AWS Backup
Customers globally, especially in regulated industries, require centralized protection and demonstrable compliance for their application data. Centralized data protection and enhanced visibility across backup operations can reduce the risks of costly disasters and accidents, improve business continuity, and simplify the auditing process. With AWS Backup for Amazon S3 now being generally available, you can centralize […]
A gene-editing prediction engine with iterative learning cycles built on AWS
NRGene develops cutting-edge genomic analytics products that are reshaping agriculture worldwide. Among our customers are some of the biggest and most sophisticated companies in seed-development, food and beverages, paper, rubber, cannabis, and more. In the middle of 2020, NRGene joined a consortium of companies and academic institutions to build the best-in-class gene-editing prediction platform to […]
MemQ by Pinterest: An efficient, scalable, cloud-native publish/subscribe system
The Logging Platform at Pinterest powers all data ingestion and transportation at Pinterest. At the heart of the Pinterest Logging Platform are distributed pub/sub systems that help our customers transport, buffer, and consume data asynchronously. Pub/sub messaging, is a form of asynchronous service-to-service communication used in serverless and microservices architectures. In a pub/sub model, any […]
How to securely share application log files with third parties
What do we do when our applications fail, and we must provide instance-level log data to external entities for troubleshooting purposes? It’s best to limit direct human interaction with our production resources, so we often see temporary access provided for a fixed period. For highly regulated industries, the approval process for production access can be […]
Modernizing NASCAR’s multi-PB media archive at speed with AWS Storage
The National Association for Stock Car Auto Racing (NASCAR) is the sanctioning body for the No. 1 form of motor sports in the United States, and owns 15 of the nation’s major motorsports entertainment facilities. About 15 years ago NASCAR began to collect all the video, audio, and image assets from over the last 70+ […]
Considering four different replication options for data in Amazon S3
UPDATE (2/10/2022): Amazon S3 Batch Replication, which is not covered in this blog post, launched on 2/8/2022, allowing you to replicate existing S3 objects and synchronize your S3 buckets. See the S3 User Guide for additional details. UPDATE (5/1/2023): Updated the comparison table to reflect the latest capabilities of the mechanisms covered in the table. […]
Best practices for archiving large datasets with AWS
As companies grow, they often find themselves managing an ever-increasing amount of data. Customers often need to retain backups for business continuity or disaster recovery, as well as records for compliance and audits. In addition, some customers may need to retain backups to create a centralized repository of information that is heterogeneous in nature, with […]
Monitor Amazon S3 activity using S3 server access logs and Pandas in Python
Monitoring and controlling access to data is often essential for security, cost optimization, and compliance. For these reasons, customers want to know what data of theirs is being accessed, when it is being accessed, and who is accessing it. With more data to monitor, large amounts of data can make it more challenging to granularly […]
Collecting, archiving, and retrieving surveillance footage with AWS
Video feeds and still images from judiciary locations are considered critical forms of evidence in the court of law. These locations can be police stations and government offices or even civil locations of importance like banks and hospitals. As governments, particularly in smart cities rely upon video surveillance, it is critical to design a cost […]