AWS Partner Network (APN) Blog
Category: Analytics
How SnapLogic eXtreme Helps Visualize Spark ETL Pipelines on Amazon EMR
Fully managed cloud services enable global enterprises to focus on strategic differentiators rather than on maintaining infrastructure, by creating data lakes and performing big data processing in the cloud. SnapLogic eXtreme allows citizen integrators (those who can't code) and data integrators to efficiently support and augment data-integration use cases by performing complex transformations on large volumes of data. Learn how to set up SnapLogic eXtreme and use Amazon EMR to perform ETL into Amazon Redshift.
Implementing SAML AuthN for Amazon EMR Using Okta and Column-Level AuthZ with AWS Lake Formation
As organizations continue to build data lakes on AWS and adopt Amazon EMR, especially when consuming data at enterprise scale, it’s critical to govern your data lakes by establishing federated access and having fine-grained controls to access your data. Learn how to implement SAML-based authentication (AuthN) using Okta for Amazon EMR, querying data using Zeppelin notebooks, and applying column-level authorization (AuthZ) using AWS Lake Formation.
Cognitive Document Processing and Data Extraction for the Oil and Gas Industry
The oil and gas industry is highly complex and churns out copious amounts of data from sensors and machines at every stage of its business value chain. This post analyzes the role of machine learning for document extraction in the oil and gas industry for better business operations. Learn about Quantiphi's document processing solution built on AWS, and how it helped a Canadian oil and gas organization address document management challenges through AI and ML techniques.
Monitor Your Migration to AWS Graviton2-Powered Amazon EC2 Instances with Datadog
If you're thinking about shifting existing workloads to an AWS Graviton2-powered Amazon EC2 instance, Datadog can help you monitor your migration and get insight into your entire AWS infrastructure. Install the Datadog Agent, open-source software available on GitHub, to collect infrastructure metrics, distributed traces, logs, and more from your Amazon EC2 instances. With Datadog's Amazon EC2 integration, you can monitor even more of your AWS infrastructure, complementing the data collected by the Agent.
Change Data Capture from On-Premises SQL Server to Amazon Redshift Target
Change Data Capture (CDC) is the technique of systematically tracking incremental change in data at the source, and subsequently applying these changes at the target to maintain synchronization. You can implement CDC in diverse scenarios using a variety of tools and technologies. Here, Cognizant uses a hypothetical retailer with a customer loyalty program to demonstrate how CDC can synchronize incremental changes in customer activity with the main body of data already stored about a customer.
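The core mechanic behind CDC, applying an ordered stream of inserts, updates, and deletes to a target copy of the data, can be sketched in a few lines of Python. This is a minimal illustration only; the change-record shape and field names are hypothetical and not drawn from Cognizant's solution:

```python
# Minimal CDC apply sketch: each change record carries an operation
# ("I" = insert, "U" = update, "D" = delete), a primary key, and row data.
# Record shape and keys are hypothetical, for illustration only.

def apply_cdc(target, changes):
    """Apply ordered change records to a target dict keyed by primary key."""
    for change in changes:
        op, pk = change["op"], change["pk"]
        if op in ("I", "U"):
            # Upsert: insert new rows, overwrite existing ones
            target[pk] = change["row"]
        elif op == "D":
            # Delete rows that still exist in the target
            target.pop(pk, None)
    return target

# Example: sync loyalty-program activity for two customers
target = {1: {"name": "Ann", "points": 100}}
changes = [
    {"op": "U", "pk": 1, "row": {"name": "Ann", "points": 150}},
    {"op": "I", "pk": 2, "row": {"name": "Bob", "points": 20}},
    {"op": "D", "pk": 1, "row": None},
]
print(apply_cdc(target, changes))  # {2: {'name': 'Bob', 'points': 20}}
```

In practice the same upsert/delete semantics are expressed in SQL at the target (for example, staging changes and merging them into a Redshift table), but the ordering guarantee shown here is what keeps source and target synchronized.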
How to Proactively Monitor Amazon RDS Performance with Datadog
To proactively identify and remediate potential errors, you need deep visibility into your entire Amazon RDS environment. This post shows you how Datadog can fetch data from Amazon CloudWatch and your Amazon RDS database instances to give you a comprehensive view of your cloud environment. We also dive into how you can automatically detect performance anomalies and abnormal throughput behavior, and forecast storage capacity.
How TIBCO Leverages AWS for its COVID-19 Analytics App
TIBCO Software has launched an analytics app to track the spread and impact of the COVID-19 pandemic in real time across local regions worldwide. The goal of this analytics app is to enable organizations to assess the potential impact of the COVID-19 pandemic on their business fabric, using sound data science and data management principles, in the context of real-time operations. Learn about some of the key capabilities of the app and how it was developed on AWS.
Best Practices from Onica for Optimizing Query Performance on Amazon Redshift
Effective and economical use of data is critical to your success. As data volumes increase exponentially, managing and extracting value from data becomes increasingly difficult. By adopting best practices that Onica has developed over years of using Amazon Redshift, you can improve the performance of your AWS data warehouse implementation. Onica has completed multiple projects ranging from assessing the current state of an Amazon Redshift cluster to helping tune, optimize, and deploy new clusters.
Training Multiple Machine Learning Models Simultaneously Using Spark and Apache Arrow
Apache Spark is a distributed computing framework that gained features such as Pandas UDFs through PyArrow. You can leverage Spark for distributed and advanced machine learning model lifecycle capabilities to build massive-scale products with many models in production. Learn how Perion Network implemented a model lifecycle capability that distributes the training and testing stages with a few lines of PySpark code. This capability improved the performance and accuracy of Perion's ML models.
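The pattern behind that capability, partitioning a dataset by model key and training one model per partition, can be sketched without a cluster. The function below is the kind of per-group trainer that Spark's `applyInPandas` (a Pandas-UDF-style API backed by Apache Arrow) would run in parallel on executors; here it runs locally with pandas, and the column names (`model_id`, `x`, `y`) are hypothetical, not taken from Perion's implementation:

```python
import numpy as np
import pandas as pd

def train_group(pdf: pd.DataFrame) -> pd.DataFrame:
    """Fit one linear model per group; on Spark this runs once per key.
    Column names are hypothetical, for illustration only."""
    slope, intercept = np.polyfit(pdf["x"], pdf["y"], deg=1)
    return pd.DataFrame({"model_id": [pdf["model_id"].iloc[0]],
                         "slope": [slope], "intercept": [intercept]})

df = pd.DataFrame({
    "model_id": ["a"] * 3 + ["b"] * 3,
    "x": [0, 1, 2, 0, 1, 2],
    "y": [1, 3, 5, 2, 2, 2],   # group "a": y = 2x + 1; group "b": y = 2
})

# Locally, a pandas groupby stands in for Spark's distributed shuffle:
results = pd.concat(
    [train_group(group) for _, group in df.groupby("model_id")],
    ignore_index=True,
)

# On a Spark cluster the same function would be distributed with, e.g.:
# spark_df.groupBy("model_id").applyInPandas(train_group,
#     schema="model_id string, slope double, intercept double")
```

Because each group's training is independent, adding more executors scales the number of models trained concurrently, which is the essence of the "many models in production" approach described above.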
Analyzing COVID-19 Data with AWS Data Exchange, Amazon Redshift, and Tableau
To help everyone visualize COVID-19 data confidently and responsibly, we brought together APN Partners Salesforce, Tableau, and MuleSoft to create a centralized repository of trusted data from open source COVID-19 data providers. Anyone can work with the public data, blend it with their own data, or subscribe to the source datasets directly through AWS Data Exchange, and then use Amazon Redshift together with Tableau to better understand the impact on their organization.