AWS Machine Learning Blog

Category: AWS Lake Formation

solution__architecture

Governing the ML lifecycle at scale, Part 3: Setting up data governance at scale

This post dives deep into how to set up data governance at scale using Amazon DataZone for the data mesh. The data mesh is a modern approach to data management that decentralizes data ownership and treats data as a product. It enables different business units within an organization to create, share, and govern their own data assets, promoting self-service analytics and reducing the time required to convert data experiments into production-ready applications.

Governing the ML lifecycle at scale, Part 1: A framework for architecting ML workloads using Amazon SageMaker

Customers of every size and industry are innovating on AWS by infusing machine learning (ML) into their products and services. Recent developments in generative AI models have further sped up the need of ML adoption across industries. However, implementing security, data privacy, and governance controls are still key challenges faced by customers when implementing ML […]

Use Amazon SageMaker Canvas to build machine learning models using Parquet data from Amazon Athena and AWS Lake Formation

Data is the foundation for machine learning (ML) algorithms. One of the most common formats for storing large amounts of data is Apache Parquet due to its compact and highly efficient format. This means that business analysts who want to extract insights from the large volumes of data in their data warehouse must frequently use […]

Apply fine-grained data access controls with AWS Lake Formation and Amazon EMR from Amazon SageMaker Studio

June 2023: This post was reviewed and updated to reflect the launch of EMR release 6.10 Amazon SageMaker Studio is a fully integrated development environment (IDE) for machine learning (ML) that enables data scientists and developers to perform every step of the ML workflow, from preparing data to building, training, tuning, and deploying models. Studio […]

Large-scale feature engineering with sensitive data protection using AWS Glue interactive sessions and Amazon SageMaker Studio

Organizations are using machine learning (ML) and AI services to enhance customer experience, reduce operational cost, and unlock new possibilities to improve business outcomes. Data underpins ML and AI use cases and is a strategic asset to an organization. As data is growing at an exponential rate, organizations are looking to set up an integrated, […]

Build and train ML models using a data mesh architecture on AWS: Part 2

This is the second part of a series that showcases the machine learning (ML) lifecycle with a data mesh design pattern for a large enterprise with multiple lines of business (LOBs) and a Center of Excellence (CoE) for analytics and ML. In part 1, we addressed the data steward persona and showcased a data mesh […]

Control access to Amazon SageMaker Feature Store offline using AWS Lake Formation

This post was last reviewed and updated June, 2022 with revised feature groups (tables) and features (columns) permissions. You can establish feature stores to provide a central repository for machine learning (ML) features that can be shared with data science teams across your organization for training, batch scoring, and real-time inference. Data science teams can […]

Enable cross-account access for Amazon SageMaker Data Wrangler using AWS Lake Formation

Amazon SageMaker Data Wrangler is the fastest and easiest way for data scientists to prepare data for machine learning (ML) applications. With Data Wrangler, you can simplify the process of feature engineering and complete each step of the data preparation workflow, including data selection, cleansing, exploration, and visualization through a single visual interface. Data Wrangler […]

For an existing data lake registered with Lake Formation, the following diagram illustrates the proposed implementation.

Control and audit data exploration activities with Amazon SageMaker Studio and AWS Lake Formation

May 2024: This post was reviewed and updated to use a new dataset, reflect the updated Studio experience and AWS IAM Identity Center. Certain industries are required to audit all access to their data. This includes auditing exploratory activities performed by data scientists, who usually query data from within machine learning (ML) notebooks. This post […]