Accelerate development of ML workflows with Amazon Q Developer in Amazon SageMaker Studio

Machine learning (ML) projects are inherently complex, involving multiple intricate steps—from data collection and preprocessing to model building, deployment, and maintenance. Data scientists face numerous challenges throughout this process, such as selecting appropriate tools, needing step-by-step instructions with code samples, and troubleshooting errors and issues. These iterative challenges can hinder progress and slow down projects. Fortunately, generative AI-powered developer assistants like Amazon Q Developer have emerged to help data scientists streamline their workflows and fast-track ML projects, allowing them to save time and focus on strategic initiatives and innovation.

Amazon Q Developer is fully integrated with Amazon SageMaker Studio, an integrated development environment (IDE) that provides a single web-based interface for managing all stages of ML development. You can use this natural language assistant from your SageMaker Studio notebook to get personalized assistance using natural language. It offers tool recommendations, step-by-step guidance, code generation, and troubleshooting support. This integration simplifies your ML workflow and helps you efficiently build, train, and deploy ML models without needing to leave SageMaker Studio to search for additional resources or documentation.

In this post, we present a real-world use case analyzing the Diabetes 130-US hospitals dataset to develop an ML model that predicts the likelihood of readmission after discharge. Throughout this exercise, you use Amazon Q Developer in SageMaker Studio for various stages of the development lifecycle and experience firsthand how this natural language assistant can help even the most experienced data scientists or ML engineers streamline the development process and accelerate time-to-value.

Solution overview

If you’re an AWS Identity and Access Management (IAM) and AWS IAM Identity Center user, you can use your Amazon Q Developer Pro tier subscription within Amazon SageMaker. Administrators can subscribe users to the Pro Tier on the Amazon Q Developer console, enable Pro Tier in the SageMaker domain settings, and provide the Amazon Q Developer profile Amazon Resource Name (ARN). The Pro Tier offers unlimited chat and inline code suggestions. Refer to Set up Amazon Q Developer for your users for detailed instructions.

If you don’t have a Pro Tier subscription but want to try out the capability, you can access the Amazon Q Developer Free Tier by adding the relevant policies to your SageMaker service roles. Admins can navigate to the IAM console, search for the SageMaker Studio role, and add the policy outlined in Set up Amazon Q Developer for your users. The Free Tier is available for both IAM and IAM Identity Center users.

To start our ML project predicting the probability of readmission for diabetes patients, you need to download the Diabetes 130-US hospitals dataset. This dataset contains 10 years (1999–2008) of clinical care data at 130 US hospitals and integrated delivery networks. Each row represents hospital records of patients diagnosed with diabetes, who underwent laboratory, and more.

At the time of writing, Amazon Q Developer support in SageMaker Studio is only available in JupyterLab spaces. Amazon Q Developer is not supported for shared spaces.

Amazon Q Developer chat

After you have uploaded the data to SageMaker Studio, you can start working on your ML problem of reducing readmission rates for diabetes patients. Begin by using the chat capability next to your JupyterLab notebook. You can ask questions like generating code to parse the Diabetes 130-US hospitals data, how you should formulate this ML problem, and develop a plan to build an ML model that predicts the likelihood of readmission after discharge. Amazon Q Developer uses AI to provide code recommendations, and this is non-deterministic. The results you get may be different from the ones shown in the following screenshot.

Amazon Q Developer SageMaker Studio integration

You can ask Amazon Q Developer to help you plan out the ML project. In this case, we want the assistant to show us how to train a random forest classifier using the Diabetes 130-US dataset. Enter the following prompt into the chat, and Amazon Q Developer will generate a plan. If code is generated, you can use the UI to directly insert the code into your notebook.

I have diabetic_data.csv file containing training data about whether a diabetic patient was readmitted after discharge. I want to use this data to train a random forest classifier using scikit-learn. Can you list out the steps to build this model?

You can ask Amazon Q Developer to help you generate code for specific tasks by inserting the following prompt:

Create a function that takes in a pandas DataFrame and performs one-hot encoding for the gender, race, A1Cresult, and max_glu_serum columns.

You can also ask Amazon Q Developer to explain existing code and troubleshoot for common errors. Just choose the cell with the error and enter /fix in the chat.

The following is a full list of the shortcut commands:

/help – Display this help message
/fix – Fix an error cell selected in your notebook
/clear – Clear the chat window
/export – Export chat history to a Markdown file

To get the most out of your Amazon Q Developer chat, the following best practices are recommended when crafting your prompt:

Be direct and specific – Ask precise questions. For instance, instead of a vague query about AWS services, try: “Can you provide sample code using the SageMaker Python SDK library to train an XGBoost model in SageMaker?” Specificity helps the assistant understand exactly what you need, resulting in more accurate and useful responses.
Provide contextual information – The more context you offer, the better. This allows Amazon Q Developer to tailor its responses to your specific situation. For example, don’t just ask for code to prepare data. Instead, provide the first three rows of your data to get better code suggestions with fewer changes needed.
Avoid sensitive topics – Amazon Q Developer is designed with guardrail controls. It’s best to avoid questions related to security, billing information of your account, or other sensitive subjects.

Following these guidelines can help you maximize the value of Amazon Q Developer’s AI-powered code recommendations and streamline your ML projects.

Amazon Q Developer inline code suggestions

You can also get real-time code suggestions as you type in the JupyterLab notebook, offering context-aware recommendations based on your existing code and comments to streamline the coding process. In the following example, we demonstrate how to use the inline code suggestions feature to generate code blocks for various data science tasks: from data exploration to feature engineering, training a random forest model, evaluating the model, and finally deploying the model to predict the probability of readmission for diabetes patients.

The following figure shows the list of keyboard shortcuts to interact with Amazon Q Developer.

Let’s start with data exploration.

We first import some of the necessary Python libraries, like pandas and NumPy. Add the following code into the first code cell of Jupyter Notebook, and then run the cell:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In the next code cell, add the following comment, and before running the cell, press Enter and Tab. You can watch the bottom status bar to see Amazon Q Developer working to generate code suggestions.

# read 'diabetic-readmission.csv'

You can also ask Amazon Q Developer to create a visualization:

# create a bar chart from df that shows counts of patients by 'race' and 'gender' with a title of 'patients by race and gender'

Now you can perform feature engineering to prepare the model for training.

The dataset provided has a number of categorical features, which need to be converted to numerical features, as well as missing data. In the next code cell, add the following comment, and press TAB to see how Amazon Q Developer can help:

# perform one-hot encoding for gender, race, a1c_result, and max_glu_serum columns

Lastly, you can use Amazon Q Developer to help you create a simple ML model, random forest classifier, using scikit-learn.

Amazon Q Developer in SageMaker data policy

When using Amazon Q Developer in SageMaker Studio, no customer content is used for service improvement, regardless of whether you use the Free Tier or Pro Tier. For IDE-level telemetry sharing, Amazon Q Developer may track your usage of the service, such as how many questions you ask and whether you accept or reject a recommendation. This information doesn’t contain customer content or personally identifiable information, such as your IP address. If you prefer to opt out of IDE-level telemetry, complete the following steps to opt out of sharing usage data with Amazon Q Developer:

On the Settings menu, choose Settings Editor.

Amazon Q Developer settings editor

Uncheck the option Share usage data with Amazon Q Developer.

Amazon Q Developer data usage policy

Alternatively, an ML platform admin can disable this option for all users inside JupyterLab by default with the help of lifecycle configuration scripts. To learn more, see Using lifecycle configurations with JupyterLab. To disable data sharing with Amazon Q Developer by default for all users within a SageMaker Studio domain, complete the following steps:

On the SageMaker console, choose Lifecycle configurations under Admin configurations in the navigation pane.
Choose Create configuration.

Amazon SageMaker lifecycle configuration

For Name, enter a name.
In the Scripts section, create a lifecycle configuration script that disables the shareCodeWhispererContentWithAWS settings flag for the jupyterlab-q extension:

#!/bin/bash
mkdir -p /home/sagemaker-user/.jupyter/lab/user-settings/amazon-q-developer-jupyterlab-ext/
cat<<EOL> /home/sagemaker-user/.jupyter/lab/user-settings/amazon-q-developer-jupyterlab-ext/completer.jupyterlab-settings
{
"shareCodeWhispererContentWithAWS": false,   
"suggestionsWithCodeReferences": true,   
"codeWhispererTelemetry": false,
"codeWhispererLogLevel": "ERROR"
}
EOL

Amazon SageMaker lifecycle configuration script

Attach the disable-q-data-sharing lifecycle configuration to a domain.
Optionally, you can force the lifecycle configuration to run with the Run by default

Attach lifecycle configuration

Use this lifecycle configuration when creating a JupyterLab space.

It will be selected by default if the configuration is set to Run by default.

Lifecycle configuration script run by default Jupyter space

The configuration should run almost instantaneously and disable the Share usage data with Amazon Q Developer option in your JupyterLab space on startup.

Disable share data usage

Clean up

To avoid incurring AWS charges after testing this solution, delete the SageMaker Studio domain.

Conclusion

In this post, we walked through a real-world use case and developed an ML model that predicts the likelihood of readmission after discharge for patients in the Diabetes 130-US hospitals dataset. Throughout this exercise, we used Amazon Q Developer in SageMaker Studio for various stages of the development lifecycle, demonstrating how this developer assistant can help streamline the development process and accelerate time-to-value, even for experienced ML practitioners. You have access to Amazon Q Developer in all AWS Regions where SageMaker is generally available. Get started with Amazon Q Developer in SageMaker Studio today to access the generative AI–powered assistant.

The assistant is available for all Amazon Q Developer Pro and Free Tier users. For pricing information, see Amazon Q Developer pricing.

About the Authors

James Wu is a Senior AI/ML Specialist Solution Architect at AWS. helping customers design and build AI/ML solutions. James’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. Prior to joining AWS, James was an architect, developer, and technology leader for over 10 years, including 6 years in engineering and 4 years in marketing & advertising industries.

Lauren Mullennex is a Senior AI/ML Specialist Solutions Architect at AWS. She has a decade of experience in DevOps, infrastructure, and ML. Her areas of focus include computer vision, MLOps/LLMOps, and generative AI.

Shibin Michaelraj is a Sr. Product Manager with the Amazon SageMaker team. He is focused on building AI/ML-based products for AWS customers.

Pranav Murthy is an AI/ML Specialist Solutions Architect at AWS. He focuses on helping customers build, train, deploy and migrate machine learning (ML) workloads to SageMaker. He previously worked in the semiconductor industry developing large computer vision (CV) and natural language processing (NLP) models to improve semiconductor processes using state of the art ML techniques. In his free time, he enjoys playing chess and traveling. You can find Pranav on LinkedIn.

Bhadrinath Pani is a Software Development Engineer at Amazon Web Services, working on Amazon SageMaker interactive ML products, with over 12 years of experience in software development across domains like automotive, IoT, AR/VR, and computer vision. Currently, his main focus is on developing machine learning tools aimed at simplifying the experience for data scientists. In his free time, he enjoys spending time with his family and exploring the beauty of the Pacific Northwest.

AWS Machine Learning Blog