AWS Machine Learning Blog
Simplify Machine Learning Inference on Kubernetes with Amazon SageMaker Operators
Amazon SageMaker Operators for Kubernetes allow you to augment your existing Kubernetes cluster with SageMaker-hosted endpoints.
Machine learning inference requires investment to create a reliable and efficient service. For an XGBoost model, developers have to create an application, for example with Flask, that loads the model and serves the endpoint, which means thinking about queue management, fault-tolerant deployment, and reloading newly trained models. The serving container then has to be pushed to a Docker repository that Kubernetes is configured to pull from and deploy on the cluster. These steps either require your data scientists to work on tasks unrelated to improving model accuracy or require bringing in a DevOps engineer, which adds to development schedules and leaves less time to iterate.
With the SageMaker Operators, developers only need to write a YAML file that specifies the S3 locations of the saved models, and live predictions become available through a secure endpoint. Reconfiguring the endpoint is as simple as updating the YAML file. On top of being easy to use, the service also offers the following features:
- Multi-model endpoints – Hosting dozens or more models can be challenging to configure and can leave many machines operating at low utilization. Multi-model endpoints set up one instance with on-the-fly loading of model artifacts for serving
- Elastic Inference – Run your smaller workloads with fractional GPU acceleration that you can deploy at low cost
- High Utilization & Dynamic Auto Scaling – Endpoints can run with 100% utilization and add replicas based on custom metrics you define, such as invocations per second. Alternatively, automatic scaling can be configured on predefined metrics for client performance
- Availability Zone Transfer – If there is an outage, Amazon SageMaker will automatically move your endpoint to another Availability Zone within your VPC
- A/B Testing – Set up multiple models on a single endpoint and direct traffic to each in proportion to the weights you set
- Security – Endpoints are created with HTTPS and can be configured to be run in a private VPC (no internet egress) and accessed through AWS PrivateLink
- Compliance Ready – Amazon SageMaker has been certified compliant with HIPAA, PCI DSS, and SOC (1, 2, 3) rules and regulations
Packaged together, the features available in Kubernetes through the SageMaker Operators shorten the time to launch model serving and reduce the development resources needed to set up and maintain production infrastructure. This can reduce total cost of ownership by up to 90% compared to EKS or EC2 alone.
This post demonstrates how to set up Amazon SageMaker Operators for Kubernetes to create and update endpoints for a pre-trained XGBoost model completely from kubectl. The solution contains the following steps:
- Create an Amazon SageMaker execution role in IAM, which gives Amazon SageMaker the permissions it needs to serve your model
- Prepare a YAML file that deploys your model to Amazon SageMaker
- Deploy your model to Amazon SageMaker
- Query the endpoint to obtain predictions
- Perform an eventually consistent update to the deployed model
Prerequisites
This post assumes you have the following prerequisites:
- A Kubernetes cluster
- The Amazon SageMaker Operators installed on your cluster
- An XGBoost model you can deploy
For information about installing the operator onto an Amazon EKS cluster, see Introducing Amazon SageMaker Operators for Kubernetes. You can bring your own XGBoost model, but this tutorial uses the existing model from the previously mentioned post.
Creating an Amazon SageMaker execution role
Amazon SageMaker needs an IAM role that it can assume to serve your model. If you do not have one already, create one with the following bash code:

```shell
export assume_role_policy_document='{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Service": "sagemaker.amazonaws.com"
    },
    "Action": "sts:AssumeRole"
  }]
}'

aws iam create-role --role-name <execution role name> \
    --assume-role-policy-document "$assume_role_policy_document"

aws iam attach-role-policy --role-name <execution role name> \
    --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
```
Replace <execution role name> with a suitable role name. This creates an IAM role that Amazon SageMaker can assume when serving your model.
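You need this role's ARN in the next step. One way to retrieve it is with the AWS CLI (a minimal sketch, using the same placeholder role name as above):

```shell
# Print the ARN of the execution role created above
# (replace <execution role name> with the name you chose).
aws iam get-role --role-name <execution role name> \
    --query 'Role.Arn' --output text
```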
Preparing your hosting deployment
The operators provide a Custom Resource Definition (CRD) named HostingDeployment. You use a HostingDeployment to configure your model deployment on Amazon SageMaker Hosting.
To prepare your hosting deployment, create a file called hosting.yaml with the following contents:

```yaml
apiVersion: sagemaker.thinkwithwp.com/v1
kind: HostingDeployment
metadata:
  name: hosting-deployment
spec:
  region: us-east-2
  productionVariants:
    - variantName: AllTraffic
      modelName: xgboost-model
      initialInstanceCount: 1
      instanceType: ml.r5.large
      initialVariantWeight: 1
  models:
    - name: xgboost-model
      executionRoleArn: SAGEMAKER_EXECUTION_ROLE_ARN
      containers:
        - containerHostname: xgboost
          modelDataUrl: s3://BUCKET_NAME/model.tar.gz
          image: 825641698319.dkr.ecr.us-east-2.amazonaws.com/xgboost:latest
```
Replace SAGEMAKER_EXECUTION_ROLE_ARN with the ARN of the execution role you created in the previous step. Replace BUCKET_NAME with the bucket that contains your model.
Make sure that the bucket Region, the HostingDeployment Region, and the image's ECR Region are all the same.
Deploying your model to Amazon SageMaker
You can now start the deployment by running kubectl apply -f hosting.yaml.
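A minimal sketch of the command (kubectl typically prints a confirmation that the resource was created):

```shell
# Submit the HostingDeployment to the cluster; the operator
# then creates the corresponding SageMaker endpoint.
kubectl apply -f hosting.yaml
```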
You can track the deployment status with kubectl get hostingdeployments. Your model endpoint may take up to 15 minutes to deploy; it is ready for queries once it reaches the InService status.
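A sketch of the status check, with illustrative output (the column values, and in particular the generated endpoint name, will differ in your cluster):

```shell
# Poll the deployment status; repeat until STATUS shows InService.
kubectl get hostingdeployments

# Illustrative output:
# NAME                 STATUS      SAGEMAKER-ENDPOINT-NAME
# hosting-deployment   InService   hosting-deployment-xxxxx
```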
Querying the endpoint
After the endpoint is in service, you can test it by connecting to the HTTPS endpoint with the AWS CLI. The model you created is based on the MNIST digit dataset, and the predictor reads what number is in an image. The call sends an inference payload that contains 784 features in CSV format, which represent the pixels of an image, and the response contains the number that the model believes is in the payload.
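A hedged sketch of such a query, assuming the payload lives in a local file payload.csv (one row of 784 comma-separated pixel values, a hypothetical filename) and that the generated endpoint name is taken from the operator's status (the exact status field name may differ; kubectl describe hostingdeployment shows it):

```shell
# Read the generated SageMaker endpoint name from the operator's status
# (field name assumed; verify with `kubectl describe hostingdeployment`).
ENDPOINT_NAME=$(kubectl get hostingdeployment hosting-deployment \
    -o jsonpath='{.status.endpointName}')

# Send one CSV row of 784 pixel values and write the prediction to a file.
aws sagemaker-runtime invoke-endpoint \
    --region us-east-2 \
    --endpoint-name "$ENDPOINT_NAME" \
    --content-type text/csv \
    --body fileb://payload.csv \
    prediction.json

cat prediction.json   # contains the predicted digit
```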
This confirms that your endpoint is up and running.
Eventually consistent updates
After you deploy a model, you can make changes to the Kubernetes YAML and the operator updates the endpoint. The updates propagate to Amazon SageMaker in an eventually consistent way. This enables you to configure your endpoints declaratively and lets the operator handle the details.
To demonstrate this, you can change the instance type of the model from ml.r5.large to ml.c5.2xlarge. Complete the following steps:
- Modify the instance type in hosting.yaml to be ml.c5.2xlarge:

  ```yaml
  apiVersion: sagemaker.thinkwithwp.com/v1
  kind: HostingDeployment
  metadata:
    name: hosting-deployment
  spec:
    region: us-east-2
    productionVariants:
      - variantName: AllTraffic
        modelName: xgboost-model
        initialInstanceCount: 1
        instanceType: ml.c5.2xlarge
        initialVariantWeight: 1
    models:
      - name: xgboost-model
        executionRoleArn: SAGEMAKER_EXECUTION_ROLE_ARN
        containers:
          - containerHostname: xgboost
            modelDataUrl: s3://BUCKET_NAME/model.tar.gz
            image: 825641698319.dkr.ecr.us-east-2.amazonaws.com/xgboost:latest
  ```
- Apply the change to the Kubernetes cluster with kubectl apply -f hosting.yaml
- Get the status of the hosting deployment. It shows as Updating and then changes to InService when ready
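The steps above can be sketched as follows:

```shell
# Push the edited spec; the operator reconciles the change with
# Amazon SageMaker in an eventually consistent way.
kubectl apply -f hosting.yaml

# Watch the STATUS column move from Updating to InService.
kubectl get hostingdeployments
```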
The endpoint remains live and fully available throughout the update. For more information and additional examples, see the GitHub repo.
Cleaning up
To delete the endpoint and stop incurring usage charges, run kubectl delete -f hosting.yaml.
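A sketch of the cleanup (kubectl typically confirms the deletion):

```shell
# Remove the HostingDeployment; the operator tears down the
# corresponding SageMaker endpoint so it no longer accrues charges.
kubectl delete -f hosting.yaml
```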
Conclusion
This post demonstrated how Amazon SageMaker Operators for Kubernetes support real-time inference. The operators also support training and hyperparameter tuning.
As always, please share your experience and feedback, or submit additional example YAML specs or operator improvements. You can share how you're using Amazon SageMaker Operators for Kubernetes by posting on the AWS forum for Amazon SageMaker, creating issues in the GitHub repo, or reaching out through your AWS Support contacts.
About the authors
Cade Daniel is a Software Development Engineer with AWS Deep Learning. He develops products that make training and serving DL/ML models more efficient and easy for customers. Outside of work, he enjoys practicing his Spanish and learning new hobbies.
Alex Chung is a Senior Product Manager with AWS in enterprise machine learning systems. His role is to make AWS MLOps products more accessible for custom Kubernetes machine learning environments. He's passionate about accelerating ML adoption for a large body of users to solve global economic and societal problems. Outside machine learning, he is also a board member at Cocatalyst.org, a Silicon Valley nonprofit for donating stock to charity that optimizes donor tax benefits, similar to donor-advised funds.