AWS Machine Learning Blog
Speed up your AI inference workloads with new NVIDIA-powered capabilities in Amazon SageMaker
At re:Invent 2024, we are excited to announce new capabilities to speed up your AI inference workloads with NVIDIA accelerated computing and software offerings on Amazon SageMaker. In this post, we explore how you can use these new capabilities to enhance your AI inference workloads. We walk through the process of deploying NVIDIA NIM microservices from AWS Marketplace for SageMaker Inference. We then dive into NVIDIA’s model offerings on SageMaker JumpStart, showcasing how to access and deploy the Nemotron-4 model directly in the JumpStart interface, with step-by-step instructions on how to find the Nemotron-4 model in the JumpStart catalog, select it for your use case, and deploy it with a few clicks.
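For readers who prefer the programmatic route over the console, a minimal sketch with the SageMaker Python SDK might look like the following; the model ID and instance type are placeholders to replace with the values shown in the JumpStart catalog.

```python
from sagemaker.jumpstart.model import JumpStartModel

# Hypothetical model ID; look up the exact Nemotron-4 identifier in the
# JumpStart catalog before deploying.
model = JumpStartModel(model_id="nvidia-nemotron-4-15b")

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",  # choose a GPU instance suited to the model
    accept_eula=True,                # NVIDIA models may require EULA acceptance
)

print(predictor.predict({"inputs": "What is NVIDIA NIM?"}))
```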
Unlock cost savings with the new scale down to zero feature in SageMaker Inference
Today at AWS re:Invent 2024, we are excited to announce a new feature for Amazon SageMaker inference endpoints: the ability to scale them down to zero instances. This long-awaited capability is a game changer for customers running AI and machine learning (ML) inference workloads in the cloud.
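As a rough sketch, scale-to-zero builds on endpoints that use inference components: you register the component's copy count with Application Auto Scaling and set the minimum capacity to zero. The component name below is hypothetical.

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Hypothetical inference component on an existing SageMaker endpoint
resource_id = "inference-component/my-llm-component"

# Allow the number of model copies to drop to zero when traffic stops
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:inference-component:DesiredCopyCount",
    MinCapacity=0,
    MaxCapacity=4,
)
```

A scaling policy attached to this target then removes the copies (and the underlying instances) during idle periods and brings them back when requests arrive.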
Supercharge your auto scaling for generative AI inference – Introducing Container Caching in SageMaker Inference
Today at AWS re:Invent 2024, we are excited to announce the new Container Caching capability in Amazon SageMaker, which significantly reduces the time required to scale generative AI models for inference. This innovation lets you scale your models faster, with up to a 56% reduction in latency when scaling a new model copy and up to a 30% reduction when adding a model copy on a new instance. In this post, we explore the new Container Caching feature for SageMaker inference and how it addresses the challenges of deploying and scaling large language models (LLMs).
Introducing Fast Model Loader in SageMaker Inference: Accelerate autoscaling for your Large Language Models (LLMs) – Part 1
Today at AWS re:Invent 2024, we are excited to announce a new capability in Amazon SageMaker Inference that significantly reduces the time required to deploy and scale LLMs for inference using Large Model Inference (LMI) containers: Fast Model Loader. In this post, we delve into the technical details of Fast Model Loader, explore its integration with existing SageMaker workflows, discuss how you can get started with this powerful new feature, and share customer success stories.
Introducing Fast Model Loader in SageMaker Inference: Accelerate autoscaling for your Large Language Models (LLMs) – Part 2
In this post, we provide a detailed, hands-on guide to implementing Fast Model Loader in your LLM deployments. We explore two approaches: using the SageMaker Python SDK for programmatic implementation, and using the Amazon SageMaker Studio UI for a more visual, interactive experience. Whether you’re a developer who prefers working with code or someone who favors a graphical interface, you’ll learn how to take advantage of this powerful feature to accelerate your LLM deployments.
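For a taste of the SDK approach, the sketch below follows the ModelBuilder optimization workflow; the model ID, S3 path, and the exact shape of the sharding configuration are assumptions to verify against the SageMaker Python SDK documentation.

```python
from sagemaker.serve import ModelBuilder, SchemaBuilder

# Assumptions: the Hugging Face model ID, output bucket, and sharding_config
# shape are illustrative; verify the optimize() options in the SDK docs.
model_builder = ModelBuilder(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    schema_builder=SchemaBuilder(sample_input="Hello", sample_output="Hello!"),
)

# Run an optimization job that pre-shards the weights so they can be
# streamed directly onto the GPUs at load time
optimized_model = model_builder.optimize(
    output_path="s3://my-bucket/sharded-model/",  # hypothetical bucket
    sharding_config={
        "OverrideEnvironment": {"OPTION_TENSOR_PARALLEL_DEGREE": "4"}
    },
)

predictor = optimized_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
)
```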
Fast and accurate zero-shot forecasting with Chronos-Bolt and AutoGluon
Chronos models are available to Amazon SageMaker customers through AutoGluon-TimeSeries and Amazon SageMaker JumpStart. In this post, we introduce Chronos-Bolt, our latest foundation model (FM) for forecasting, which has been integrated into AutoGluon-TimeSeries.
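For a flavor of the zero-shot workflow, here is a minimal AutoGluon-TimeSeries sketch; the data path is a placeholder, and the model_path string follows AutoGluon's Chronos-Bolt naming (verify against the release notes).

```python
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

# Placeholder path; any long-format table with item_id, timestamp, and
# target columns works here
train_data = TimeSeriesDataFrame.from_path("s3://my-bucket/train.csv")

# Zero-shot forecasting: Chronos-Bolt is used as-is, with no weight updates
predictor = TimeSeriesPredictor(prediction_length=48).fit(
    train_data,
    hyperparameters={"Chronos": {"model_path": "bolt_small"}},  # verify name
)
forecasts = predictor.predict(train_data)
```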
How Amazon Finance Automation built a generative AI Q&A chat assistant using Amazon Bedrock
Amazon Finance Automation developed a large language model (LLM)-based question-answer chat assistant on Amazon Bedrock. This solution empowers analysts to rapidly retrieve answers to customer queries and generate prompt responses within the same communication thread, drastically reducing the time required to address them. In this post, we share how Amazon Finance Automation built this generative AI Q&A chat assistant using Amazon Bedrock.
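The post covers the full retrieval-augmented architecture; as an illustrative fragment, the core model call might resemble this Converse API sketch. The model ID and prompt are placeholders, and the document-retrieval step is omitted.

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

# Placeholder model ID and question; the production assistant first
# retrieves relevant finance documents and adds them as context.
response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    messages=[{
        "role": "user",
        "content": [{"text": "Why was payment for invoice 123 put on hold?"}],
    }],
)
print(response["output"]["message"]["content"][0]["text"])
```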
Cohere Rerank 3.5 is now available in Amazon Bedrock through Rerank API
We are excited to announce the availability of Cohere’s advanced reranking model, Rerank 3.5, through our new Rerank API in Amazon Bedrock. This powerful reranking model enables AWS customers to significantly improve their search relevance and content ranking capabilities. In this post, we discuss the need for reranking, the capabilities of Cohere’s Rerank 3.5, and how to get started using it on Amazon Bedrock.
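As a hedged sketch of the new API, a rerank call from the bedrock-agent-runtime client might look like the following; the model ARN is a placeholder to replace with the Rerank 3.5 identifier for your Region.

```python
import boto3

client = boto3.client("bedrock-agent-runtime")

documents = [
    "Our return window for electronics is 30 days.",
    "Standard shipping takes 5-7 business days.",
    "Extended warranties are available at checkout.",
]

response = client.rerank(
    queries=[{"type": "TEXT",
              "textQuery": {"text": "How long do I have to return a laptop?"}}],
    sources=[
        {
            "type": "INLINE",
            "inlineDocumentSource": {"type": "TEXT",
                                     "textDocument": {"text": doc}},
        }
        for doc in documents
    ],
    rerankingConfiguration={
        "type": "BEDROCK_RERANKING_MODEL",
        "bedrockRerankingConfiguration": {
            # Placeholder ARN; use the Rerank 3.5 ARN for your Region
            "modelConfiguration": {
                "modelArn": "arn:aws:bedrock:us-west-2::foundation-model/cohere.rerank-v3-5:0"
            },
            "numberOfResults": 2,
        },
    },
)

# Results come back sorted by relevance to the query
for result in response["results"]:
    print(result["index"], result["relevanceScore"])
```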
AWS DeepRacer: How to master physical racing?
In this post, I look at what makes physical AWS DeepRacer racing (a real car on a real track) different from racing in the virtual world (a model in a simulated 3D environment). I cover the basics, how virtual racing compares to physical, and the steps I have taken to gain a deeper understanding of the challenge.
Easily deploy and manage hundreds of LoRA adapters with SageMaker efficient multi-adapter inference
The new efficient multi-adapter inference feature of Amazon SageMaker unlocks exciting possibilities for customers using fine-tuned models. This capability integrates with SageMaker inference components to allow you to deploy and manage hundreds of fine-tuned Low-Rank Adaptation (LoRA) adapters through SageMaker APIs. In this post, we show how to use the new efficient multi-adapter inference feature in SageMaker.
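As a rough sketch (all names and the S3 path are hypothetical), an adapter is registered as a child inference component that references the base model's component, and requests are routed by naming the adapter at invocation time. Runtime and compute settings are omitted for brevity, on the assumption that the adapter shares the base component's resources.

```python
import boto3

sagemaker = boto3.client("sagemaker")

# Register a LoRA adapter as a child inference component of the base model
# (endpoint name, component names, and the adapter S3 path are hypothetical)
sagemaker.create_inference_component(
    InferenceComponentName="finance-lora-adapter",
    EndpointName="my-llm-endpoint",
    Specification={
        "BaseInferenceComponentName": "base-llama-component",
        "Container": {"ArtifactUrl": "s3://my-bucket/adapters/finance-lora/"},
    },
)

# Route a request to that adapter by naming its inference component
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="my-llm-endpoint",
    InferenceComponentName="finance-lora-adapter",
    ContentType="application/json",
    Body=b'{"inputs": "Summarize the key drivers of Q3 revenue."}',
)
```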