AWS Machine Learning Blog

Unleashing Stability AI’s most advanced text-to-image models for media, marketing and advertising: Revolutionizing creative workflows

To stay competitive, media, advertising, and entertainment enterprises need to stay abreast of recent dramatic technological developments. Generative AI has emerged as a game-changer, offering unprecedented opportunities for creative professionals to push boundaries and unlock new realms of possibility. At the forefront of this revolution is Stability AI’s  family of cutting-edge text-to-image AI models. These models promise to transform the way we approach visual content creation, empowering large media, advertising, and entertainment organizations to tackle real-world business use cases with efficiency and creativity.

This technical post explores how these organizations can use the power of Stability AI to streamline workflows, enhance creative processes, and unleash a new era of advertising campaigning and visual storytelling.

Overview

Amazon Bedrock recently launched three new models by Stability AI: Stable Image Ultra, Stable Diffusion 3 Large, and Stable Image Core. These advanced models greatly improve performance in multisubject prompts, image quality, and typography and can be used to rapidly generate high-quality visuals for a wide range of use cases across marketing, advertising, media, entertainment, retail, and more. One of the key improvements of these models compared to Stable Diffusion XL (SDXL) (one of Stability AI’s older models) is text quality in generated images, with fewer errors in spelling and typography thanks to its innovative Diffusion Transformer architecture.

By learning the intricate relationships between visual and textual data, these models can generate highly detailed and coherent images from simple text prompts. The improved architecture combines the strengths of various deep learning techniques, including transformer encoders for text understanding, convolutional neural networks (CNNs) for efficient image processing, and attention mechanisms for capturing long-range dependencies and fine-grained details. The new family of models available on Amazon Bedrock are mentioned in the table below:

Features Stable Image Core SD3 Large 1.0 Stable Image Ultra 1.0
Parameters 2.6 billion 8 billion 8 billion
Input Text Text or Image Text
Typography Versatility and readability across different sizes and applications Tailored for large-scale display Tailored for large-scale display
Visual Aesthetics Good rendering, not as detail oriented Highly realistic with finer attention to detail Photorealistic image output
Best Fit Fast and affordable rapid concepting and ideating Content creation in media, entertainment, retail High-quality content at speed for media, retail

To evaluate the capabilities of these models, we tested a variety of prompts ranging from simple object descriptions to complex scene compositions. The experiments revealed that, although SDXL excelled at rendering common objects and scenes accurately, these newer models from Stability AI demonstrated improved performance on more nuanced and imaginative prompts. The new models better understand and visually express abstract concepts, stylized artistic renditions, and creative blends of disparate elements.

Stable Image Core is a newer, more affordable and faster version of SDXL. It’s based on the same diffusion architecture as SDXL. In comparison, Stable Diffusion 3 Large and Stable Image Ultra are based on the new diffusion transformer architectures, making them much better at typography.

Expanded training data of the SD3 base model—which is used for both Stable Diffusion 3 Large and Stable Image Ultra—has endowed it with stronger multimodal reasoning and world knowledge compared to SDXL. Some key improvements we observed from the prompt experimentation are the following:

  1. Prompt adherence – These models excel at following complex and detailed prompts, particularly in surreal scenes, making sure that the generated images closely match the specified instructions. Stable Diffusion 3 Large and Stable Image Ultra work the best with natural language.
  2. Text Rendering: Unlike SDXL, which may struggle with incorporating text into images, these newer models effectively generate and integrate text, enhancing the overall coherence of the visuals.
  3. Complex Scene Handling: The new models demonstrate a improved ability to create intricate and detailed scenes, showcasing a better grasp of surreal elements as it understands them in your prompts.
  4. Photorealism: The images produced by these models are more lifelike, with improved handling of textures, lighting, and shadows, making them visually striking.
  5. Visual Aesthetics: The overall visual appeal is enhanced, making them more engaging and attractive.
  6. Multimodal Capabilities: The new models can process various input types beyond just text, allowing for more context-aware image generation.
  7. Scalability: The new architecture of these models supports handling larger datasets and generating higher-resolution images effectively.
  8. Advanced Architecture: The SD3 base model (used for Stable Diffusion 3 Large and Stable Image Ultra) utilizes a new diffusion transformer combined with flow matching, which enhances its performance in generating high-quality images.

The table below showcases the comparison in image generation between the models available on Amazon Bedrock.

Image Generation Comparison – Stability AI Models

Real-world use cases for media, advertising, and entertainment

In the world of media, marketing, and entertainment, concept art and storyboarding are essential for visualizing ideas and communicating creative visions. Stability AI’s models can revolutionize this process by generating high-quality concept art and storyboard frames based on textual descriptions, enabling rapid iteration and exploration of ideas.

Ideation and iteration

Advertising agencies and marketing teams can leverage these models to generate visually stunning and attention-grabbing assets for their campaigns. From product shots to lifestyle imagery, these models can produce a wide range of visuals tailored to specific brand identities and target audiences. In film and television, these models can be a powerful tool for set design and virtual production. By generating realistic environments and backdrops based on textual descriptions, production teams can quickly visualize and iterate on set designs, reducing the need for physical mockups and saving time and resources.

Character design

Character design is a crucial aspect of storytelling in media and entertainment. These models can assist artists and designers in generating unique and compelling character concepts, enabling them to explore a wide range of visual styles and aesthetics.

Social media marketing asset generation

Social media has become a vital marketing channel for media, advertising, and entertainment organizations. Stability AI’s latest models can be leveraged to generate engaging visual content, such as memes, graphics, and promotional materials, tailored to specific social media domains and target audiences.

Stability AI’s capabilities in advertising and marketing campaigns

To showcase the power of Stability AI’s text-to-image models in creating compelling advertising and marketing assets, we walk through a demonstration using a Jupyter notebook that combines large language models (LLMs) and Stable Diffusion 3 Large for end-to-end campaign creation. We demonstrate how to produce generated images for a brand called Young Generational Shoes (YGS), evaluate brand consistency and message effectiveness, use the LLM to analyze images and suggest improvements, and refine prompts based on feedback to generate new iterations. By combining LLM-generated campaign ideas with this model’s advanced image generation capabilities, agencies can rapidly produce high-quality, tailored visual assets that resonate with their target audience. The notebook provides a practical, hands-on example of how these cutting-edge AI tools can be integrated into real-world advertising workflows, potentially saving time and resources while enhancing creative output.

The recorded version of the demo is available here:

Prerequisites

This notebook is designed to run on AWS, leveraging Amazon Bedrock for both the LLM and Stability AI model access. Make sure you have the following set up before moving forward:

To access Stability AI’s Stable Image Ultra text to image model, request access through the Amazon Bedrock console. For instructions, see Manage access to Amazon Bedrock foundation models. For instructions on how to deploy this sample, refer to the GitHub repo. Use the us-west-2 Region to run this demo.

Setting up the demo

We will be using the Stable Image Ultra for the purposes of this demo. You can use one of the other available models from Stability AI on Bedrock to run through your version of the notebook.

# Amazon Bedrock Model ID used throughout this notebook
# Model IDs: https://docs.thinkwithwp.com/bedrock/latest/userguide/model-ids.html#model-ids-arns
MODEL_ID = "stability.stable-image-ultra-v1:0"

This following function call essentially acts as a wrapper around the Amazon Bedrock API, simplifying the process of generating images using Stability AI’s models. It handles the API call, response parsing, and image decoding, providing a straightforward way to generate images from text prompts using these advanced AI models.

def generate_image_from_text(model_id, body):
    """
    Generate an image using SD3 on demand.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        image_bytes (bytes): The image generated by the model.
    """

    logger.info("Generating image with SD3 model %s", model_id)

    bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")
    
    response = bedrock.invoke_model(modelId=model_id,body=body)
    response_body= json.loads(response["body"].read())
    image_data = base64.b64decode(response_body.get("images")[0]

    logger.info("Successfully generated image with the SD3 model %s", model_id)
    return image_data

Generating creative ad campaigns with multiple models

The demo begins by using an LLM to generate creative ad campaign ideas and follows these steps

  1. Define your product or service and target audience
  2. Prompt the LLM to create multiple ad campaign concepts
  3. The LLM generates diverse ideas, considering factors such as brand identity, audience demographics, and current trends

This process allows for a wide range of creative concepts tailored to your specific marketing needs. The following is the sample prompt we used in the notebook:

You are a seasoned veteran in the advertising industry with a wealth of experience
in creating captivating and impactful campaigns. Your task is to generate five
different creative advertising concepts for our new line of shoes under the brand
"YGS". Our product range includes running shoes, soccer shoes, and training shoes.

Our target audience is the young generation, a demographic known for their energy,
trendiness, and desire to express their individuality.

Each advertising concept should seamlessly incorporate the following elements: 

1. The specific type of shoe (running, soccer, tennis, hiking or training) and 
its intended usage. 
2. A vivid description of the colors and unique features that make our
shoes stand out. 
3. A compelling scenario that vividly illustrates when and where these shoes would
be worn, capturing the essence of the active lifestyle our target audience embraces. 

Your concepts should be fresh, engaging, and resonate with the youthful spirit
of our target market. Creativity, originality, and a deep understanding of
our audience's aspirations and passions should shine through in your advertising
ideas. Remember, the goal is to craft compelling narratives that not only showcase
our product's features but also tap into the emotions and desires of the
young generation, inspiring them to embrace our brand as an extension of
their vibrant lifestyles. 

The output format should follow below Json format: 
[ { "concept": "xxx", "Description": "xxx", "Scenario": "xxx" }, 
{ "concept": "xxx", "Description": "xxx", "Scenario": "xxx" } ... ]"

Prompt engineering for visual assets

Once you have campaign concepts, the next step is to craft effective prompts for SD3 Ultra 1.0. This involves using Anthropic’s Claude Sonnet 3.5 on Amazon Bedrock to transform campaign ideas into detailed image prompts, refining these prompts to include specific visual elements, styles, and compositions, and iterating on them to make sure that they capture the essence of the campaign. This process helps create precise instructions to generate visuals that align closely with the campaign’s objectives.

 """You are an expert to use stable diffusion model to generate shoes ad posters.
 Please user below content to generate the positive and negative prompt for stable
 diffusion model:
 - "Concept": {Concept}
 - "Description": {Description}
 - "Scenario": {Scenario}
 
 Output format shoud be Json format as below:
  [
     {
        "positive_prompt": "xxx"
     }
  ]
 Please add this to the positive prompt: text \'YGS\' on the Shoes as a logo."""

Generating ad posters with Stable Image Ultra

With well-crafted prompts, Stable Image Ultra can now create stunning visual assets. The process involves entering the refined prompts into the model through the Amazon Bedrock API, adjusting parameters such as image size, number of inference steps, and guidance scale for optimal results and generating multiple variations to provide a range of options for the campaign. This approach allows for the creation of diverse, high-quality visuals that can be fine-tuned to help meet specific campaign requirements. Here are some posters generated by Stable Image Ultra:

Note:

The images generated could be different because your results depend on the parameters and their values, including the following:

  1. The cfg_scale, which determines how strictly the diffusion process adheres to the prompt text
  2. The height and width of the image in pixels
  3. The number of diffusion steps to run
  4. The random noise seed (which, if provided, makes the resulting generated image deterministic)
  5. The sampler used for the diffusion process to denoise the generation
  6. The array of text prompts used for generation
  7. The weight assigned to each prompt

These parameters allow for fine-tuning and customization of the image generation process, resulting in diverse outputs based on their specific configuration.

Clean up

To avoid charges, you must stop the active SageMaker notebook instances. For instructions, refer to Clean up Amazon Sagemaker notebook instance resources.

Conclusion

Stability AI’s new family of models represents a significant milestone in the field of generative AI, offering media, advertising, and entertainment organizations a powerful tool to streamline creative workflows and unlock new realms of visual expression. By using Stability AI’s capabilities, organizations can tackle real-world business use cases, from concept art and storyboarding to advertising campaigns and content creation. However, it’s essential to proceed with a responsible and ethical mindset, addressing potential biases, respecting intellectual property rights, and mitigating the risks of misuse. By embracing the capabilities of these models while navigating their limitations and ethical considerations, creative professionals can push the boundaries of what’s possible in the world of visual content creation. To get started, check out Stability AI models in Amazon Bedrock.

As the field of generative AI continues to evolve rapidly, we can expect even more exciting developments and innovations from Stability AI and other industry leaders. Stay tuned for further advancements that will shape the creative landscape and empower artists, designers, and content creators in unprecedented ways.


About the authors

Isha Dua is a Senior Solutions Architect based in the San Francisco Bay Area. She helps AWS enterprise customers grow by understanding their goals and challenges, and guides them on how they can architect their applications in a cloud-native manner while ensuring resilience and scalability. She’s passionate about machine learning technologies and environmental sustainability.

Boshi Huang is a Senior Applied Scientist in Generative AI at Amazon Web Services, where he collaborates with customers to develop and implement generative AI solutions. Boshi’s research focuses on advancing the field of generative AI through automatic prompt engineering, adversarial attack and defense mechanisms, inference acceleration, and developing methods for responsible and reliable visual content generation.