AWS HPC Blog

Simulating complex systems with LLM-driven agents: leveraging AWS ParallelCluster for scalable AI experiments

In this blog post, we present an experimental simulation where large language models (LLMs) power the core decision-making mechanisms of individual agents within an energy supply chain model. Unlike traditional agent-based models (ABMs) that rely on predetermined rules, our approach leverages LLMs (specifically, Llama 3 models via Ollama) to enable more nuanced and dynamic agent behaviors.

This methodology allows us to explore how AI-driven agents might respond to market pressures, regulatory changes, and even emergent collective behaviors like corporate culture or environmental consciousness.

Using LLMs for agents offers several key advantages over traditional modeling approaches. While mathematically accurate agents can be effective for many scenarios, they often struggle to capture the complexity of human decision-making, especially when it comes to intangible aspects like emotions or personal values. LLMs, on the other hand, can draw upon their vast pre-training to simulate more realistic and adaptable behaviors without requiring explicit programming for every possible scenario.

Furthermore, LLMs can help overcome the limitations of traditional machine-learning models, which may not adapt well to new situations or require extensive retraining. By leveraging the generative capabilities of LLMs, our agents can respond dynamically to novel scenarios, making decisions based on a broader understanding of context and nuance. This approach also simplifies the development process, as it eliminates the need for complex decision trees or exhaustive logic programming that traditional agent-based models often require.

This is an experiment

It’s important to note that this work is not intended to provide predictive models for energy companies.

Rather, it serves as a proof of concept to demonstrate the potential of integrating generative AI agents into simulations. Our goal is to inspire new thinking about how advanced AI technologies can enhance our understanding of complex systems and decision-making processes – particularly in situations where human factors play a significant role.

Throughout this blog post, we’ll discuss the technical challenges we encountered, the solutions we developed, how we leveraged AWS ParallelCluster to scale our simulations, and we’ll share insights gained from our experiments.

We’re excited to share this exploratory work with the community. The code used in this project is available in our GitHub repository, enabling others to build upon and extend this concept of generative AI agents in simulations across diverse domains. By combining the flexibility of LLMs with the structural framework of ABMs, we hope to push the boundaries of what’s possible in modeling complex, human-centric systems.

Conceptual model design

To explore the potential of LLM-powered agents in simulations, we designed a simplified energy supply chain model. This model represents a basic ecosystem of energy producers and utilities, each with their own decision-making processes driven by LLMs.

Here’s an overview of the key components:

Agents:

  1. Energy producers: These agents generate energy and determine their production levels, facility upgrades, and pricing strategies.
  2. Utilities: These agents purchase energy from producers and sell it to consumers.

Agent properties:

  • Initial resources: Starting capital for each agent
  • Maximum production capabilities: Upper limit on energy production for producers
  • Production costs: Expenses associated with energy generation

LLM-driven decision processes:

  • Energy producers:
    • Determine energy production levels and selling prices
    • Decide whether to upgrade facilities to increase production limits (considering the associated costs)
  • Utilities:
    • Decide energy purchase amounts and consumer pricing to meet consumer demand
    • Randomly assigned one of three personas: [environmentally conscious, greedy, depressed]

Agent objective:

  • The primary goal for all LLM-powered agents is to maximize profit, though their decision-making processes may be influenced by their assigned personas.

This setup allows us to observe how LLM-driven agents interact within a simplified market environment, making complex decisions based on their objectives, constraints, and (for utilities) their assigned personas.
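
To make this setup concrete, the sketch below shows one way such agents could be represented in Python. The class names and fields (EnergyProducer, Utility, resources, max_production, and so on) are illustrative assumptions for this post, not the data structures used in the repository.

```python
from dataclasses import dataclass, field
import random

# Illustrative agent containers; names and fields are assumptions, not the repository's API.
@dataclass
class EnergyProducer:
    name: str
    resources: float        # initial resources (starting capital)
    max_production: float   # upper limit on energy production
    production_cost: float  # expense per unit of energy generated
    price: float = 1.0      # selling price, updated by the LLM at each timestep

@dataclass
class Utility:
    name: str
    resources: float
    consumer_price: float = 1.0
    # Utilities are randomly assigned one of the three personas.
    persona: str = field(default_factory=lambda: random.choice(
        ["environmentally conscious", "greedy", "depressed"]))

producers = [EnergyProducer(f"producer_{i}", resources=1000.0,
                            max_production=50.0, production_cost=0.8) for i in range(3)]
utilities = [Utility(f"utility_{i}", resources=1000.0) for i in range(2)]
```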

Figure 1 – ABM node attributes, actions, and state

Our model features dynamic communication between agents in the simulated energy market. Each agent maintains a state encompassing its current attributes like resources, production levels, and pricing strategies. This state is shared with agents directly connected in the network graph. At each simulation time step, agents update and broadcast their states to their “neighbors” – agents linked to them in the graph. This ensures each agent has current information about its immediate environment. For instance, energy producers are aware of connected utilities’ pricing strategies, while utilities know their suppliers’ production capabilities.

This continuous state sharing creates an interconnected system where agents make decisions based on localized, current conditions. The result is a complex interplay of agent behaviors and decision-making, allowing us to observe emergent phenomena arising from these interactions in our simplified market model.
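
Below is a minimal sketch of this localized state sharing, using networkx to hold the agent graph. The node attributes and helper function are assumptions about how such a step could be coded, not the repository's exact implementation.

```python
import networkx as nx

# Tiny producer-utility graph; each node's attribute dict is that agent's current state.
G = nx.Graph()
G.add_node("producer_6", role="producer", price=1.2, max_production=50.0)
G.add_node("utility_3", role="utility", consumer_price=1.5, demand=30.0)
G.add_edge("producer_6", "utility_3")

def neighbor_states(graph: nx.Graph, agent: str) -> dict:
    """Collect the current state of every agent directly connected to `agent`."""
    return {nbr: dict(graph.nodes[nbr]) for nbr in graph.neighbors(agent)}

# At each timestep an agent only sees its neighbors' latest states, e.g. a utility
# reading its supplier's price and capacity before deciding how much energy to buy.
print(neighbor_states(G, "utility_3"))
```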

Integration with AWS

Our agent-based modeling infrastructure combines AWS technologies and open-source tools in a Python environment:

  1. LLM inference:
    • Ollama for quick loading and running of LLM models on GPUs
    • Llama 3.2-3B model selected for its balance of capability and speed
    • Ollama chosen for capabilities that align with our simulation requirements:
      1. Supports numerous simultaneous inference calls
      2. Handles high-volume, rapid inference efficiently
  2. Distributed computing:
    • Ray cluster implemented within AWS ParallelCluster
      1. Orchestrates multiple LLM agents at scale
      2. Leverages multi-node capabilities
  3. Elastic compute:
    • Automatically scales compute nodes up/down based on simulation needs
    • Terminates compute nodes upon task completion
  4. Parallel simulations:
    • Runs multiple simulations concurrently on AWS ParallelCluster
    • Facilitates comprehensive statistical data collection and hypothesis testing of effects

This architecture enables efficient, scalable LLM inference and distributed computing for complex agent-based simulations, surpassing traditional methods in capability and flexibility. The use of Ollama, in particular, allows us to meet the unique demands of our simulation environment where high-volume, rapid LLM inference is crucial.
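
To show how these pieces fit together, here is a hedged sketch of fanning agent decisions out across the Ray cluster, with each task calling a local Ollama server. It assumes the ollama Python client is installed on the workers and that an Ollama server hosting llama3.2:3b is running on every GPU node; the prompts and helper names are placeholders.

```python
import ray
import ollama  # assumes the `ollama` Python client plus a running Ollama server per node

ray.init(address="auto")  # attach to the Ray cluster running on AWS ParallelCluster

@ray.remote(num_gpus=1)
def agent_decision(agent_name: str, prompt: str) -> str:
    """Run one agent's decision prompt on whichever GPU worker Ray schedules it to."""
    response = ollama.chat(
        model="llama3.2:3b",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["message"]["content"]

# Fan out one inference call per agent, then gather all decisions for this timestep.
prompts = {f"producer_{i}": "Set your production level and price as JSON." for i in range(4)}
futures = {name: agent_decision.remote(name, prompt) for name, prompt in prompts.items()}
decisions = dict(zip(futures, ray.get(list(futures.values()))))
```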

Optimizing LLM selection and performance

The integration of LLMs into ABMs required extensive experimentation with various foundation models, including different versions of Llama and Claude at varying parameter counts. Each model presented unique challenges, necessitating a balance between inference speed, model intelligence, and output consistency.

Initial tests with large-scale models exceeding 400 billion parameters demonstrated impressive capabilities but proved impractical for running ABMs at scale due to their computational demands. These models required multiple GPUs due to their high VRAM demands and took 5-20 seconds to respond, depending on the prompts. Consequently, regardless of cost, the ABM simulation time would have been longer than desired for experimentation. Conversely, models with fewer than 1 billion parameters struggled to maintain focus and produce consistently formatted outputs, although they responded in less than one second using a single GPU.

Claude Haiku, available through Amazon Bedrock, offered a blend of inference speed and intelligence, requiring minimal post-processing. When requested to provide JSON with specific data types, Claude cleanly followed these instructions. However, the high-throughput nature of the simulation made us explore alternatives that could better accommodate specific scaling needs.

This exploration led to the Llama family of models. Older Llama models (e.g., version 2) and the smaller 1B-parameter Llama consistently ignored requests for JSON output or added lengthy explanations of their reasoning despite requests not to do so. Llama 3.2-3B emerged as a strong candidate, capable of staying on task and effectively assuming agent roles. With an inference time of approximately 1-2 seconds using a single GPU on a G5 instance from Amazon Elastic Compute Cloud (Amazon EC2), this model required only minor post-processing, making it well-suited for high-volume inference requirements.
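
The kind of prompt-and-parse loop this implies is sketched below: ask the model for strict JSON, then apply light post-processing to recover the payload even if the model adds stray text. The prompt wording and field names are illustrative, not the repository's exact prompts.

```python
import json
import re

PROMPT = (
    "You are an energy producer in a simulated market. Respond ONLY with JSON of the "
    'form {"production": <float>, "price": <float>} and no explanation.'
)

def parse_decision(raw: str) -> dict:
    """Minor post-processing: extract the first JSON object from the model's reply."""
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError(f"No JSON object found in model output: {raw!r}")
    decision = json.loads(match.group(0))
    return {"production": float(decision["production"]), "price": float(decision["price"])}

# Example of recovering from a slightly chatty reply:
print(parse_decision('Sure! {"production": 42.0, "price": 1.35}'))
```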

Simulation results and insights

With our LLM selection optimized and our simulation framework in place, we were ready to explore the behavior of AI-driven agents in our energy market model. Our research progressed through a series of experiments, each designed to investigate different aspects of market dynamics and agent behavior. Let’s examine these experiments, starting with our initial exploration of basic profit maximization.

Experiment 1: profit maximization in an unregulated market

Our first simulation tested LLMs as profit-maximizing agents in a simplified energy market with 20 companies, as shown in Figure 2. Figure 3 illustrates the results: a rapid escalation of electricity prices to consumers. In this unconstrained environment with inelastic consumer demand, where consumers continue to purchase electricity regardless of price changes, AI agents quickly drove prices to extreme levels.

The LLM-driven agents successfully pursued profit maximization strategies, leading to market behavior reminiscent of real-world scenarios where lack of regulation has led to price spikes. While our simplified model doesn’t capture all the complexities of real energy markets, it shows how LLM-driven agents can produce emergent behaviors in even basic simulations.

This experiment highlights the potential of integrating LLMs into agent-based models. By allowing for more nuanced decision-making processes, these AI-driven agents can generate complex market dynamics that might be challenging to replicate with traditional rule-based approaches.

These initial results encouraged us to explore more complex scenarios in subsequent iterations, introducing additional constraints and market factors to further test the capabilities of our LLM-driven simulation approach.

Figure 2 – Network diagram of randomly generated simulated energy market showing 20 interconnected companies, including utility providers and various types of energy producers.

Figure 3 – Example of price gouging for electricity to consumers

Experiment 2: regulated market with competitive dynamics

In our second experiment, we introduced regulatory measures, including price caps for consumers, to address market instabilities observed in the unregulated scenario. We aimed to test if LLM-driven agents could produce natural supply and demand dynamics within these constraints, particularly focusing on energy-producing agents’ price adjustments to remain competitive.
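
A regulatory constraint like this can be applied outside the LLM itself, by clamping whatever consumer price the agent proposes. The sketch below illustrates the idea; the cap value and dictionary keys are invented for this example.

```python
PRICE_CAP = 0.45  # hypothetical maximum consumer price per unit of energy

def apply_price_cap(decision: dict, cap: float = PRICE_CAP) -> dict:
    """Clamp the consumer price proposed by an LLM utility agent to the regulatory cap."""
    capped = dict(decision)  # avoid mutating the agent's original proposal
    capped["consumer_price"] = min(capped.get("consumer_price", cap), cap)
    return capped

# Example: an agent proposes an excessive price and the regulator clamps it.
proposal = {"energy_purchased": 120.0, "consumer_price": 3.10}
print(apply_price_cap(proposal))  # -> {'energy_purchased': 120.0, 'consumer_price': 0.45}
```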

Figure 4 illustrates the results of this simulation, revealing several intriguing dynamics:

  1. Feedback mechanism: The LLMs demonstrated an ability to simulate market feedback mechanisms effectively. When agents raised their prices too high, they would subsequently make gradual downward adjustments to remain competitive. This behavior is visible in the price trends shown in Figure 4, where we see periods of price increases followed by gradual decreases as agents respond to market conditions.
  2. Price adjustment strategies: When an agent increased its price and consequently lost sales, it would gradually decrease its price to regain market share. This behavior mimics real-world market strategies.
  3. Unexpected inflationary pressure: Interestingly, we observed that the lowest-bidding producer would recognize its advantageous position and incrementally increase its price. This led to a subtle but persistent inflationary pressure, causing a slow increase in overall market prices over time.
  4. Indirect competition effects: A particularly noteworthy observation involved Utility 3, which sourced all its energy from a single producer (Energy Producer 6). Despite this apparent lack of direct competition, Utility 3 still benefited from market forces. This was because Energy Producer 6 was competing to supply other utilities, indirectly influencing its pricing for Utility 3.

This experiment demonstrated LLM-driven agents’ ability to produce complex, realistic market behaviors in a simulated regulated environment, capturing nuanced dynamics such as emergent inflationary pressures and indirect competition effects.

Figure 4 – Energy producer price over time, demonstrating competitive feedback and inflation-like price increases

Experiment 3: scaling to larger agent populations

Our third experiment focused on scaling up the number of agents in our LLM-driven simulation to achieve statistically significant results and observe potential emergent behaviors. We incrementally increased the population from 20 to 800 agents, analyzing the effects on system behavior and computational requirements.

Our experiments suggest that simulating a complex energy market for something like a large U.S. state might require several hundred agents. For example, a state-level simulation could involve 5-10 major utility companies, 50-100 smaller utilities, 20-30 large energy producers, and 100-200 smaller producers and renewable facilities, totaling between 175-340 agents. This scale would provide a more realistic representation of the diverse entities in a large state’s energy ecosystem.

The model arbitrarily labels energy producers (nuclear, coal, etc.) without giving them differentiating properties; these labels serve as subpopulations that should be statistically indistinguishable. We aimed to determine the agent count necessary to overcome potential biases in randomly generated supply chains and to reduce variation in subpopulation averages enough for effective hypothesis testing.

Figure 6 shows the scaling results:

  • 20 agents: Highly chaotic error bands; because we first look for stable variance and then for overlapping error bands, this test is inconclusive
  • 50 agents: More stable but still unpredictable
  • 100 agents: First signs of stable variance, overlapping error bands
  • 200-800 agents: Minor changes, increasingly steady behavior
  • 400 agents: Sufficient for statistically significant results
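
One way to check when the subpopulation error bands stabilize is to compute a 95% confidence interval on each subpopulation's mean at every timestep and look for overlap. The sketch below uses a normal-approximation interval on synthetic data; the subpopulation labels and numbers are hypothetical.

```python
import numpy as np

def mean_ci95(values: np.ndarray) -> tuple:
    """Return (mean, lower, upper) using a normal-approximation 95% confidence interval."""
    mean = values.mean()
    sem = values.std(ddof=1) / np.sqrt(len(values))
    return mean, mean - 1.96 * sem, mean + 1.96 * sem

# Hypothetical per-agent profits for two producer subpopulations at one timestep.
rng = np.random.default_rng(0)
nuclear = rng.normal(loc=100.0, scale=15.0, size=200)
coal = rng.normal(loc=100.0, scale=15.0, size=200)

for label, profits in [("nuclear", nuclear), ("coal", coal)]:
    mean, low, high = mean_ci95(profits)
    print(f"{label}: mean={mean:.1f}, 95% CI=({low:.1f}, {high:.1f})")
# Overlapping intervals suggest the subpopulations are statistically indistinguishable,
# as expected when the labels carry no differentiating properties.
```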

As we scaled up our simulations, we found that the primary bottleneck was inference speed. Figure 5 shows that while 20-agent simulations ran efficiently on 2-4 GPUs, the 400-agent simulation required more than 60 GPUs to maintain performance. Network latency prevented the iteration time from dropping below roughly 9 seconds.

Figure 5 – Scaling the ABM, where iteration time is the time required to complete a discrete simulation timestep

This experiment highlights the importance of population size in LLM-driven agent-based modeling and the trade-offs between model complexity, population size, and computational resources. It provides valuable insights for researchers developing large-scale, LLM-driven simulations, emphasizing the need to balance result stability with computational requirements.

Figure 6 – Reviewing the effects on subpopulations with changes in total agent population size. Shaded area is 95% confidence interval.

Experiment 4: incorporating emotional aspects into agent behavior

Building on our previous experiments, we sought to explore how incorporating intangible, behavioral factors into our LLM-driven agents might affect the overall system behavior. This experiment aimed to simulate more nuanced decision-making processes that go beyond pure profit maximization, potentially mimicking real-world scenarios where personal biases and emotional states influence business decisions.

In this experiment, we modified the prompts for the LLM agents to include one of three distinct personas:

  • Environmentally conscious
  • Greedy
  • Depressed

Our objective was to observe whether these traits would lead to statistically significant macroscopic trends. Rather than attempting to mathematically define these characteristics, we leveraged the LLMs’ ability to interpret and act upon these more abstract concepts.
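
A minimal sketch of how a persona might be injected into a utility agent's prompt is shown below; the template wording is an assumption for illustration rather than the exact prompt used in the repository.

```python
import random

PERSONAS = ["environmentally conscious", "greedy", "depressed"]

PROMPT_TEMPLATE = (
    "You are a utility company whose personality is {persona}. "
    "Your primary goal is to maximize profit. Given supplier prices {prices}, "
    'respond ONLY with JSON: {{"purchase_amount": <float>, "consumer_price": <float>}}.'
)

def build_prompt(supplier_prices: dict, persona: str = "") -> str:
    """Assemble the decision prompt, randomly assigning a persona if none is given."""
    persona = persona or random.choice(PERSONAS)
    return PROMPT_TEMPLATE.format(persona=persona, prices=supplier_prices)

print(build_prompt({"producer_6": 1.25, "producer_2": 1.40}, persona="greedy"))
```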

Figure 7 presents two graphs for comparison. The top graph serves as our control group, where no personas were implemented in the prompts. As expected, the uncertainty bands are too wide to show any meaningful difference between the groups. The bottom graph, however, incorporates the personas and reveals a clear shift in behavior. Notably, the greedy persona consistently achieves higher profits compared to the other personas.

Figure 7 – Top: no personas in prompts, very large uncertainty bands, no statistically significant differences. Bottom: personas in LLM prompts, clear statistically significant difference in greedy persona relative to other personas

To further explore the impact of these behavioral traits, we examined whether environmental consciousness led to a preference for renewable energy sources despite potential price differences. Figure 8 displays a weight the LLM uses to preferentially select renewable energy, while Figure 9 shows the fraction of renewable energy purchased, weighted by the agent’s preference.
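
The preference weight in Figure 8 can be thought of as biasing a utility's supplier choice toward renewable producers; the sketch below illustrates one way such a weighting could work, with invented prices and weights rather than values from the simulation.

```python
import random

def choose_supplier(suppliers: dict, renewable_weight: float = 1.0) -> str:
    """Pick a supplier with weight inversely related to price, scaled up for renewables."""
    names = list(suppliers)
    weights = []
    for name in names:
        info = suppliers[name]
        weight = 1.0 / info["price"]
        if info["renewable"]:
            weight *= renewable_weight  # values above 1.0 bias the choice toward renewables
        weights.append(weight)
    return random.choices(names, weights=weights, k=1)[0]

# Hypothetical market: a cheaper coal plant versus a slightly pricier wind farm.
market = {
    "coal_plant": {"price": 1.10, "renewable": False},
    "wind_farm": {"price": 1.25, "renewable": True},
}
print(choose_supplier(market, renewable_weight=2.0))
```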

Figure 8 – Preference metric that increases the likelihood of a utility purchasing energy from a renewable source.

Surprisingly, we observed no discernible difference between greedy and environmentally conscious agents in their energy source choices. This suggests that in our simulated scenario, the profit maximization goal overrode the environmental considerations, even for the environmentally conscious agents. We verified this finding by increasing the number of agents to 800, but the results remained consistent.

Figure 9 – Top: no personas, random variation. Bottom: with personas, the uncertainty is too large to determine whether any effect exists

It’s important to note that while these results are intriguing, they are based on simulated behaviors and should not be extrapolated to real-world scenarios without careful consideration and validation. The primary value of this experiment lies in demonstrating the potential for incorporating more complex, human-like traits into LLM-driven simulations.

This experiment shows the ability of LLM-driven agents to simulate nuanced decision-making processes that go beyond simple rule-based behaviors. By incorporating emotional aspects, we’ve opened up new possibilities for exploring how intangible factors might influence system-level outcomes in complex environments.

We hope that this experimental approach inspires researchers and practitioners to explore similar techniques in their own domains, potentially leading to new insights into how emotional and psychological factors might impact decision-making processes in various systems.

AWS architecture

We implemented a scalable cloud-based architecture using AWS services for our LLM-driven agent-based simulations. The setup, detailed in our GitHub repository, centers around AWS ParallelCluster with a SLURM job scheduler and head node/compute node architecture.

We used the ParallelCluster UI for rapid HPC cluster deployment and configured the head node with Python, Ray, and some other essential packages. For persistent, scalable storage across the cluster, we integrated Amazon Elastic File System (Amazon EFS).

We automated our Ray cluster deployment using SSH scripts, and we installed the Ollama framework for efficient LLM inference on GPUs. Figure 10 illustrates our distributed GPU-node Ray cluster architecture.
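
As a quick sanity check after the SSH scripts bring the Ray cluster up (an assumed workflow step, not a script from the repository), a few lines like these confirm that the GPU compute nodes have joined before launching a simulation.

```python
import ray

# Attach to the Ray head process started on the ParallelCluster head node.
ray.init(address="auto")

# Confirm the GPU compute nodes have registered before submitting agent workloads.
print(ray.cluster_resources())  # aggregate CPU/GPU/memory across the cluster
print(ray.nodes())              # per-node details, useful for spotting missing workers
```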

This setup enables dynamic scaling from small experiments to large-scale simulations with hundreds of LLM agents. The combination of AWS ParallelCluster and Ray provides flexibility, resource optimization, and cost management. This architecture serves as a robust foundation for future LLM and distributed computing research, offering a template for similar large-scale AI-driven simulations.

Figure 10 – Architecture for a Ray cluster using distributed GPU nodes

Summary

This blog post explores an innovative approach to ABMs in energy supply chains by integrating “small” LLMs to capture complex agent behaviors. Our experiments, conducted using AWS ParallelCluster with Ray and Ollama, demonstrate the potential of LLMs in simulating nuanced decision-making processes in a competitive energy market.

This work is experimental by nature – we’re really aiming to inspire researchers and data scientists to explore LLM-driven simulations in their own fields. The flexibility and scalability of AWS services provide a powerful foundation for running complex, AI-driven experiments across various domains.

We invite readers to explore our GitHub repository for implementation details and encourage the community to build upon this concept of generative AI agents in simulations, potentially leading to new insights and methodologies in diverse fields of study.

If you want to request a proof of concept or if you have feedback on the AWS tools, please reach out to us at ask-hpc@amazon.com.

Ross Pivovar

Ross has over 15 years of experience in a combination of numerical and statistical method development for both physics simulations and machine learning. Ross is a Senior Solutions Architect at AWS focusing on development of self-learning digital twins, multi-agent simulations, and physics ML surrogate modeling.

Sam Bydlon

Dr. Sam Bydlon is a Specialist Solutions Architect with the Advanced Compute, Emerging Technologies team at AWS. Sam received his Ph.D. in Geophysics from Stanford University in 2018 and has 12 years of experience developing and utilizing simulations in research science and financial services. In his spare time, Sam enjoys camping, waterfalls, and camping near waterfalls.