AWS for Industries

Distributed inference with collaborative AI agents for Telco-powered Smart-X

Introduction

Artificial Intelligence (AI) applications are evolving into distributed inference pipelines, where models run across multiple tiers, from the device edge to Amazon Web Services (AWS), to optimize latency, bandwidth, and privacy. Instead of sending all raw data to AWS, collaborative AI inference can occur at the device, the far edge, the near edge (often a 5G Multi-access Edge Computing (MEC) site), and the AWS Region, with telcos playing a crucial role as AI compute hubs (Figure 1). Through the lens of AI and agentic workflows, the edge is no longer just a passive data relay but an intelligent decision-making layer, where AI agents dynamically orchestrate tasks, collaborate across tiers, and optimize inference execution in real time. Telco-powered Smart-X refers to smart infrastructure (smart cities, smart transportation, and so on) powered by AI and connected by telco networks. In this post, we explore the art of the possible: how the edge and the AWS Cloud can seamlessly coexist to tackle real-world challenges by distributing intelligence across a multi-tiered architecture. We take a new look at the “intersection safety” problem to show how this hybrid approach enables faster, more efficient solutions. The architectures we discuss here are broadly applicable across various Smart-X domains.

Figure 1: Spectrum of distributed computing locations and their applications for Smart-X

Architecture and technology stack

Building a distributed AI pipeline necessitates an integrated networking architecture from the far edge (devices, sensors, gateways) through the near edge (telco edge/MEC) to the AWS Region. Each layer has distinct responsibilities and uses specific AWS technologies.

Networking layer: The networking infrastructure forms the critical backbone of our distributed inference architecture, connecting the far edge, near edge, and AWS Cloud tiers with reliable, secure, and low-latency communication pathways. At the far edge, devices connect to the 5G network via dedicated slices that guarantee quality of service for time-sensitive AI workloads. These network slices provide isolated, virtualized portions of the 5G infrastructure with configurable parameters for bandwidth, latency, and reliability. The 5G PCIe modules integrated with the AI hardware enable ultra-reliable low-latency communication (URLLC) with sub-10 ms response times, essential for safety-critical applications like intersection monitoring.

Between the far edge and near edge, private network Access Point Names (APNs) establish secure tunnels that shield sensitive video data from public internet exposure, while the AWS Outposts service link creates private connections between the operator’s near edge deployments (AWS Outposts family) and AWS Regional services. This connectivity fabric supports both control plane messages (low-bandwidth device management and model updates) and data plane traffic (higher-bandwidth video streams and inference results) through optimized routing paths. AWS IoT Core provides the messaging backbone with MQTT over WebSockets for persistent, bidirectional connections that maintain session state even during brief connectivity lapses. The entire network path implements end-to-end encryption with mutual TLS authentication and certificate-based identity, with AWS IoT Device Defender auditing device configurations, making sure that only authorized devices can participate in the inference pipeline.

Furthermore, network telemetry is continuously monitored through Amazon CloudWatch and AWS IoT SiteWise, allowing for real-time optimization of traffic patterns and proactive identification of potential bottlenecks that could impact inference latency or reliability. This multi-layered networking approach creates a resilient foundation that makes sure that smart intersection safety systems remain operational even under challenging conditions, such as network congestion during major traffic incidents or natural disasters.
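As a concrete illustration, the following minimal sketch connects an edge device to AWS IoT Core over MQTT with mutual TLS, using the AWS IoT Device SDK v2 for Python. The endpoint, certificate paths, client ID, and topic are illustrative placeholders; a WebSockets connection would be built similarly with the SDK’s websockets_with_default_aws_signing builder.

```python
# Minimal sketch: connecting an edge device to AWS IoT Core over MQTT
# with mutual TLS (X.509 certificate identity). Endpoint, file paths,
# client ID, and topic are illustrative placeholders.
import json

from awscrt import mqtt
from awsiot import mqtt_connection_builder

connection = mqtt_connection_builder.mtls_from_path(
    endpoint="xxxxxxxx-ats.iot.us-east-1.amazonaws.com",  # account-specific
    cert_filepath="device.pem.crt",
    pri_key_filepath="private.pem.key",
    ca_filepath="AmazonRootCA1.pem",
    client_id="rsu-intersection-001",
    clean_session=False,      # keep session state across brief connectivity lapses
    keep_alive_secs=30,
)
connection.connect().result()

# Publish a low-bandwidth control plane message (QoS 1 for at-least-once delivery)
publish_future, _ = connection.publish(
    topic="smartx/intersection/001/incident",
    payload=json.dumps({"event": "collision", "confidence": 0.87}).encode(),
    qos=mqtt.QoS.AT_LEAST_ONCE,
)
publish_future.result()
```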

Figure 2: Functional architecture showing networking and technology stack

Far edge layer: This is where data is generated, for example by IP cameras or smart sensors at an intersection. These devices use onboard AI capabilities on a heterogeneous architecture of CPU and GPU hardware, enabling multi-sensor processing across audio, video, and image understanding. By embracing shared infrastructure with vendor plurality, this approach gains scalability, flexibility, and interoperability across different hardware platforms. Onboard vision language models (VLMs) run directly on video feeds to interpret the scene in real time. This means the camera can detect vehicles, pedestrians, or anomalies within milliseconds on-site. The far edge device also runs a reflex agent to perform simple reasoning or command-and-control tasks without AWS connectivity. These models are optimized for edge deployment (through techniques such as quantization and model distillation) so they can operate within the device’s compute and memory constraints. This post doesn’t aim to solve the broader problem of computer vision: we recognize that collision detection is inherently challenging due to the dynamic nature of objects, occlusions, and the limitations of 2D vision-based inference.
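The following sketch illustrates the shape of that far edge loop: frames are sampled from a camera feed and passed to an on-device model. Here, detect_incident is a hypothetical placeholder for whichever quantized VLM and edge runtime you deploy, and the RTSP URL is likewise illustrative.

```python
# Illustrative far edge inference loop. detect_incident stands in for a
# quantized, on-device VLM and is not a real model call.
import json

import cv2  # OpenCV for frame capture


def detect_incident(frame) -> dict:
    """Placeholder for on-device VLM inference (for example, an INT8-quantized
    model served by the device's edge runtime)."""
    return {"event": "collision", "confidence": 0.87}


cap = cv2.VideoCapture("rtsp://camera-01/stream")  # illustrative camera feed
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    incident = detect_incident(frame)
    if incident["confidence"] > 0.6:
        # Hand the incident JSON to the reflex agent and publish over MQTT
        print(json.dumps(incident))
cap.release()
```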

Near edge (telco edge/MEC) layer: The service provider’s edge (often a 5G Multi-access Edge Computing (MEC) site) acts as an intermediate tier. This near edge layer consists of AWS compute nodes deployed on-premises, such as the AWS Outposts family. In our architecture, the near edge nodes take on heavier AI tasks that are too intensive for the far edge layer, such as aggregating feeds from multiple cameras to get a city-wide view or running advanced multimodal models. AWS services are extended to the telco edge, with an AWS IoT Greengrass component deployed on an edge server to orchestrate local inferencing and caching of data. The AWS IoT Greengrass framework supports deploying and running small language models (SLMs) and AWS Lambda functions directly on edge hardware, so logic can execute with minimal latency at the MEC site. The near edge serves as a collaboration point: it receives alerts or data from multiple far edge devices, performs validation (a secondary collision detection with a larger VLM), and forwards only the necessary, enriched information to the central AWS Region.
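As a sketch of how a Greengrass component at the MEC site might forward a validated alert to the AWS Region, the snippet below uses the Greengrass IPC client from the AWS IoT Device SDK. It only runs inside a deployed Greengrass component; the topic and payload are illustrative.

```python
# Minimal sketch of a Greengrass v2 component relaying a validated incident
# to AWS IoT Core. This code runs only inside a deployed Greengrass component.
import json

import awsiot.greengrasscoreipc
from awsiot.greengrasscoreipc.model import QOS, PublishToIoTCoreRequest

ipc_client = awsiot.greengrasscoreipc.connect()

validated = {  # illustrative enriched payload from the near edge VLM
    "intersection": "001",
    "event": "collision",
    "confidence": 0.93,
    "validated_by": "near-edge-vlm",
}

request = PublishToIoTCoreRequest(
    topic_name="smartx/near-edge/validated",  # illustrative topic
    qos=QOS.AT_LEAST_ONCE,
    payload=json.dumps(validated).encode(),
)
operation = ipc_client.new_publish_to_iot_core()
operation.activate(request)
operation.get_response().result(timeout=10)  # block until the publish is acknowledged
```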

AWS Region: The AWS Region is the central brain and long-term repository of the system. It hosts advanced AI services and coordination logic to perform post-perception analytics and decision-making. This includes stream processing for real-time alerts, batch processing for model fine-tuning, and REST APIs that integrate with external systems (dashboards and emergency services). The AWS Region is responsible for primary orchestration through Amazon Bedrock and Amazon Bedrock Agents. Amazon Bedrock provides access to foundation models from Amazon, such as Amazon Nova, as well as leading proprietary and open source models, without needing to manage infrastructure. On top of Amazon Bedrock, the Agents feature allows us to create autonomous AI agents that can plan and execute tasks by invoking different models and APIs. These agents use an LLM to interpret goals and call out to tools or other models to achieve those goals. In our design, an Amazon Bedrock agent serves as a coordinator AI (primary orchestrator) that reasons over the combined inputs coming from the far edge and near edge. The agent queries more context by calling external APIs or AWS services, then decides on high-level actions such as triggering alerts or updating models.
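To make the orchestration step concrete, the following sketch invokes an Amazon Bedrock agent with enriched incident context using the bedrock-agent-runtime API in boto3. The agent ID, alias ID, Region, and prompt are placeholders for your own deployment.

```python
# Minimal sketch: escalating an enriched incident to an Amazon Bedrock agent.
# agentId and agentAliasId are placeholders for a deployed agent.
import uuid

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_runtime.invoke_agent(
    agentId="AGENT_ID",             # placeholder
    agentAliasId="AGENT_ALIAS_ID",  # placeholder
    sessionId=str(uuid.uuid4()),    # one session per incident
    inputText=(
        "Collision reported at intersection 001 with 0.93 confidence. "
        "Assess severity, estimate clearance time, and recommend actions."
    ),
)

# The agent streams its output back as completion chunks
for event in response["completion"]:
    if "chunk" in event:
        print(event["chunk"]["bytes"].decode(), end="")
```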

Intersection safety use case implementation

Intersections remain one of the most hazardous locations on roadways. Despite decades of research and pilot programs, intersection safety remains a pressing challenge due to reactive/slow interventions, limited sensing and monitoring, sparse deployment of roadside units (RSUs), funding hurdles, and technical limitations of legacy systems. Although the building blocks for safer intersections exist, these challenges have stalled widespread implementation. AWS and TCS have partnered to reimagine the “intersection safety” problem, demonstrating how a hybrid approach uses existing distributed infrastructures and AI agents for faster, more efficient solutions.

To make this concrete, we walk through how distributed inference with collaborative AI agents can power an intersection safety system: a real-world Smart City/Transportation use case. Consider a busy urban intersection instrumented with cameras and connected infrastructure, all cooperating to prevent and respond to collisions. We use a sample 30-second video depicting a left-turn collision (Figure 3).

Figure 3: Left-turn collision at a high-visibility intersection

Edge processing begins with RSUs, where cameras collect real-time traffic data. RSUs serve as an infrastructure component that integrates multiple sensing and computing capabilities, including IP cameras, LiDAR, radar, and V2X communication modules, to enhance situational awareness. CPU/GPU-powered local inference detects critical events such as collisions. Upon detection, the onboard reflex agent triggers an action, sending the incident data through MQTT to the autonomous agent orchestrator in the AWS Region and the semi-autonomous agent at the near edge. Then, the processed video data of the incident is compressed (H.265 encoding) and transmitted to the near edge (Mobile Core Data Center) for further analysis.

At the near edge, the 5G Control Plane and User Plane Function (UPF) provide low-latency connectivity between RSUs and AWS services. Semi-autonomous agents running on AWS Outposts process incoming metadata and enhance event detection through a sequence of tasks planned by a more capable SLM. The near edge VLM further refines local decision-making, reducing false positives and improving detection accuracy. Metadata is stored in a local database for compliance.

When more context is needed, the agent triggers the primary agent orchestrator in the AWS Region. Here, Amazon Bedrock Agents and large language models (LLMs) receive enriched metadata with broader context for traffic incidents. Amazon Kinesis Video Streams manages real-time video ingestion, storing anonymized footage in Amazon S3 for deeper analysis. AWS IoT Core and AWS IoT SiteWise enable fleet management, monitoring, and device orchestration, while private endpoints provide secure, low-latency connectivity between the near edge and the AWS Region.
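For instance, once footage has been ingested through Amazon Kinesis Video Streams, downstream consumers can retrieve it on demand. The sketch below fetches an HLS playback URL with boto3; the stream name is illustrative.

```python
# Minimal sketch: retrieving an HLS playback URL for incident footage
# ingested through Amazon Kinesis Video Streams. Stream name is illustrative.
import boto3

kvs = boto3.client("kinesisvideo", region_name="us-east-1")

# Each KVS API family has its own regional data endpoint
endpoint = kvs.get_data_endpoint(
    StreamName="intersection-001",
    APIName="GET_HLS_STREAMING_SESSION_URL",
)["DataEndpoint"]

archived_media = boto3.client("kinesis-video-archived-media", endpoint_url=endpoint)
url = archived_media.get_hls_streaming_session_url(
    StreamName="intersection-001",
    PlaybackMode="LIVE",  # or ON_DEMAND with a timestamp range
)["HLSStreamingSessionURL"]

print(url)  # hand this URL to an HLS player or an analytics job
```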

Figure 4: End-to-end distributed inference architecture for intelligent intersection safety

In this section, we walk through the distributed inference logic and the multi-agent collaboration workflow for this use case:

1. Far edge inference: The camera streams video to the RSU, where VLM inference detects an incident—such as a collision, inoperable vehicle, illegal parking, debris, or potholes—and generates incident data as JSON, along with a stored image/video capture. The data is then sent to the on-board reflex agent for immediate processing while also being transmitted to the telco edge and AWS Region for further analysis and response coordination.

a. Simple reflex agent: The incident data activates a simple reflex agent driven by an on-device SLM for reasoning. This is a static agent workflow that follows an “if-condition, then-action” logic: it scans the incoming context (incident data), compares it against a fixed guideline (detection accuracy > 60%), and runs a predetermined action (emergency dispatch notification); a minimal sketch of this logic appears after this list. Although not yet implemented, it could broadcast alerts directly to nearby connected cars through V2X or manage intersection traffic lights.

2. Near edge inference: The telco edge receives the incoming image and video feed along with the incident notification, aggregating data from multiple cameras for a city-wide perspective. It uses a larger VLM to validate the notification’s accuracy and reported severity of the collision. It consults nearby intersection devices to detect road conditions, traffic patterns, and emerging incidents, initiating real-time monitoring.

a. Semi-autonomous agent: The telco edge features a more advanced SLM/LLM that powers the reasoning of the semi-autonomous agent, incorporating human-in-the-loop verification for enhanced decision-making. This is a goal-based reasoning agent, using memory for decision-making. It can handle complex scenarios by aggregating insights from multiple far-edge devices and forwarding only the necessary, enriched information to the AWS Region. Its goal-seeking nature enables it to infer anticipated road closures, congestion levels, and traffic signal states at nearby intersections.

3. AWS Region inference: The AWS Region enables centralized control, coordination, and scalability by connecting edge devices to AWS services, creating a powerful hybrid architecture. Aggregated data in the AWS Region provides intersection safety maps, on-demand video access, and alert summaries. Moreover, it integrates with existing city traffic management systems. It also contains coordination logic for post-perception analytics and informed decision-making.

a. Autonomous agent: The AWS Region-based autonomous agent, powered by Amazon Bedrock, serves as the primary orchestrator. It uses learning agents to optimize decisions by evaluating trade-offs for the best possible outcomes. Driven by an LLM in the AWS Region, the agent continuously learns from interactions to enhance performance over time. It plans and executes tasks by invoking various models and APIs, reasoning over combined inputs from both far and near edge devices. It queries external APIs and AWS services for contextual insights, enabling high-level actions such as triggering emergency alerts, estimating incident clearance times, and issuing public notifications.
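As referenced in step 1a, here is a minimal sketch of the reflex agent’s “if-condition, then-action” rule. The 60% threshold comes from the guideline above; notify_dispatch is a hypothetical placeholder for the emergency dispatch integration.

```python
# Minimal sketch of the simple reflex agent (step 1a): a static
# "if-condition, then-action" rule with no planning or memory.
CONFIDENCE_THRESHOLD = 0.60  # fixed guideline: detection accuracy > 60%


def notify_dispatch(incident: dict) -> None:
    """Hypothetical placeholder for the emergency dispatch notification."""
    print(f"Emergency dispatch notified: {incident}")


def reflex_agent(incident: dict) -> None:
    # Condition: a collision detected above the confidence threshold
    if incident.get("event") == "collision" and \
            incident.get("confidence", 0.0) > CONFIDENCE_THRESHOLD:
        notify_dispatch(incident)  # action: predetermined, no deliberation


reflex_agent({"event": "collision", "confidence": 0.87, "intersection": "001"})
```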

Figure 5: Agentic workflow for intersection safety

Interoperability among LLM-powered agents across different frameworks can be achieved through agent protocols that create a standardized communication layer, enabling context-sharing between agents. These protocols allow agents to access and process both structured and unstructured data, interact with external services, and coordinate decision-making efficiently for an adaptable and collaborative AI ecosystem.

Running autonomous agents presents infrastructure challenges due to their long-running, bursty, and sometimes unreliable nature. These agents require scalable compute resources to handle dynamic workloads and maintain reliability. AWS addresses these challenges by offering on-demand scaling, fault tolerance, and distributed processing, making it an ideal environment for running autonomous agents. By using AWS infrastructure, distributed inference architectures can efficiently manage workloads, ensuring continuous operation and adaptive resource allocation for real-time decision-making.

Agentic AI communication can occur through APIs, event-driven systems, direct messaging, federated learning, knowledge graphs, blackboards, and blockchain. The optimal approach depends on the agent’s role, real-time processing requirements, and security considerations.
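As one example of the event-driven style, the sketch below publishes an inter-agent escalation event to Amazon EventBridge with boto3. The event bus name, source, and detail fields are hypothetical; other styles (direct messaging, shared knowledge stores) would use different services.

```python
# Minimal sketch: event-driven agent-to-agent communication through
# Amazon EventBridge. Bus name, source, and detail are illustrative.
import json

import boto3

events = boto3.client("events", region_name="us-east-1")

events.put_events(Entries=[{
    "Source": "smartx.near-edge.agent",  # emitting agent (illustrative)
    "DetailType": "IncidentEscalation",
    "Detail": json.dumps({
        "intersection": "001",
        "severity": "high",
        "requested_action": "regional-orchestration",
    }),
    "EventBusName": "smartx-agents",     # hypothetical custom event bus
}])
# Subscribing agents attach EventBridge rules to this bus and react to
# IncidentEscalation events independently.
```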

The reimagined intersection safety use case showcases a pipeline of distributed AI inference: far edge computing for immediate perception and instant physical actions, telco networks carrying concise alerts, and AI agents in the AWS Region orchestrating a multi-faceted response. This kind of system effectively creates an AI-powered 24/7 intersection guardian that can anticipate, prevent, and respond to collisions. The use of AWS IoT Core and AWS IoT Greengrass makes sure that the edge and AWS components stay synchronized and can communicate even under intermittent connectivity, while Amazon Bedrock Agents bring advanced reasoning to what was traditionally a simple sensor network. This fusion of on-site awareness and AWS Cloud intelligence can dramatically reduce emergency response times and improve traffic flow after incidents, ultimately saving lives.

Future outlook

Intersections can become an extension of the network, with telcos offering integrated services (connectivity and edge compute) to support them. Telcos provide the connective tissue and distributed compute environment that an AI intersection system needs. They bring ultra-low-latency links (through 5G, fiber, or satellite) and edge computing nodes (MEC) to enable real-time, city-wide deployment. Offering managed services and network APIs allows telcos to accelerate adoption. In turn, cities won’t need to build or integrate these complex networks themselves, because they can subscribe to a service.

Deploying edge AI at intersections demands robust, secure, and weatherproof infrastructure on the ground that can be managed by cities. Existing enclosures (digital cabinets) can serve as the protective housings that make it feasible to put sophisticated computing on a street corner. They make sure that environmental factors or vandals don’t undermine the safety mission. By addressing physical and cyber security from the start (weather-sealed enclosures, backup power, tamper alarms, and secure access controls), cities and telcos can confidently roll out AI hardware across many intersections with minimal downtime or intrusion risk.

AWS connectivity supercharges the capabilities of edge-based intersection safety systems. By tying each intersection into an AWS control plane, we achieve unified management, collective learning (each intersection contributes to improving the whole system), and integration with other services. The AWS infrastructure – from the AWS Outposts family, AWS Local Zones, and AWS Wavelength to IoT and AI services for management – provides the toolkit to build this connected architecture. The result is a resilient, scalable network of smart intersections that can be managed as easily as a fleet of IoT devices, with millisecond response at the street level and AWS Cloud intelligence at the city level.

The future of distributed inference also lies in simplifying development for engineers by providing software abstractions that make complex, multi-node AI infrastructures feel local. Reducing development and migration complexity lets developers build and deploy autonomous agents without worrying about the underlying hardware. This approach enables heterogeneous computing across CPU, GPU, and NPU, avoiding vendor lock-in while ensuring faster time-to-market and broader adoption. Standardized APIs and AWS Cloud-native frameworks will further drive scalability and interoperability, making AI deployment more accessible and efficient.

Conclusion

Distributed inference with collaborative AI agents is a powerful architectural pattern enabling next-generation applications in smart cities, smart industries – essentially every “Smart-X” scenario where local actions and global intelligence must come together. Using AWS, edge services, and telco networks allows developers and architects to build systems that are fast, intelligent, and scalable. The key is distributing the right AI capabilities to the right place. As we move forward, architects should design for distribution: push critical inference to the edge for latency, use the AWS Cloud for aggregated learning, and employ agents to tie it all together. This can unlock the full potential of Smart-X solutions, delivering not only better technical performance but also tangible improvements to safety, efficiency, and quality of life in our connected world. The future of AI is distributed, collaborative, and powered by the synergy of edge and AWS Cloud – and it’s an exciting journey just ahead of us.

TCS — AWS Partner spotlight

“Enterprises require the adaptability of a true AI ecosystem which provides seamless integration between devices, edge AI infrastructure, telco datacenters and cloud. AI agents at the edge that autonomously process data, collaborate with other AI agents, models, and enterprise applications will enable a host of real time business applications. Adoption of distributed inferencing will also pave way for efficient, scalable data processing at optimized cost and performance” – Sujatha Gopal, CTO – Communications Media & Information Services Business Group, TCS.

TCS is an AWS Advanced Technology Partner and AWS Competency Partner that provides IT services, consulting, and business solutions. Headquartered in India with operations in 55 countries, TCS is known for providing digital transformation and technology services to enterprises across various industries.


Subhash Talluri

Subhash Talluri is a Lead AI/ML solutions architect of the Telecom Industry business unit at AWS. He’s been leading development of innovative AI/ML solutions for Telecom customers and partners worldwide. He brings interdisciplinary expertise in engineering and computer science to help build scalable, secure, and compliant AI/ML solutions through cloud-optimized architectures on AWS.

Ajay Rane

Ajay Rane is currently the Head of WW IoT Business Development at Amazon Web Services. His team works with leading Communication Service Providers to transform the IoT business and unlock growth. Ajay is a customer-obsessed veteran of the telecommunications and semiconductor industries with a track record of delivering value through innovation. Previously, Ajay managed IoT ecosystem partners as VP of Business Development at Sigfox and led product management for LTE chipsets as VP of Marketing and Business Development at MBit Wireless. Ajay started his career at Intel where, over a span of 11 years, he held positions of increasing responsibility within the flash memory, servers, WiMAX, and Wi-Fi businesses.

Awaiz Khan

Awaiz Ahmad Khan is a wireless technology leader with 15+ years of experience in RAN architecture, private 5G, and spectrum innovation across UMTS, LTE, LTE-A, CBRS, and 5G. He has played a key role in developing private 5G solutions for enterprises and industrial applications, leveraging CBRS and shared spectrum models. Awaiz was an active contributor to Win Forum, shaping spectrum-sharing frameworks and interoperability standards. His work in CI/CD automation, AI-driven RAN optimization, and cloud-native deployments has earned him multiple industry awards and patents in wireless technologies. He holds Master’s and Bachelor’s degrees in Electrical Engineering.

Jad Naim

Jad Naim has 18+ years of telecommunications experience, from building large-scale WANs to designing and implementing OSS and planning platforms for mobile operators across the globe. At AWS, Jad has been focused on deploying private 5G networks, as well as roaming gateways for local breakout. Jad also focuses on IoT and how we can infuse intelligence at the edge to provide insights that increase safety and prevent injuries.

Ravi Devarasetti

Ravi Devarasetti is a Consulting Partner leading the Emerging Technology and Innovation charter for the CTO Office of the Communications, Media & Information Services Business Group at TCS. Ravi has over 25 years of experience in the communications and IT sector and has worked on architecture, consulting, and end-to-end solution engagements for several CSPs and MSOs across the US and Europe. Ravi engages with customers in the early stages of technology exploration and incubation, through innovative design thinking during pilot execution, feasibility, and business case preparation. He excels at turning innovative technologies into successful businesses and at building ecosystems and partnerships. He is a member of The Open Group and chairs the AEA Colorado Chapter. Beyond being an emerging technology enthusiast, he loves to travel and explore new places.