AWS Contact Center
Performing a tabletop exercise with Amazon Connect
When a contact center experiences an unexpected service disruption, the impact can be immediate and severe: agents unable to access systems, customers facing connection issues, and support teams working to rapidly restore service. While such scenarios may seem extreme, they represent exactly the kind of situations that make tabletop exercises essential for modern contact center operations.
In this blog post, we walk you through several tabletop exercises with Amazon Connect, the AI-powered cloud contact center solution from Amazon Web Services (AWS). Afterward, you should understand what to check, and how to check it, across different scenarios in your contact center migration, so that you can update your processes and procedures and be ready for the unexpected.
An introduction to the tabletop exercise
Tabletop exercises are critical for companies of any size as they provide a structured and controlled environment to simulate real-world scenarios, allowing organizations to test their response plans for crises such as cyberattacks, natural disasters, or system outages. These exercises involve key stakeholders, such as senior executives, department heads, and crisis management teams, in strategic discussions and decision-making processes. By running through potential threats and disruptions, enterprises can identify gaps in their emergency procedures, improve communication and coordination, and ensure that all teams understand their roles during an actual event.
Tabletop exercises also help organizations refine their risk management strategies, increase organizational resilience, and build confidence in their ability to respond swiftly and effectively when faced with unexpected challenges, ultimately safeguarding both operations and reputation.
1) Amazon Connect service interruption
Scenario: Amazon Connect experiences a temporary service outage affecting your contact center’s ability to make or receive calls for a specific region. Customers are unable to reach support agents, and agents cannot access the system to handle calls.
Key objectives:
- Verify whether the issue is isolated to your Amazon Connect instance or affects multiple regions
- Check the AWS Personal Health Dashboard (PHD), now part of the AWS Health Dashboard, for any service issues
- Use CloudWatch metrics to investigate potential causes (e.g., connectivity issues, API limits)
- Determine how to notify end-users and stakeholders about the outage and the estimated recovery time
- Implement a workaround, such as redirecting calls to an alternative system or using a different region
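The AWS Health check in the objectives above could be scripted along the following lines. This is a minimal sketch, not a definitive implementation: it assumes boto3 is installed, that your AWS Support plan grants access to the AWS Health API, and that Amazon Connect events are filed under the service code `CONNECT`. The helper names `open_events_for` and `fetch_health_events` are illustrative.

```python
from typing import Dict, List


def open_events_for(events: List[Dict], service: str, region: str) -> List[Dict]:
    """Keep only the open events for one service in one Region."""
    return [
        e for e in events
        if e.get("service") == service
        and e.get("region") == region
        and e.get("statusCode") == "open"
    ]


def fetch_health_events(service: str = "CONNECT") -> List[Dict]:
    """Fetch open AWS Health events for a service (first page of results only)."""
    import boto3  # imported lazily so the filter above works without boto3

    # The AWS Health API is served from a global endpoint in us-east-1.
    client = boto3.client("health", region_name="us-east-1")
    response = client.describe_events(
        filter={"services": [service], "eventStatusCodes": ["open"]}
    )
    return response["events"]


if __name__ == "__main__":
    # Hand-written sample in the shape of describe_events results.
    sample = [
        {"service": "CONNECT", "region": "us-west-2", "statusCode": "open"},
        {"service": "CONNECT", "region": "us-east-1", "statusCode": "closed"},
    ]
    print(open_events_for(sample, "CONNECT", "us-west-2"))
```

During the exercise, running the filter once per Region you operate in helps answer the first objective: whether the disruption is isolated to one Region or spans several.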
2) Poor quality or latency
Scenario: A batch of customer calls experiences high latency and degraded audio quality, leading to poor customer experiences. This issue appears to affect calls between specific agents and customers in certain regions.
Key objectives:
- Investigate network performance using Amazon Connect voice metrics and Amazon CloudWatch
- Check the status of any third-party system involved (e.g., SIP trunking or telephony integrations)
- Identify whether the issue is related to regional infrastructure or network congestion
- Test voice quality using the Amazon Connect Voice Analytics tool and other diagnostic tools such as CloudWatch Logs
- When contacting AWS Support, prepare a HAR file and console logs to assist with troubleshooting (see: https://repost.aws/knowledge-center/support-case-browser-har-file)
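To make the network investigation above concrete, the sketch below flags the time intervals in which packet loss exceeded a threshold. It assumes you have already pulled datapoints for a packet-loss metric from CloudWatch (for example, the Amazon Connect `ToInstancePacketLossRate` metric, which reports a ratio between 0 and 1). The 2% threshold and the function name are illustrative assumptions, not recommended values.

```python
from datetime import datetime, timezone
from typing import Dict, List


def degraded_intervals(datapoints: List[Dict], max_loss_ratio: float = 0.02) -> List[Dict]:
    """Return datapoints whose average loss ratio exceeds the threshold, oldest first."""
    bad = [p for p in datapoints if p["Average"] > max_loss_ratio]
    return sorted(bad, key=lambda p: p["Timestamp"])


if __name__ == "__main__":
    # Hand-written sample in the shape of CloudWatch datapoints.
    sample = [
        {"Timestamp": datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc), "Average": 0.08},
        {"Timestamp": datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc), "Average": 0.00},
        {"Timestamp": datetime(2024, 1, 1, 12, 10, tzinfo=timezone.utc), "Average": 0.03},
    ]
    for point in degraded_intervals(sample):
        print(point["Timestamp"].isoformat(), point["Average"])
```

Lining up the flagged intervals with the times of the affected calls is one way to judge whether the degradation tracks network conditions or points elsewhere.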
3) Contact flow misconfiguration
Scenario: A newly updated contact flow is preventing customers from being routed to the correct department. Instead of reaching the intended agent, customers are disconnected or placed in an incorrect queue.
Key objectives:
- Check CloudWatch metrics for “ContactFlowErrors” and “ContactFlowFatalErrors”
- Identify an impacted contact and review the contact flow log in CloudWatch for more information on the error
- Walk through the contact flow to identify misconfiguration (e.g., incorrect conditions, wrong queue routing)
- Implement a rollback plan or hot fix to restore functionality if a configuration change caused the issue
- Test and validate the contact flow after changes to ensure the issue is resolved
- Document and communicate the change to the team and end-users
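The CloudWatch check in the first objective can be sketched as follows. This is a hedged example: it assumes boto3 is installed and that the contact flow metrics live in the `AWS/Connect` namespace with the dimensions `InstanceId`, `MetricGroup` (set to `ContactFlow`), and `ContactFlowName`. Verify these against your own CloudWatch console before relying on them. The function names, the one-hour window, and the 5-minute period are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def flow_error_query(instance_id: str, flow_name: str, end: datetime,
                     minutes: int = 60, metric: str = "ContactFlowErrors") -> Dict:
    """Build get_metric_statistics arguments for one contact flow."""
    return {
        "Namespace": "AWS/Connect",
        "MetricName": metric,
        "Dimensions": [
            {"Name": "InstanceId", "Value": instance_id},
            {"Name": "MetricGroup", "Value": "ContactFlow"},
            {"Name": "ContactFlowName", "Value": flow_name},
        ],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 300,  # 5-minute buckets
        "Statistics": ["Sum"],
    }


def total_errors(datapoints: List[Dict]) -> float:
    """Add up the Sum statistic across all returned datapoints."""
    return sum(p["Sum"] for p in datapoints)


def fetch_total_errors(instance_id: str, flow_name: str) -> float:
    """Query CloudWatch for the last hour of errors on one flow."""
    import boto3  # imported lazily so the helpers above work without boto3

    cloudwatch = boto3.client("cloudwatch")
    query = flow_error_query(instance_id, flow_name, datetime.now(timezone.utc))
    return total_errors(cloudwatch.get_metric_statistics(**query)["Datapoints"])
```

Passing `metric="ContactFlowFatalErrors"` to `flow_error_query` reuses the same query for the second metric. Comparing totals before and after the flow update gives the team evidence for the rollback decision.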
5) AWS Lambda function timeout
Scenario: A Lambda function used for custom integrations (e.g., fetching customer data from a database) is timing out and causing delays in call handling, resulting in dropped calls or delayed information retrieval.
Key objectives:
- Review the Lambda function logs and identify why the timeout is happening (e.g., slow database queries, function logic errors)
- Test the Lambda function manually outside of Amazon Connect to check for performance issues
- Modify the Lambda timeout settings and adjust the function code to improve performance
- Investigate if the Lambda function is hitting AWS service limits or encountering resource constraints
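One mitigation worth discussing in this exercise is giving the lookup a time budget, so the function returns a fallback to the contact flow instead of timing out. The sketch below illustrates that idea only. `lookup_customer` is a hypothetical stand-in for your database or CRM query, the 3-second budget is an assumed value that you would set below the timeout configured in your flow, and the event path assumes a voice contact with a customer endpoint present.

```python
import concurrent.futures
from typing import Callable, Dict

BUDGET_SECONDS = 3.0  # assumption: keep this below the timeout set in the flow


def lookup_customer(phone: str) -> Dict[str, str]:
    """Hypothetical stand-in for a database or CRM query."""
    return {"customerName": "Example Customer", "tier": "standard"}


def lambda_handler(event: Dict, context,
                   lookup: Callable[[str], Dict[str, str]] = lookup_customer,
                   budget: float = BUDGET_SECONDS) -> Dict[str, str]:
    """Return customer attributes, or a fallback status if the lookup is slow."""
    phone = event["Details"]["ContactData"]["CustomerEndpoint"]["Address"]

    # No 'with' block here: its exit would wait for the slow call to finish.
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(lookup, phone)
    try:
        result = dict(future.result(timeout=budget))
        result["lookupStatus"] = "ok"
    except concurrent.futures.TimeoutError:
        result = {"lookupStatus": "timeout"}
    except Exception:
        result = {"lookupStatus": "error"}
    finally:
        executor.shutdown(wait=False)
    return result
```

The contact flow can then branch on `lookupStatus` and continue routing without customer data rather than dropping the call. The abandoned worker thread still runs to completion in the background, so this limits caller impact but does not replace fixing the slow query itself.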
6) Issues with Customer Relationship Management (CRM) integration (e.g., Salesforce, ServiceNow)
Scenario: Calls initiated from Amazon Connect are failing to populate customer data into the CRM, leading to agents being unable to access necessary information when assisting customers.
Key objectives:
- Investigate if the integration API between Amazon Connect and the CRM system is functional
- Check for any recent updates or changes to the CRM or Amazon Connect integration that might have caused the issue
- Verify the authentication and authorization settings between the two systems (e.g., API keys, IAM roles)
- Test a manual connection between Amazon Connect and the CRM and review logs for any errors
7) Routing profile misconfiguration
Scenario: Agents are incorrectly assigned to routing profiles, and as a result, they are receiving calls from the wrong queues, or they are not receiving any calls at all.
Key objectives:
- Review the routing profiles and make sure they are assigned to the correct agents
- Ensure that queue memberships and skills are properly configured to route calls appropriately
- Test the routing logic with different agents to verify that calls are routed correctly
- Adjust any discrepancies in the configuration and monitor the system for improvements
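The review step above can be partly automated by comparing the intended assignments with what the instance reports. The following is a minimal sketch assuming boto3 is installed and that you maintain an expected mapping of usernames to routing profile names somewhere (a spreadsheet export, for instance). The function names and the shape of that mapping are illustrative.

```python
from typing import Dict, Optional, Tuple


def find_mismatches(expected: Dict[str, str],
                    actual: Dict[str, Optional[str]]
                    ) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {username: (expected_profile, actual_profile)} for every difference."""
    return {
        user: (profile, actual.get(user))
        for user, profile in expected.items()
        if actual.get(user) != profile
    }


def fetch_assignments(instance_id: str) -> Dict[str, Optional[str]]:
    """Read each user's current routing profile name from Amazon Connect."""
    import boto3  # imported lazily so find_mismatches works without boto3

    connect = boto3.client("connect")

    # Map routing profile IDs to their display names.
    names = {}
    pages = connect.get_paginator("list_routing_profiles").paginate(InstanceId=instance_id)
    for page in pages:
        for profile in page["RoutingProfileSummaryList"]:
            names[profile["Id"]] = profile["Name"]

    # Look up the routing profile assigned to every user.
    assignments = {}
    for page in connect.get_paginator("list_users").paginate(InstanceId=instance_id):
        for user in page["UserSummaryList"]:
            detail = connect.describe_user(InstanceId=instance_id, UserId=user["Id"])["User"]
            assignments[user["Username"]] = names.get(detail["RoutingProfileId"])
    return assignments
```

A user who appears in the expected mapping but not in the instance is reported with `None` as the actual profile, which also surfaces agents who were never provisioned.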
8) High call abandonment rate
Scenario: There is an unusually high rate of abandoned calls in the contact center. Customers are hanging up before reaching an agent or while waiting in queue.
Key objectives:
- Monitor the queue lengths and wait times using Amazon Connect metrics to see if they exceed acceptable thresholds
- Review contact flow configurations and verify that estimated wait times are communicated clearly to customers
- Check if there is a bottleneck in the agent availability or if the system is overwhelmed
- Adjust the IVR or routing strategy to reduce wait times and provide more accurate waiting time estimations
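For the threshold check in the first objective, it helps to agree on the arithmetic up front. The sketch below assumes the abandonment rate is the number of contacts abandoned in queue divided by the number of contacts queued, expressed as a percentage. The 5% alert threshold is an illustrative assumption, not a recommended target.

```python
def abandonment_rate(abandoned: int, queued: int) -> float:
    """Percentage of queued contacts that disconnected before reaching an agent."""
    if queued == 0:
        return 0.0  # no queued contacts, so nothing could be abandoned
    return 100.0 * abandoned / queued


def exceeds_threshold(abandoned: int, queued: int, threshold_pct: float = 5.0) -> bool:
    """True when the abandonment rate is above the agreed threshold."""
    return abandonment_rate(abandoned, queued) > threshold_pct


if __name__ == "__main__":
    # Example: 30 abandoned contacts out of 200 queued.
    print(abandonment_rate(30, 200), exceeds_threshold(30, 200))
```

Computing the rate per queue and per time interval, rather than once for the whole contact center, makes it easier to tell an agent availability bottleneck in one queue from a system-wide surge.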
9) Authentication failure for agents
Scenario: Agents are unable to log in to the Amazon Connect instance due to authentication errors, possibly caused by a misconfiguration in the identity provider (e.g., AWS SSO, Active Directory).
Key objectives:
- Verify that the integration between Amazon Connect and your identity provider is still valid
- Check for recent changes to IAM roles, policies, or security settings that might affect agent login
- Review CloudWatch logs for any authentication or permission errors
- Test agent logins using different accounts to isolate whether the issue is account-specific or system-wide
10) Customer data inconsistencies
Scenario: Customer data stored in Amazon Connect or integrated systems (e.g., AWS Lambda, Amazon DynamoDB) is inconsistent across different customer interactions. For example, previous case notes or customer preferences are not appearing for agents.
Key objectives:
- Verify that your data integration processes (e.g., Lambda, DynamoDB, Salesforce) are functioning correctly
- Ensure that customer data is being correctly retrieved, stored, and updated in real-time during calls
- Investigate any data synchronization issues that may be causing inconsistencies
- Implement a fix to ensure consistent data retrieval for agents, and review logging and error handling for future data discrepancies
11) Multi-channel experience degradation
Scenario: The multi-channel experience for customers (voice, chat, and email) is degraded, and customers are experiencing inconsistent interactions across different channels.
Key objectives:
- Review the integration between Amazon Connect and the other channels
- Verify that contact flows are correctly set up for each channel and that customer data is shared across channels (e.g., passing context from voice to chat or vice versa)
- Monitor CloudWatch metrics for all channels to identify if certain channels are underperforming
- Test each channel (voice, chat, email) individually to confirm the nature of the degradation
- Implement cross-channel escalation options (e.g., transferring a chat to a voice call) and ensure that customer context is preserved across channels
Conclusion
In this blog post, we walked you through 11 diverse tabletop exercise scenarios for Amazon Connect to provide you with a comprehensive approach to troubleshooting and resolving common issues that may arise in a contact center environment. By walking you through the detailed steps to address each challenge, we hope you have gained valuable insights into effectively managing Amazon Connect capabilities.
These exercises not only reinforce problem-solving techniques but also emphasize the importance of proactive planning, testing, and ongoing monitoring to maintain seamless operations. Whether you’re a contact center manager or a technical expert, practicing these scenarios will help build the resilience and expertise necessary to tackle real-world issues confidently and efficiently.
Remember, continuous improvement through exercises like these ensures that your Amazon Connect environment remains robust and ready to provide exceptional customer experiences.