AWS Big Data Blog
Developer guidance on how to do local testing with Amazon MSK Serverless
Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed service that makes it easy to build and run Kafka clusters on Amazon Web Services (AWS). When working with Amazon MSK, developers often want to access the service from their local development environment. This lets them test their application against a Kafka cluster that has the same configuration as production and the same infrastructure as the actual environment, without needing to run Kafka locally.
An Amazon MSK Serverless private DNS endpoint is only accessible from Amazon Virtual Private Cloud (Amazon VPC) connections that have been configured for the cluster. It isn’t directly resolvable from your local development environment. One option is to use AWS Direct Connect or AWS VPN to connect to Amazon MSK Serverless from your on-premises network. However, building such a solution may incur cost and complexity, and it needs to be set up by a platform team.
This post presents a practical approach to accessing your Amazon MSK environment for development purposes through a bastion host using a Secure Shell (SSH) tunnel (a commonly used secure connection method). Whether you’re working with Amazon MSK Serverless, where public access is unavailable, or with provisioned MSK clusters that are intentionally kept private, this post guides you through the steps to establish a secure connection and seamlessly integrate your local development environment with your MSK resources.
Solution overview
The solution allows you to directly connect to the Amazon MSK Serverless service from your local development environment without using Direct Connect or a VPN. The service is accessed with the bootstrap server DNS endpoint boot-<<xxxxxx>>.c<<x>>.kafka-serverless.<<region-name>>.amazonaws.com on port 9098, and the traffic is then routed through an SSH tunnel to a bastion host, which connects to the MSK Serverless cluster. The following sections explain how to set up this connection.
The flow of the solution is as follows:
- The Kafka client sends a request to connect to the bootstrap server.
- The DNS query for your MSK Serverless endpoint is routed to a locally configured DNS server.
- The locally configured DNS server resolves the DNS query to localhost.
- The SSH tunnel forwards all the traffic on port 9098 from localhost to the MSK Serverless server through the Amazon Elastic Compute Cloud (Amazon EC2) bastion host.
The following image shows the architecture diagram.
Prerequisites
Before deploying the solution, you need to have the following resources deployed in your account:
- An MSK Serverless cluster configured with AWS Identity and Access Management (IAM) authentication.
- A bastion host instance with network access to the MSK Serverless cluster and SSH public key authentication.
- AWS CLI configured with an IAM user that is able to read and create topics on Amazon MSK. Use the IAM policy from Step 2: Create an IAM role in the Getting started using MSK Serverless clusters guide.
- For Windows users, install Linux on Windows with Windows Subsystem for Linux 2 (WSL 2) using Ubuntu 24.04. For guidance, refer to How to install Linux on Windows with WSL.
This guide assumes an MSK Serverless deployment in us-east-1, but it can be used in every AWS Region where MSK Serverless is available. The walkthrough uses OS X as the operating system, with alternative steps for Windows (WSL 2) where they differ. In the following steps, replace msk-endpoint-url with your MSK Serverless endpoint URL with IAM authentication. The MSK endpoint URL has a format like boot-<<xxxxxx>>.c<<x>>.kafka-serverless.<<region-name>>.amazonaws.com.
Solution walkthrough
To access your Amazon MSK environment for development purposes, use the following walkthrough.
Configure local DNS server on OS X
Install Dnsmasq as a local DNS server and configure the resolver to resolve the Amazon MSK endpoint. The solution uses Dnsmasq because it can compare DNS requests against a database of patterns and use these to determine the correct response. This functionality can match any request that ends in kafka-serverless.us-east-1.amazonaws.com and send 127.0.0.1 in response. Follow these steps to install Dnsmasq:
- Update brew and install Dnsmasq using brew.
- Start the Dnsmasq service.
- Reroute all traffic for Serverless MSK (kafka-serverless.us-east-1.amazonaws.com) to 127.0.0.1.
- Reload Dnsmasq configuration and clear cache.
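The four steps above can be sketched as the following shell functions. This is a minimal sketch under stated assumptions, not the exact commands of the original post: the function names and the MSK_DNS_SUFFIX variable are hypothetical, and using a service restart plus dscacheutil for the reload and cache-clear step is an assumption. The configuration file path is the one named in the Cleanup section.

```shell
# Assumed variable: DNS suffix shared by the MSK Serverless endpoints in us-east-1.
MSK_DNS_SUFFIX="kafka-serverless.us-east-1.amazonaws.com"

# Print the Dnsmasq rule that answers every name under the given suffix with 127.0.0.1.
dnsmasq_address_rule() {
  printf 'address=/%s/127.0.0.1\n' "$1"
}

# Hypothetical helper. It is only defined here, not called, because it changes system state.
configure_dnsmasq_macos() {
  brew update && brew install dnsmasq
  sudo brew services start dnsmasq
  # Append the rule to the Homebrew-managed Dnsmasq configuration file.
  dnsmasq_address_rule "$MSK_DNS_SUFFIX" >> "$(brew --prefix)/etc/dnsmasq.conf"
  sudo brew services restart dnsmasq   # assumption: restart to reload the configuration
  sudo dscacheutil -flushcache         # assumption: clear the OS X DNS cache
}
```

Calling configure_dnsmasq_macos runs the steps. Afterwards, a query sent directly to Dnsmasq, such as dig +short test.kafka-serverless.us-east-1.amazonaws.com @127.0.0.1 (the host name test is made up), should return 127.0.0.1 if the rule was picked up.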
Configure OS X resolver
Now that you have a working DNS server, you can configure your operating system to use it. Configure the operating system to send only .kafka-serverless.us-east-1.amazonaws.com queries to Dnsmasq. Most UNIX-like operating systems have a configuration file called /etc/resolv.conf that controls the way DNS queries are performed, including the default server to use for DNS queries. Use the following steps to configure the OS X resolver:
- OS X also allows you to configure additional resolvers by creating configuration files in the /etc/resolver/ directory. This directory probably won’t exist on your system, so your first step should be to create it.
- Create a new file with the same name as the domain to resolve (kafka-serverless.us-east-1.amazonaws.com) in the /etc/resolver/ directory and add 127.0.0.1 as a nameserver to it.
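The two resolver steps above could look like the following sketch. The function name write_resolver_file and the SUDO variable are hypothetical; the variable only exists so that the same function can be tried in a scratch directory without elevated privileges.

```shell
# Assumed variable: prefix for privileged commands. Set SUDO="" when writing
# to a scratch directory that you own.
SUDO="${SUDO-sudo}"

# Create directory $1 if needed, then write a resolver file named after domain $2
# that points lookups for that domain at the local Dnsmasq instance.
write_resolver_file() {
  $SUDO mkdir -p "$1" &&
    printf 'nameserver 127.0.0.1\n' | $SUDO tee "$1/$2" >/dev/null
}

# On OS X the call would be:
#   write_resolver_file /etc/resolver kafka-serverless.us-east-1.amazonaws.com
```

After the file is in place, scutil --dns should list a resolver entry for the domain with 127.0.0.1 as its nameserver.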
Configure local DNS server on Windows
In Windows Subsystem for Linux, first install Dnsmasq, then configure the resolver to resolve the Amazon MSK endpoint, and finally add localhost as the first nameserver.
- Update apt and install Dnsmasq using apt. Install the telnet utility for later tests.
- Reroute all traffic for Serverless MSK (kafka-serverless.us-east-1.amazonaws.com) to 127.0.0.1.
- Reload Dnsmasq configuration and clear cache.
- Open /etc/resolv.conf and add nameserver 127.0.0.1 as the first line.
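As a rough sketch of the WSL steps above, assuming Ubuntu with the service command available: the function names and the SUDO variable are hypothetical, and restarting the service is an assumed way to reload the Dnsmasq configuration. The configuration file /etc/dnsmasq.conf is the one named in the Cleanup section.

```shell
# Assumed variable: prefix for privileged commands. Set SUDO="" for dry runs on scratch files.
SUDO="${SUDO-sudo}"

# Insert "nameserver 127.0.0.1" as the first line of file $1, keeping the existing lines.
prepend_local_nameserver() {
  tmp="$(mktemp)" &&
    { printf 'nameserver 127.0.0.1\n'; cat "$1"; } > "$tmp" &&
    $SUDO cp "$tmp" "$1" &&
    rm -f "$tmp"
}

# Hypothetical helper. It is only defined here, not called, because it changes system state.
configure_dnsmasq_wsl() {
  $SUDO apt update && $SUDO apt install -y dnsmasq telnet
  # Answer every name under the MSK Serverless suffix with 127.0.0.1.
  echo 'address=/kafka-serverless.us-east-1.amazonaws.com/127.0.0.1' |
    $SUDO tee -a /etc/dnsmasq.conf >/dev/null
  $SUDO service dnsmasq restart   # assumption: restart to reload the configuration
  prepend_local_nameserver /etc/resolv.conf
}
```

The prepend helper copies the new content over the existing file rather than replacing it, so it also works when /etc/resolv.conf is a symbolic link.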
Create SSH tunnel
The next step is to create the SSH tunnel, which will allow any connections made to localhost:9098 on your local machine to be forwarded over the SSH tunnel to the target Kafka broker. Use the following steps to create the SSH tunnel:
- Replace bastion-host-dns-endpoint with the public DNS endpoint of the bastion host, which comes in the style of <<xyz>>.compute-1.amazonaws.com, and replace ec2-key-pair.pem with the key pair of the bastion host. Then create the SSH tunnel from local port 9098 to port 9098 of the MSK Serverless endpoint through the bastion host.
- Leave the SSH tunnel running and open a new terminal window.
- Test the connection to the Amazon MSK server on port 9098, for example with the telnet utility.
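The tunnel steps above can be sketched as follows. The variable and function names are hypothetical, and the ec2-user login name is an assumption that holds for Amazon Linux bastion hosts; other images use a different user. The three placeholder values are the ones named in the text and need to be replaced before use.

```shell
# Placeholders from the text. Replace them with your own values.
MSK_ENDPOINT="msk-endpoint-url"
BASTION_HOST="bastion-host-dns-endpoint"
KEY_FILE="ec2-key-pair.pem"
SSH_USER="ec2-user"   # assumption: default user on Amazon Linux

# Build the local-forward specification for endpoint $1 and port $2:
# connections to 127.0.0.1:PORT are forwarded to ENDPOINT:PORT.
forward_spec() {
  printf '127.0.0.1:%s:%s:%s\n' "$2" "$1" "$2"
}

# Hypothetical helper. -N keeps the session open for forwarding only,
# without running a remote command.
open_tunnel() {
  ssh -i "$KEY_FILE" -N -L "$(forward_spec "$MSK_ENDPOINT" 9098)" "$SSH_USER@$BASTION_HOST"
}

# In a second terminal, while the tunnel is running:
#   telnet msk-endpoint-url 9098
```

Because the local DNS server resolves the MSK endpoint name to 127.0.0.1, the telnet test uses the real endpoint name and still ends up on the local end of the tunnel.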
Testing
Now configure the Kafka client to use IAM authentication and then test the setup. You can find the latest Kafka installation at the Apache Kafka Download site. Download it, unzip it, and copy the content of the Kafka folder into ~/kafka.
- Download the IAM authentication library and unpack it.
- Configure Kafka properties to use IAM as the authentication mechanism.
- From ~/kafka/bin, create an example topic. Make sure that the SSH tunnel created in the previous section is still open and running.
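A minimal sketch of the client configuration and topic creation, under a few assumptions: the aws-msk-iam-auth JAR is already on the Kafka classpath (for example in ~/kafka/libs), the properties file name client.properties is arbitrary, and the topic name ExampleTopic with one partition is made up. The function and variable names are hypothetical; the four property values are the ones documented for the Amazon MSK IAM authentication library.

```shell
KAFKA_HOME="${KAFKA_HOME:-$HOME/kafka}"
MSK_ENDPOINT="msk-endpoint-url"   # placeholder from the text

# Write the client properties that select IAM authentication over TLS to file $1.
write_client_properties() {
  cat > "$1" <<'EOF'
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
EOF
}

# Hypothetical helper. It needs the SSH tunnel to be running and is only defined here, not called.
create_example_topic() {
  write_client_properties "$KAFKA_HOME/config/client.properties" &&
    "$KAFKA_HOME/bin/kafka-topics.sh" \
      --bootstrap-server "$MSK_ENDPOINT:9098" \
      --command-config "$KAFKA_HOME/config/client.properties" \
      --create --topic ExampleTopic --partitions 1
}
```

Running kafka-topics.sh with the same --bootstrap-server and --command-config options and --list instead of the create options is a quick way to confirm that the topic exists.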
Cleanup
To remove the solution, complete the following steps for Mac users:
- Delete the file /etc/resolver/kafka-serverless.us-east-1.amazonaws.com.
- Delete the entry address=/kafka-serverless.us-east-1.amazonaws.com/127.0.0.1 in the file $(brew --prefix)/etc/dnsmasq.conf.
- Stop the Dnsmasq service: sudo brew services stop dnsmasq
- Remove the Dnsmasq service: brew uninstall dnsmasq (Homebrew refuses to run uninstall as root, so run it without sudo)
To remove the solution, complete the following steps for WSL users:
- Delete the file /etc/dnsmasq.conf.
- Delete the entry nameserver 127.0.0.1 in the file /etc/resolv.conf.
- Remove the Dnsmasq service: sudo apt remove dnsmasq
- Remove the telnet utility: sudo apt remove telnet
Conclusion
In this post, I presented guidance on how developers can connect to Amazon MSK Serverless from local environments. The connection is made to an Amazon MSK endpoint through an SSH tunnel and a bastion host. This enables developers to experiment and test locally, without needing to set up a separate Kafka cluster.
About the Author
Simon Peyer is a Solutions Architect at Amazon Web Services (AWS) based in Switzerland. He is a practical doer and passionate about connecting technology and people using AWS Cloud services. A special focus for him is data streaming and automations. Besides work, Simon enjoys his family, the outdoors, and hiking in the mountains.