AWS Quantum Technologies Blog

Advancing hybrid quantum computing research with Amazon Braket and NVIDIA CUDA-Q

All quantum computing is hybrid. From simple pre- and post-processing of results to complex operations performed within the qubit lifetime, classical computers work in tandem with quantum computers in every known application of quantum computing.

In this post, you will learn how AWS and NVIDIA are teaming up to bring NVIDIA’s open-source quantum development environment, CUDA-Q Platform, to Amazon Braket to address emergent questions about the role of classical computing in the quantum stack.

Background

Today, customers make use of classical computing at every stage of the quantum research journey, from the design and testing of algorithms on local and managed simulators to co-processing for running iterative variational algorithms such as quantum machine learning. However, as quantum computers become more performant and capable of executing complex algorithms, the role of classical, accelerated computing will become increasingly pronounced. For example, in the algorithm design phase of the journey, customers will need state-of-the-art simulators running on powerful GPUs for quickly and efficiently testing larger circuits prior to running them on relatively expensive, and often scarce, quantum hardware resources.

To optimize performance within the limitations of today’s quantum processors, researchers are actively experimenting with classical pre- and post-processing subroutines such as circuit cutting and error mitigation. As qubit counts and gate depths increase, the demands on the classical compute resources needed to perform these subroutines increase in tandem, and often exponentially.
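To see why these classical overheads can grow exponentially, consider circuit cutting under a deliberately simple model (our own illustration, not a statement about any specific cutting scheme): each cut multiplies the number of circuit variants that must be executed and recombined by a constant factor.

```python
def variants_after_cuts(n_cuts: int, factor: int = 4) -> int:
    """Number of circuit variants to execute after n_cuts cuts,
    assuming each cut multiplies the workload by a constant factor.
    The factor of 4 is illustrative; real overheads depend on the
    cutting scheme used."""
    return factor ** n_cuts

for k in range(1, 6):
    print(f"{k} cuts -> {variants_after_cuts(k)} circuit variants")
```

Even in this toy model, five cuts already require over a thousand circuit executions, each of which also needs classical post-processing to recombine the results.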

Even when a commercially useful quantum computer becomes available, customers will not simply be able to “lift and shift” their algorithms from today’s classical computers to quantum computers. As a new computing paradigm, certain aspects of the programming experience will be radically different, and developer tooling and frameworks for designing and executing hybrid quantum-classical algorithms are actively being developed. Moreover, the quantum error correction needed for large-scale useful quantum computers relies fundamentally on classical co-processing.

Advancing the state of the art

Customers need one-stop-shop access to all these capabilities to achieve their quantum computing goals and advance the state of the art.

Today, we’ll discuss how AWS is working with NVIDIA to address these emerging needs in the industry. Customers can now develop hybrid workflows using the open-source NVIDIA CUDA-Q Platform directly within their Braket developer environment. They can then test their programs using simulators that run on powerful GPUs within Amazon Braket Hybrid Jobs. The native CUDA-Q simulator backend running within Braket Hybrid Jobs can offer significantly faster runtime performance than leading open-source simulators.

Best of all, customers can now execute their CUDA-Q programs on all the quantum hardware backends supported by Braket, like the gate-based processors from IonQ, IQM, and Rigetti, and the analog quantum hardware from QuEra, simply by changing a single line of code.

But the integration of CUDA-Q with Braket is just the start. As quantum computing technologies mature, state-of-the-art workloads will have diverse and increasingly demanding requirements for the associated classical compute resources: from ultra-low latency co-processing for quantum error correction (QEC) decoding and feedback and supercomputing-scale classical pre- and post-processing to quantum hardware control and calibration, and AI-enabled quantum simulations. To address these challenges, AWS is working with NVIDIA to evaluate the latency and compute requirements of future workloads, as well as develop a quantum computing stack that gives customers the performance and flexibility needed to get the most out of emerging quantum computing technologies.

Figure 1 shows a high-level depiction of the integration of CUDA-Q with Amazon Braket, which we’ll discuss in more detail next.

Figure 1: NVIDIA CUDA-Q developers now have the option to leverage CPUs and NVIDIA GPUs available via Braket Hybrid Jobs to develop and execute NVIDIA CUDA-Q programs on simulators without worrying about managing the underlying classical compute. Customers can also seamlessly run these programs on all quantum hardware supported on Braket either on demand or with dedicated access to the quantum device via a reservation. 


Amazon Braket and NVIDIA CUDA-Q

The primary goal of the Braket service is to lower customers’ technology risk by providing access to a diverse set of quantum hardware via a consistent user experience and pay-as-you-go pricing. Integral to this mission is to provide researchers access to a broad and deep set of capabilities like support for multiple quantum programming tools – now including CUDA-Q.

To optimize runtime performance, Braket Hybrid Jobs provides priority access to quantum hardware, coupled with fully managed access to CPUs and GPUs for classical co-processing. Furthermore, since its inception, Braket has been fully integrated within the AWS experience, using the same familiar tools for security, identity, and access management that millions of customers use today to store data and run production classical workloads.

NVIDIA CUDA-Q is designed to help enable the vision of accelerated quantum supercomputers — hybrid infrastructures that integrate quantum hardware within AI supercomputers. CUDA-Q is built from the ground up to be qubit-agnostic and accelerate quantum-classical applications and control systems. Using it in combination with Braket, developers can get closer than ever to useful, accelerated quantum supercomputing.

Push the boundaries of your simulations with CUDA-Q on Amazon Braket

To understand how quantum hardware improvements will increase the demands on classical compute resources, let’s take a look at an example. In Figure 2, we demonstrate the runtime of an algorithm executed on the native CUDA-Q simulator across a wide range of qubit counts, on CPU-based (ml.c5.18xlarge) and GPU-based (ml.p3.16xlarge) instances, available using Braket Hybrid Jobs.

Note that as the number of qubits increases, the time to simulate the algorithm increases exponentially, and the runtime becomes increasingly prohibitive on a single CPU for large qubit counts. At just 21 qubits, the GPU instance already provides a 350X speedup in runtime over the CPU. With fully managed access to diverse CPU and GPU instance types, researchers can focus more of their time on designing new algorithms, and less on managing infrastructure and optimizing their simulations.
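The exponential runtime has a simple origin: a state-vector simulator stores 2^n complex amplitudes, so each added qubit doubles both the memory footprint and the work per gate. A quick back-of-the-envelope sketch (our own illustration, independent of the benchmark code):

```python
def statevector_bytes(n_qubits: int) -> int:
    """Memory needed to hold a full state vector: 2**n amplitudes
    at 16 bytes each (complex128)."""
    return (2 ** n_qubits) * 16

for n in (21, 29, 34):
    print(f"{n} qubits -> {statevector_bytes(n) / 2**30:.3f} GiB")
```

At 29 qubits the state vector occupies 8 GiB, and at 34 qubits it reaches 256 GiB, which is why larger simulations quickly demand powerful GPUs or distribution across multiple nodes.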

Figure 2. Execution time of NVIDIA CUDA-Q’s simulator software running on GPU and CPU instance types. The random circuits used for the tests have the same definition as in this blog post. Each term in the Hamiltonian is a tensor product of randomly selected Pauli observables on each qubit. The parameters for the benchmark are: n_gates=100, n_terms=100, n_shots=500. Each data point is an average of 5 runs. The GPU simulations were executed on a p3.16xlarge instance. The CPU simulations were executed on a ml.c5.18xlarge instance.


Next, let’s see how the CUDA-Q GPU-based simulator backend compares to other leading open-source simulators available in the industry today. In Figure 3, we show the runtime for a 29-qubit algorithm running on the CUDA-Q GPU simulator compared with other popular quantum programming frameworks.

All these runs were conducted on a single ml.p3.16xlarge instance. The results showed that the CUDA-Q simulator backend running within Braket Hybrid Jobs ran significantly faster than the other open-source simulators in our tests.

Figure 3. Execution duration for three GPU-based simulators with results shown relative to the NVIDIA CUDA-Q simulator. The random circuits used are the same as those used for the results shown in Figure 2. The parameters for the benchmark are: n_qubits=29, n_gates=100, n_terms=100, n_shots=500. Each data point is an average of 5 runs. All data for GPU simulators compared here (NVIDIA CUDA-Q, Cirq Qsim, Qiskit Aer simulators) are collected from runs using ml.p3.16xlarge instances. This test uses the default interface of a simulator to evaluate the Hamiltonians with multiple terms. If a simulator does not have such an interface, the test iterates through the terms in the Hamiltonian.


In addition to running on a single GPU, certain algorithms can benefit further from parallelization across multiple instances. Braket Hybrid Jobs and CUDA-Q streamline the process of distributing circuit sampling and observable evaluations across multiple computational nodes. As we show in the next section, by changing just one line of code in an algorithm script, along with a minor configuration change, we can switch from single-GPU execution to parallel execution across multiple GPUs and nodes. For a workload with 30-qubit circuits, we obtained a speedup of about 6.5x when parallelizing the evaluation of 100 observables on a single circuit across 8 GPUs. When parallelizing 128 different circuits across 8 GPUs, the speedup was 11x for unrelated circuits and 17x for parametric circuits.
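CUDA-Q and Hybrid Jobs handle this distribution for you, but the underlying pattern is straightforward: partition the observables (or circuits) into chunks, evaluate each chunk on its own worker, and combine the partial results. A toy sketch of that pattern in plain Python, where `evaluate_term` is a stand-in for an expensive per-observable simulation (this is not the CUDA-Q API):

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_term(term_id: int) -> float:
    # Stand-in for computing the expectation value of one
    # Hamiltonian term; in practice this is the expensive step.
    return 0.01 * term_id

def parallel_expectation(term_ids, n_workers: int = 8) -> float:
    # Each worker evaluates a slice of the terms; the total
    # expectation value is the sum of the partial results.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(evaluate_term, term_ids))

print(parallel_expectation(range(100)))
```

Because the individual term evaluations are independent, this kind of workload parallelizes well, which is why the observed speedups scale close to the number of GPUs.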

Today, our customers routinely run algorithms that take advantage of all the qubits available on the quantum hardware on Braket, such as the 20-qubit IQM Garnet QPU or the 36-qubit IonQ Forte processor. But, as quantum hardware improves over the next few years to support hundreds of higher-fidelity qubits, customers will need to run larger-scale simulations to test their algorithms prior to committing to running them on quantum processors, saving both time and money. Now, with the integration of CUDA-Q, customers can use the most powerful simulation tools from NVIDIA along with elastic, scalable CPU and GPU capacity from AWS, all using the Braket Hybrid Jobs interface.

Getting hands-on

Now, having described the performance benefits, let’s get hands-on with running CUDA-Q within Amazon Braket Hybrid Jobs. We’ve put all the code here in an example notebook.

If you are a first-time AWS user, you need to set up your AWS account and AWS Identity and Access Management (IAM) credentials, such as an IAM user, which let you access AWS services via the AWS console, command line interface (CLI), or software development kit (SDK). If you are an existing AWS customer, you can skip to the next section.

Once you have created your IAM user, generate an access key via the IAM console and use those credentials to configure your environment.

You can set these using aws configure in the AWS CLI or set the following environment variables:

export AWS_DEFAULT_REGION="us-east-1"
export AWS_ACCESS_KEY_ID="<key_id>"
export AWS_SECRET_ACCESS_KEY="<access_key>"
export AWS_SESSION_TOKEN="<token>"

Getting started with NVIDIA CUDA-Q on Amazon Braket

First, in your Braket console or on your local machine, spin up a Jupyter notebook. If you are a first-time Braket user, you can check out our Getting Started documentation to ensure you have the appropriate permissions enabled.

To simplify the installation of CUDA-Q, run the following command to build a CUDA-Q Braket container image and upload it to an Amazon Elastic Container Registry (Amazon ECR) repository in the Region that you specify.

! container/container_build_and_push.sh braket-cudaq-byoc-job us-west-2

Note that your IAM role will need a policy that enables access to the Amazon ECR repository, such as the AmazonEC2ContainerRegistryFullAccess policy.

It will take a few minutes to build the container, depending on your network speed. Once the image is built, you can create hybrid jobs using CUDA-Q.

The code snippet below is an example hybrid job script. Inside the hello_quantum function, a CUDA-Q backend is initialized, followed by the definition of a Bell circuit. The circuit is then sampled for 1000 shots. The measurement probabilities from the result are returned from the hybrid job for further analysis.

from braket.jobs import hybrid_job, get_job_device_arn

image_uri = "<ecr-image-uri>"

@hybrid_job(device='local:nvidia/qpp-cpu', image_uri=image_uri)
def hello_quantum():
    import cudaq

    # define the backend
    device = get_job_device_arn()
    cudaq.set_target(device.split('/')[-1])

    # define the Bell circuit
    kernel = cudaq.make_kernel()
    qubits = kernel.qalloc(2)
    kernel.h(qubits[0])
    kernel.cx(qubits[0], qubits[1])

    # sample the Bell circuit
    result = cudaq.sample(kernel, shots_count=1000)
    measurement_probabilities = dict(result.items())
    
    return measurement_probabilities

If you now want to scale this up to run on a remote cloud-hosted instance, such as a GPU instance on AWS, all you need to do is make two changes to the inputs of the hybrid_job decorator above. First, change the device argument to local:nvidia/nvidia. You can look at the CUDA-Q documentation for more detail about this backend.

Second, add an instance_config argument and set an instance type that contains one or more GPUs. For example:

instance_config=InstanceConfig(instanceType='ml.p3.2xlarge')

In this example, we use an ml.p3.2xlarge instance to access an NVIDIA V100 Tensor Core GPU, though instances with more powerful GPUs are also available.

Behind the scenes, Braket will spin up a cloud-hosted GPU instance, pull your container from ECR into that host and run your code. We designed Braket to shut down the instance once your code is finished running, so you only pay for what you use. For more information about the supported instance types in Braket Hybrid Jobs, see the Braket Developer Guide. For examples of running parallel workloads with CUDA-Q on Braket Hybrid Jobs, you can check out the example notebooks in our GitHub repo.
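As an aside, the local simulator device strings used above follow a local:&lt;provider&gt;/&lt;backend&gt; convention, which is why the job script recovered the CUDA-Q target name with a simple split. A tiny helper makes that explicit (our own convenience function, not part of either SDK):

```python
def cudaq_target_from_device(device: str) -> str:
    """Extract the CUDA-Q target name from a Hybrid Jobs local
    simulator device string, e.g. 'local:nvidia/qpp-cpu' -> 'qpp-cpu'."""
    return device.split("/")[-1]

print(cudaq_target_from_device("local:nvidia/qpp-cpu"))  # qpp-cpu
print(cudaq_target_from_device("local:nvidia/nvidia"))   # nvidia
```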

Running CUDA-Q programs on Braket-managed quantum hardware

Having tested your CUDA-Q based algorithm on simulators, now let’s learn how to execute quantum circuits on the quantum computers available in Braket. You can use CUDA-Q with all quantum hardware available in Braket.

Today, Braket sends customer circuits to quantum hardware located on third-party premises for processing. Navigate to the Braket console in your AWS console and select Permissions and Settings. Click on Enable Third-party Devices to enable quantum hardware access via Braket and set up the necessary Execution Roles. Refer to our documentation for more details.

Now you’re ready to use quantum hardware on Braket as a CUDA-Q backend. All you need to do is set the CUDA-Q target to braket and pass the device ARN as the machine parameter. In the code snippet below, we run the same Bell circuit built using CUDA-Q on a superconducting quantum computer from IQM. Note that circuits that run as part of a hybrid job receive higher-priority access to the target Braket QPUs, which reduces the runtime of your experiments and in turn minimizes the impact of hardware drift on your algorithms.

from braket.jobs import hybrid_job, get_job_device_arn

device_arn = "arn:aws:braket:eu-north-1::device/qpu/iqm/Garnet"

@hybrid_job(device=device_arn, image_uri=image_uri)
def job_with_braket_device():
    import cudaq
    
    # define the backend
    device = get_job_device_arn()
    cudaq.set_target("braket", machine=device)
    
    # define the Bell circuit
    kernel = cudaq.make_kernel()
    qubits = kernel.qalloc(2)
    kernel.h(qubits[0])
    kernel.cx(qubits[0], qubits[1])
    kernel.mz(qubits)

    # sample the Bell circuit
    result = cudaq.sample(kernel, shots_count=1000)
    measurement_probabilities = dict(result.items())
    
    return measurement_probabilities
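Ideally, a Bell circuit returns '00' and '11' with probability 0.5 each; on real hardware, shot noise and gate errors shift these values and introduce some '01'/'10' counts. A simple sanity check on the probabilities returned by the job might look like this (a hypothetical helper with illustrative tolerances, not part of Braket or CUDA-Q):

```python
def looks_like_bell(probs: dict, tol: float = 0.1) -> bool:
    """Check measured probabilities against the ideal Bell
    distribution {'00': 0.5, '11': 0.5} within tolerance tol."""
    return (abs(probs.get("00", 0.0) - 0.5) < tol
            and abs(probs.get("11", 0.0) - 0.5) < tol)

print(looks_like_bell({"00": 0.48, "11": 0.49, "01": 0.02, "10": 0.01}))  # True
print(looks_like_bell({"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}))  # False
```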

Conclusion

Our goal is to lower the barriers for researchers to explore and advance research and development in quantum computing.

As quantum computers improve, they will increasingly require more powerful classical compute resources across the entire stack. With this release, we are combining NVIDIA’s hybrid quantum-classical CUDA-Q Platform, and its powerful simulation capabilities, with the benefit of fully managed, pay-as-you-go access to classical and quantum resources available through Amazon Braket.

Now, researchers can focus their time on building new algorithms rather than managing compute infrastructure or negotiating with quantum hardware providers. This release is a first step in a broader collaboration between AWS and NVIDIA to explore the quantum stack as quantum and classical computing become inextricably linked.

To get started, check out the example notebooks.