Nvidia Developer · September 3
NVIDIA CUDA-Q 0.12: Accelerating Quantum Application Development and Hardware Design

NVIDIA CUDA-Q 0.12 introduces powerful new simulation tools designed to accelerate how researchers develop quantum applications and design high-performance quantum hardware. With the new run API, users can obtain detailed statistics for individual simulation runs (shots) instead of being limited to aggregated statistical outputs, which is essential for analyzing noise correlations, postselecting results, and benchmarking circuits. The CUDA-Q dynamics backend has also been enhanced to simulate a broader range of quantum system evolutions, including improved multidiagonal sparse matrix support and batching of states and operators, improving both performance and flexibility. The release also incorporates contributions from the unitaryHACK community and adds support for Python 3.13, further advancing CUDA-Q's open source development.

📊 **Finer simulation granularity and data access**: With the new `run` API, CUDA-Q 0.12 lets users obtain detailed statistics for individual runs (shots) of a quantum simulation rather than relying solely on aggregated results. Access to raw shot data is essential for analyzing noise correlations between qubits, performing more precise result postselection, and running fine-grained circuit benchmarks, giving researchers new flexibility and insight.

🚀 **Better performance for quantum dynamics simulation**: The new release significantly enhances the CUDA-Q dynamics backend, which can now simulate a broader class of quantum system evolution equations, including arbitrary state evolution. With improved multidiagonal sparse matrix support and the ability to batch states and operators, users can simulate large quantum systems more efficiently, for example in parameter sweeps or quantum state tomography, significantly accelerating research.

🤝 **Community-driven open source contributions**: CUDA-Q 0.12 integrates key contributions from community events such as unitaryHACK, including example code for simulating qubit dynamics, a tutorial on approximate state preparation, and an initial implementation for obtaining the matrix of a quantum kernel. Support for Python 3.13 further extends the ecosystem, reflecting the vitality of CUDA-Q as an open source project and the strength of its community.

💡 **Hybrid quantum-classical application development**: CUDA-Q is built from the ground up to support writing hybrid quantum-classical applications, using its kernel programming model to orchestrate QPUs, GPUs, and CPUs. Quantum kernels encapsulate the logic that runs on a quantum device; the `sample` API provides aggregated statistics, while the `run` API enables finer-grained data processing, making applications more expressive and supporting conditional measurements and multiple return data types.

NVIDIA CUDA-Q 0.12 introduces new simulation tools for accelerating how researchers develop quantum applications and design performant quantum hardware.

With the new run API, users can obtain more detailed statistics on individual runs (or shots) of a simulation, rather than being restricted to aggregated statistical outputs. Access to raw shot data is important to researchers for a variety of use cases, such as analyzing noise correlations between qubits, result postselection, precise circuit benchmarking, and more.

The 0.12 release also includes additional features for the CUDA-Q dynamics backend, which enables users to simulate the evolution of quantum systems. This is an important capability for modeling and improving quantum hardware. This release adds better multidiagonal sparse matrix support and batching of states and operators, allowing users to scale dynamics techniques. CUDA-Q Dynamics also now supports generic super-operator equations, providing researchers with more flexibility.

CUDA-Q is an open source project, and this release includes community contributions from the unitaryHACK event, as well as Python 3.13 support. This post explains some of these new features in detail. For more detailed information, see the CUDA-Q 0.12 release notes.

Enabling more expressive applications 

CUDA-Q is built from the ground up to support writing hybrid quantum-classical applications, using a kernel programming model to orchestrate QPUs, GPUs, and CPUs. Logic to run on a quantum device is encapsulated in quantum kernels. There are multiple ways to execute a kernel. One way is with the sample API, which returns aggregated statistics of the measurement counts of the qubits in the kernel.

For example, for a kernel that prepares a GHZ state on three qubits, calling sample with this kernel and specifying 1,000 shots returns the aggregated statistics of the measurement outcomes observed over those 1,000 shots: { 000:492 111:508 }. As expected for a GHZ state, the outcomes 000 and 111 are observed with roughly equal probability. However, it's not possible to learn anything more detailed about each individual shot.

```python
import cudaq

@cudaq.kernel
def simple_ghz(num_qubits: int) -> int:
    qubits = cudaq.qvector(num_qubits)
    # Create GHZ state
    h(qubits[0])
    for i in range(1, num_qubits):
        x.ctrl(qubits[0], qubits[i])
    result = 0
    for i in range(num_qubits):
        if mz(qubits[i]):
            result += 1
    return result

shots = 20  # using small number of shots for simplicity

sample_results = cudaq.sample(simple_ghz, 3, shots_count=shots)
print(f"Sample results: {sample_results}")

run_results = cudaq.run(simple_ghz, 3, shots_count=shots)
print(f"Run results: {run_results}")
```
```
$ python3 test.py
Sample results: { 000:11 111:9 }
Run results: [0, 3, 0, 0, 0, 0, 3, 0, 0, 3, 3, 3, 3, 3, 0, 3, 0, 3, 3, 3]
```

Unlike the sample API, the run API preserves the individual return value from each shot, which is useful when the application needs to analyze the distribution of returned results. With run, kernels can be more expressive and include conditional measurements of specific qubits. Return values from these kernels are explicit and can use multiple data types, including custom data types defined with Python data classes.
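To illustrate returning a custom data type, here is a minimal sketch (not taken from the post; the dataclass name and fields are illustrative) that packs each shot's outcome into a Python data class and retrieves one record per shot with run:

```python
from dataclasses import dataclass

import cudaq


# Illustrative custom return type; the name and fields are assumptions.
@dataclass(slots=True)
class ShotRecord:
    ones: int        # number of qubits measured as 1 in this shot
    num_qubits: int  # total number of qubits measured


@cudaq.kernel
def ghz_record(num_qubits: int) -> ShotRecord:
    qubits = cudaq.qvector(num_qubits)
    # Prepare the GHZ state
    h(qubits[0])
    for i in range(1, num_qubits):
        x.ctrl(qubits[0], qubits[i])
    # Count how many qubits are measured as 1
    ones = 0
    for i in range(num_qubits):
        if mz(qubits[i]):
            ones += 1
    return ShotRecord(ones, num_qubits)


# run preserves one ShotRecord per shot, so the records can be postselected
# or analyzed individually in ordinary Python.
records = cudaq.run(ghz_record, 3, shots_count=10)
print(records)
```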

In addition, run has an asynchronous version, run_async, useful for long-running executions. Currently, run and run_async are supported for simulation backends only. For more information and code examples, see the CUDA-Q documentation.
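As a rough sketch of the asynchronous variant, reusing the simple_ghz kernel defined above and assuming run_async mirrors run's argument list and, like the other *_async APIs, returns a handle whose get() blocks until results are ready:

```python
# Launch the simulation asynchronously; arguments assumed to mirror cudaq.run.
handle = cudaq.run_async(simple_ghz, 3, shots_count=1000)

# ... perform other classical work while the simulation runs ...

# Block until the per-shot return values are available (assumed .get() accessor).
results = handle.get()
print(results[:10])
```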

Achieve better performance for dynamics simulation

The CUDA-Q dynamics backend enables the design, simulation, and execution of quantum dynamical systems. The 0.12 release adds multiple enhancements to this backend.

Previously, system dynamics was limited to the Lindblad master equation, specified by the Hamiltonian operator and collapse operators. Now users can simulate any arbitrary state evolution equation, specifying the evolution as a generic super-operator. A super-operator can be constructed as a linear combination of left and/or right multiplication actions of operator instances.
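As a concrete illustration (a standard identity, not code from the release), the Lindblad generator itself decomposes into exactly such left and right multiplication actions on the density matrix:

```latex
% Lindblad generator written as left/right multiplication actions on rho
\mathcal{L}(\rho) = -i\,(H\rho - \rho H)
  + \sum_k \left( L_k \rho L_k^\dagger
  - \tfrac{1}{2}\, L_k^\dagger L_k\, \rho
  - \tfrac{1}{2}\, \rho\, L_k^\dagger L_k \right)
```

Here Hρ and L†L ρ act by left multiplication, ρH and ρ L†L by right multiplication, and L ρ L† by both. A generic super-operator lets users pass any such combination to the dynamics backend, not just this Lindblad form.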

Support for multidiagonal sparse matrices was also updated. Depending on the sparsity of the operator matrix or the subsystem dimension, CUDA-Q automatically uses the dense or multidiagonal data format for optimal performance.

The CUDA-Q evolve API can evolve multiple initial states and multiple Hamiltonians over time. With the 0.12 release, both states and Hamiltonians can be batched on multiple GPUs. This can significantly improve the performance of simulating the dynamics of many small, identical systems for parameter sweeps or tomography. Collapse operators and super-operators can be batched in a similar manner.

For example, a dynamics simulation of an electrically driven silicon spin qubit involves a parameter sweep of amplitude values and creating a Hamiltonian for each amplitude value. Without batching, this will result in multiple calls to evolve, one for each amplitude value. With batching, users can create the following Hamiltonian batch with 1,024 different parameter values:

```python
# Sweep the amplitude
amplitudes = np.linspace(0.0, 0.5, 1024)

# Construct a list of Hamiltonian operators for each amplitude so that we can
# batch them all together
batched_hamiltonian = []
for amplitude in amplitudes:
    # Electric dipole spin resonance (`EDSR`) Hamiltonian
    H = 0.5 * resonance_frequency * spin.z(0) + amplitude * ScalarOperator(
        lambda t: 0.5 * np.sin(resonance_frequency * t)) * spin.x(0)
    # Append the Hamiltonian to the batched list
    # This allows us to compute the dynamics for all amplitudes in a single
    # simulation run
    batched_hamiltonian.append(H)
```

And then use it in one call to evolve:

```python
results = cudaq.evolve(
    batched_hamiltonian,
    dimensions,
    schedule,
    psi0,
    observables=[boson.number(0)],
    collapse_operators=[],
    store_intermediate_results=cudaq.IntermediateResultSave.EXPECTATION_VALUE,
    integrator=ScipyZvodeIntegrator())
```

Running this example on an NVIDIA H100 GPU with different batch sizes yields the results shown in Figure 1 for different parameter values. The more Hamiltonians batched, the lower the overall runtime. Batching all 1,024 Hamiltonians in one evolve call results in an 18x speedup over no batching.

Figure 1. A simulation of a silicon spin qubit on an NVIDIA H100 GPU with 1,024 different Hamiltonians

For more details including code examples, see the CUDA-Q documentation.

Community contributions from unitaryHACK

unitaryHACK is an open source quantum computing stack hackathon organized by Unitary Foundation, a nonprofit supporting the quantum computing community with open source projects, microgrants, and community events. As a recent event sponsor, NVIDIA submitted five CUDA-Q bounties, leading to the following three community contributions in CUDA-Q 0.12:

- Gopal-Dahale added a code example using dynamics to prepare a GHZ state with trapped ions. The example is based on the paper Multi-Particle Entanglement of Hot Trapped Ions.
- ACE07-Sev added a tutorial on Approximate State Preparation Using MPS Sequential Encoding, showing how to prepare an initial state by decomposing the initial state vector into a matrix product state. This is beneficial when preparing an arbitrary input state to run on quantum hardware; the matrix product state decomposition ensures a low-depth approximated circuit for the input state vector.
- Randl added an initial implementation for getting the matrix associated with a quantum kernel. This new API returns the matrix representing the unitary of the execution path (that is, the trace) of the provided kernel; a usage sketch follows this list.
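A minimal sketch of that last contribution follows. The function name cudaq.get_unitary and its return type are assumptions based on the description above, not confirmed details from the release:

```python
import numpy as np

import cudaq


@cudaq.kernel
def bell():
    # Prepare a Bell pair
    qubits = cudaq.qvector(2)
    h(qubits[0])
    x.ctrl(qubits[0], qubits[1])


# Assumed entry point: returns the unitary of the kernel's execution path
# (a 4x4 matrix for this two-qubit kernel) as a NumPy array.
u = cudaq.get_unitary(bell)
print(np.round(u, 3))
```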

CUDA-Q is an open source project that accepts community contributions year-round. To learn more, visit NVIDIA/cuda-quantum on GitHub.

Get started with CUDA-Q

Visit CUDA-Q Quick Start to learn more and get started. Explore CUDA-Q applications and dynamics examples and engage with the team on the NVIDIA/cuda-quantum GitHub repo. To learn more about other tools for enabling accelerated quantum supercomputing, check out NVIDIA Quantum.
