Nvidia Developer · September 3
NVIDIA CUDA-Q 0.12: Accelerating Quantum Application Development and Hardware Design

NVIDIA CUDA-Q 0.12 introduces powerful new simulation tools designed to accelerate how researchers develop quantum applications and design high-performance quantum hardware. With the new run API, users can obtain detailed statistics for individual simulation runs (shots) instead of being limited to aggregated statistical outputs, which is essential for analyzing noise correlations, postselecting results, and benchmarking circuits. The CUDA-Q dynamics backend has also been enhanced to simulate a broader range of quantum system evolutions, including improved multidiagonal sparse matrix support and batching of states and operators, improving both performance and flexibility. The release also incorporates contributions from the unitaryHACK community and adds support for Python 3.13, further advancing CUDA-Q's open source development.

📊 **Finer simulation granularity and data access**: With the new `run` API, CUDA-Q 0.12 lets users obtain detailed statistics for individual runs (shots) of a quantum simulation rather than relying solely on aggregated results. Access to raw shot data is essential for analyzing noise correlations between qubits, performing more precise result postselection, and running fine-grained circuit benchmarks, giving researchers new flexibility and insight.

🚀 **Better performance for quantum dynamics simulation**: The new release significantly enhances the CUDA-Q dynamics backend, which can now simulate a broader class of quantum system evolution equations, including arbitrary state evolution. With improved multidiagonal sparse matrix support and the ability to batch states and operators, users can simulate large quantum systems more efficiently, for example in parameter sweeps or quantum state tomography, significantly accelerating research.

🤝 **Community-driven open source contributions**: CUDA-Q 0.12 integrates key contributions from community events such as unitaryHACK, including example code for simulating qubit dynamics, a tutorial on approximate state preparation, and an initial implementation for obtaining the matrix of a quantum kernel. Support for Python 3.13 further extends the ecosystem, reflecting the vitality of CUDA-Q as an open source project and the strength of its community.

💡 **Hybrid quantum-classical application development**: CUDA-Q is built from the ground up to support writing hybrid quantum-classical applications, using its kernel programming model to orchestrate QPUs, GPUs, and CPUs. Quantum kernels encapsulate the logic that runs on a quantum device; the `sample` API provides aggregated statistics, while the `run` API enables finer-grained data processing, making applications more expressive and supporting conditional measurements and multiple return data types.

NVIDIA CUDA-Q 0.12 introduces new simulation tools for accelerating how researchers develop quantum applications and design performant quantum hardware.

With the new run API, users can obtain more detailed statistics on individual runs (or shots) of a simulation, rather than being restricted to aggregated statistical outputs. Access to raw shot data is important to researchers for a variety of use cases, such as analyzing noise correlations between qubits, result postselection, precise circuit benchmarking, and more.

The 0.12 release also includes additional features for the CUDA-Q dynamics backend, which enables users to simulate the evolution of quantum systems. This is an important capability for modeling and improving quantum hardware. This release adds better multidiagonal sparse matrix support and batching of states and operators, allowing users to scale dynamics techniques. CUDA-Q Dynamics also now supports generic super-operator equations, providing researchers with more flexibility.

CUDA-Q is an open source project, and this release includes community contributions from the unitaryHACK event, as well as Python 3.13 support. This post explains some of these new features in detail. For more detailed information, see the CUDA-Q 0.12 release notes.

Enabling more expressive applications 

CUDA-Q is built from the ground up to support writing hybrid quantum-classical applications, using a kernel programming model to orchestrate QPUs, GPUs, and CPUs. Logic to run on a quantum device is encapsulated in quantum kernels. There are multiple ways to execute a kernel. One way is with the sample API, which returns aggregated statistics of the measurement counts of the qubits in the kernel.

For example, for a kernel that prepares a GHZ state on three qubits, calling sample with this kernel and specifying 1,000 shots returns the aggregated statistics of the measurement outcomes observed over those 1,000 shots: { 000:492 111:508 }. As expected for a GHZ state, the outcomes 000 and 111 are observed with roughly equal probability. However, it's not possible to learn anything more detailed about each individual shot.

```python
import cudaq

@cudaq.kernel
def simple_ghz(num_qubits: int) -> int:
    qubits = cudaq.qvector(num_qubits)
    # Create GHZ state
    h(qubits[0])
    for i in range(1, num_qubits):
        x.ctrl(qubits[0], qubits[i])
    result = 0
    for i in range(num_qubits):
        if mz(qubits[i]):
            result += 1
    return result

shots = 20  # using small number of shots for simplicity

sample_results = cudaq.sample(simple_ghz, 3, shots_count=shots)
print(f"Sample results: {sample_results}")

run_results = cudaq.run(simple_ghz, 3, shots_count=shots)
print(f"Run results: {run_results}")
```
```
$ python3 test.py
Sample results: { 000:11 111:9 }
Run results: [0, 3, 0, 0, 0, 0, 3, 0, 0, 3, 3, 3, 3, 3, 0, 3, 0, 3, 3, 3]
```

Unlike the sample API, the run API preserves the individual return value from each shot, which is useful when the application needs to analyze the distribution of returned results. With run, kernels can be more expressive and include conditional measurements of specific qubits. Return values from these kernels are explicit and can use multiple data types, including custom data types defined with Python data classes.
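To illustrate returning a custom data type, here is a minimal sketch (not taken from the post; the dataclass name and fields are illustrative) that packs each shot's outcome into a Python data class and retrieves one record per shot with run:

```python
from dataclasses import dataclass

import cudaq


# Illustrative custom return type; the name and fields are assumptions.
@dataclass(slots=True)
class ShotRecord:
    ones: int        # number of qubits measured as 1 in this shot
    num_qubits: int  # total number of qubits measured


@cudaq.kernel
def ghz_record(num_qubits: int) -> ShotRecord:
    qubits = cudaq.qvector(num_qubits)
    # Prepare the GHZ state
    h(qubits[0])
    for i in range(1, num_qubits):
        x.ctrl(qubits[0], qubits[i])
    # Count how many qubits are measured as 1
    ones = 0
    for i in range(num_qubits):
        if mz(qubits[i]):
            ones += 1
    return ShotRecord(ones, num_qubits)


# run preserves one ShotRecord per shot, so the records can be postselected
# or analyzed individually in ordinary Python.
records = cudaq.run(ghz_record, 3, shots_count=10)
print(records)
```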

In addition, run has an asynchronous version, run_async, useful for long-running executions. Currently, run and run_async are supported for simulation backends only. For more information and code examples, see the CUDA-Q documentation.
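As a rough sketch of the asynchronous variant, reusing the simple_ghz kernel defined above and assuming run_async mirrors run's argument list and, like the other *_async APIs, returns a handle whose get() blocks until results are ready:

```python
# Launch the simulation asynchronously; arguments assumed to mirror cudaq.run.
handle = cudaq.run_async(simple_ghz, 3, shots_count=1000)

# ... perform other classical work while the simulation runs ...

# Block until the per-shot return values are available (assumed .get() accessor).
results = handle.get()
print(results[:10])
```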

Achieve better performance for dynamics simulation

The CUDA-Q dynamics backend enables the design, simulation, and execution of quantum dynamical systems. The 0.12 release adds multiple enhancements to this backend.

Previously, system dynamics was limited to the Lindblad master equation, specified by the Hamiltonian operator and collapse operators. Now users can simulate any arbitrary state evolution equation, specifying the evolution as a generic super-operator. A super-operator can be constructed as a linear combination of left and/or right multiplication actions of operator instances.
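As a concrete illustration (a standard identity, not code from the release), the Lindblad generator itself decomposes into exactly such left and right multiplication actions on the density matrix:

```latex
% Lindblad generator written as left/right multiplication actions on rho
\mathcal{L}(\rho) = -i\,(H\rho - \rho H)
  + \sum_k \left( L_k \rho L_k^\dagger
  - \tfrac{1}{2}\, L_k^\dagger L_k\, \rho
  - \tfrac{1}{2}\, \rho\, L_k^\dagger L_k \right)
```

Here Hρ and L†L ρ act by left multiplication, ρH and ρ L†L by right multiplication, and L ρ L† by both. A generic super-operator lets users pass any such combination to the dynamics backend, not just this Lindblad form.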

Support for multidiagonal sparse matrices was also updated. Depending on the sparsity of the operator matrix or the subsystem dimension, CUDA-Q automatically uses the dense or multidiagonal data format for optimal performance.

The CUDA-Q evolve API can evolve multiple initial states and multiple Hamiltonians over time. With the 0.12 release, both states and Hamiltonians can be batched on multiple GPUs. This can significantly improve the performance of simulating the dynamics of many small, identical systems for parameter sweeps or tomography. Collapse operators and super-operators can be batched in a similar manner.

For example, a dynamics simulation of an electrically driven silicon spin qubit involves a parameter sweep of amplitude values and creating a Hamiltonian for each amplitude value. Without batching, this will result in multiple calls to evolve, one for each amplitude value. With batching, users can create the following Hamiltonian batch with 1,024 different parameter values:

```python
# Sweep the amplitude
amplitudes = np.linspace(0.0, 0.5, 1024)

# Construct a list of Hamiltonian operators for each amplitude so that we can
# batch them all together
batched_hamiltonian = []
for amplitude in amplitudes:
    # Electric dipole spin resonance (`EDSR`) Hamiltonian
    H = 0.5 * resonance_frequency * spin.z(0) + amplitude * ScalarOperator(
        lambda t: 0.5 * np.sin(resonance_frequency * t)) * spin.x(0)
    # Append the Hamiltonian to the batched list
    # This allows us to compute the dynamics for all amplitudes in a single
    # simulation run
    batched_hamiltonian.append(H)
```

And then use it in one call to evolve:

```python
results = cudaq.evolve(
    batched_hamiltonian,
    dimensions,
    schedule,
    psi0,
    observables=[boson.number(0)],
    collapse_operators=[],
    store_intermediate_results=cudaq.IntermediateResultSave.EXPECTATION_VALUE,
    integrator=ScipyZvodeIntegrator())
```

Running this example on an NVIDIA H100 GPU with different batch sizes yields the results shown in Figure 1 for different parameter values. The more Hamiltonians batched, the lower the overall runtime. Batching all 1,024 Hamiltonians in one evolve call results in an 18x speedup over no batching.

Figure 1. A simulation of a silicon spin qubit on an NVIDIA H100 GPU with 1,024 different Hamiltonians

For more details including code examples, see the CUDA-Q documentation.

Community contributions from unitaryHACK

unitaryHACK is an open source quantum computing stack hackathon organized by Unitary Foundation, a nonprofit supporting the quantum computing community with open source projects, microgrants, and community events. As a recent event sponsor, NVIDIA submitted five CUDA-Q bounties, leading to the following three community contributions in CUDA-Q 0.12:

- Gopal-Dahale added a code example using dynamics to prepare a GHZ state with trapped ions. The example is based on the paper Multi-Particle Entanglement of Hot Trapped Ions.
- ACE07-Sev added a tutorial on Approximate State Preparation Using MPS Sequential Encoding, showing how to prepare an initial state by decomposing the initial state vector into a matrix product state. This is beneficial when preparing an arbitrary input state to run on quantum hardware; the matrix product state decomposition ensures a low-depth approximated circuit for the input state vector.
- Randl added an initial implementation for getting the matrix associated with a quantum kernel. This new API returns the matrix representing the unitary of the execution path (that is, the trace) of the provided kernel; a usage sketch follows this list.
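A minimal sketch of that last contribution follows. The function name cudaq.get_unitary and its return type are assumptions based on the description above, not confirmed details from the release:

```python
import numpy as np

import cudaq


@cudaq.kernel
def bell():
    # Prepare a Bell pair
    qubits = cudaq.qvector(2)
    h(qubits[0])
    x.ctrl(qubits[0], qubits[1])


# Assumed entry point: returns the unitary of the kernel's execution path
# (a 4x4 matrix for this two-qubit kernel) as a NumPy array.
u = cudaq.get_unitary(bell)
print(np.round(u, 3))
```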

CUDA-Q is an open source project that accepts community contributions year-round. To learn more, visit NVIDIA/cuda-quantum on GitHub.

Get started with CUDA-Q

Visit CUDA-Q Quick Start to learn more and get started. Explore CUDA-Q applications and dynamics examples and engage with the team on the NVIDIA/cuda-quantum GitHub repo. To learn more about other tools for enabling accelerated quantum supercomputing, check out NVIDIA Quantum.
