Nvidia Developer 10月25日 00:04
NVIDIA cuOpt推出GPU加速线性规划新求解器
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文介绍了NVIDIA cuOpt库的最新进展,重点是新增的GPU加速线性规划(LP)求解器,该求解器采用障碍法(barrier method)。优化问题,如NFL赛程安排、肾移植匹配和航空公司航班调度,通常需要解决包含数百万变量和约束的大规模线性规划。cuOpt通过GPU加速显著提升了求解速度,其新的障碍法求解器在内部基准测试中,相比领先的开源CPU求解器平均提速超过8倍,相比流行的商业CPU求解器平均提速超过2倍。该新求解器为需要高精度解决方案的用户提供了更快的选择,并与cuOpt现有的PDLP和Simplex求解器协同工作,以提供更全面的优化解决方案。

⚖️ **优化问题的重要性与挑战**:文章开篇通过NFL赛程安排、肾移植匹配和航空公司航班调度等实际案例,说明了优化在解决复杂现实问题中的关键作用。然而,这些问题往往涉及海量变量和约束,导致CPU求解效率低下,即使是近似解也可能耗时,而高精度解则更具挑战性。

🚀 **NVIDIA cuOpt的GPU加速优势**:NVIDIA cuOpt是一个开源的、利用GPU加速来解决优化问题的库。其最新版本引入了一个基于障碍法的线性规划新求解器,该求解器在内部基准测试中表现出色,平均速度比领先的开源CPU求解器快8倍以上,比流行的商业CPU求解器快2倍以上,极大地提升了解决大规模线性规划问题的效率和准确性。

💡 **线性规划求解方法与cuOpt的整合**:文章详细介绍了求解线性规划的三种主要方法:Simplex、PDLP和Barrier。cuOpt通过并行运行这三种算法,并提供最先完成的解,以适应不同问题的需求。新加入的障碍法求解器尤其适用于需要高精度解决方案的大规模问题,弥补了Simplex方法在处理大规模问题时的不足,并与cuOpt的PDLP协同工作,提供更全面的解决方案。

🛠️ **cuOpt障碍法求解器的技术细节**:cuOpt的GPU加速障碍法求解器基于1990年代的预测-校正算法,并利用NVIDIA cuDSS库进行高效的稀疏线性方程组求解。cuDSS的改进,如更快的符号分解和确定性模式,以及cuSPARSE库的配合,使得该算法能够充分发挥GPU的并行计算能力,实现显著的性能提升。

How does the NFL schedule all its regular-season games while avoiding stadium conflicts with Beyoncé concerts?  

How can doctors use a single donated kidney to trigger a chain of transplants, and how do they create the longest possible transplant chain to save the most patients?

How do airlines plan flight crew schedules around rest requirements and crew locations while minimizing hotel and deadheading costs?

These types of problems can be solved using optimization, a branch of applied mathematics that uses linear programming (LP) and mixed integer programming (MIP) to find optimal solutions. However, solving such problems isn’t easy because they often contain millions of variables and constraints that can slow CPUs. Approximate solutions can sometimes be found quickly, but finding highly accurate solutions takes longer. 

NVIDIA cuOpt is an open source GPU-accelerated library for solving optimization problems. The latest cuOpt release includes a new solver for linear programs that uses the barrier method. This new solver complements the existing cuOpt LP solvers, enabling engineers, researchers, and developers to solve large-scale linear programs quickly and accurately.

Internal benchmarks show cuOpt barrier delivering over 8x average speedup compared to a leading open source CPU solver and over 2x average speedup compared to a popular commercial CPU solver on a public test set of large linear programs (Figure 1).

When cuOpt barrier is combined with cuOpt PDLP in concurrent mode, it ranked first among open source solvers and second overall out of eleven solvers in public benchmarks (retrieved October 20, 2025).

This post explains the basics of linear programs, introduces the new cuOpt GPU-accelerated barrier method, and dives deeper into the benchmark data on its performance.

Figure 1. Average speedup on Mittelmann test set on NVIDIA GH200 Grace Hopper

What are linear programming and mixed integer programming? 

Linear programming is a technique for solving optimization problems where a linear objective function is minimized subject to linear constraints. A wide variety of applications can be cast as linear programs, such as blending and mixing problems in food and chemical production, transportation, network flow problems, and portfolio optimization in finance.

Linear programming also plays a crucial role in mixed integer programming, a technique for solving optimization problems where some variables take on integer, or whole number, values. Mixed integer programming is a crucial tool in the field of operations research that allows practitioners to solve problems with discrete decisions. There are many applications of mixed integer programming, including:

How are linear programs solved?

There are three main methods for solving linear programs: simplex, PDLP, and barrier.

    Simplex: The classical algorithm for solving linear programs is the simplex method, invented by George Dantzig in 1947. Simplex is still widely used today and cuOpt includes an implementation of the dual simplex algorithm.PDLP: A new first-order method that exploits the GPU was recently introduced called Primal-Dual Hybrid Gradient for Linear Programming (PDLP).Barrier: In the 1990s, a new class of algorithms for linear programming emerged called barrier or interior-point methods. Barrier methods are theoretically proven to solve linear programs in polynomial time. An algorithm developed by Sanjay Mehrotra called the predictor-corrector method is used in open source and commercial solvers. In practice, these methods usually take somewhere between 20 and 200 iterations to find a solution, regardless of the size of the problem. Each iteration requires multiple sparse systems of linear equations to be solved. 

Each method excels in different problems and in different use cases (Table 1). 

Method When to use
Simplex –Best for small to medium LPs
–Requires a factorization to fit into memory
–Produces the highest-accuracy solutions
–Solutions lie at a vertex of the feasible region

PDLP
–Fast on large LPs
–No factorization required
–Low-accuracy solutions must be acceptable
Barrier –Fast on large LPs
–Requires a factorization to fit into memory
–Produces high-accuracy solutions
Table 1. The three methods for solving linear programming problems and their ideal applications, and specific requirements

How does cuOpt solve linear programs?

When using cuOpt, you don’t need to choose which method to solve linear programs. By default, cuOpt runs all three algorithms concurrently and provides the solution from the method that finishes first.

cuOpt contains an implementation of PDLP that can very quickly solve LPs to low accuracy (a relative tolerance of 1e-4 to 1e-6, for example). However, many applications of linear programming require more accurate solutions (a relative tolerance of 1e-8 or an absolute tolerance of 1e-6, for example).

Previously, when a cuOpt user required a highly accurate solution, the only option was to use the simplex algorithm. While the simplex algorithm is well-suited for small- to medium-scale LPs, it can struggle on large-scale LPs. 

With the introduction of the new barrier method, cuOpt extends its high-performance GPU-accelerated LP solvers to large-scale LPs where high accuracy is required. 

How does the cuOpt GPU-accelerated barrier solver work?

The cuOpt barrier method uses the same well-proven predictor-corrector algorithm developed by researchers in the 1990s, following the book Primal-Dual Interior-Point Methods and the paper On Implementing Mehrotra’s Predictor-Corrector Interior-Point Method for Linear Programming.

The predictor-corrector method requires solving multiple sparse systems of linear equations at each iteration. Until recently, these were solved using multi-threaded algorithms on the CPU. This changed with the release of NVIDIA cuDSS, the NVIDIA GPU-accelerated sparse direct-solver library. cuDSS significantly outperforms CPU algorithms. cuDSS enabled the cuOpt team to bring the Mehrotra predictor-corrector algorithm to the GPU. 

The latest release, cuDSS 0.7, includes several enhancements developed specifically for cuOpt:

    Faster symbolic factorization: On average, symbolic factorization is now 2.5x faster over a test set of optimization matrices.Option to stop cuDSS reordering and symbolic factorization phases: This is helpful when running concurrently with algorithms like PDLP and simplex.Deterministic mode: cuDSS 0.7 includes a bit-wise deterministic mode. This allows cuOpt to be reproducible, returning identical results given the same inputs.

In addition to cuDSS, cuOpt uses cuSPARSE to perform sparse matrix-matrix multiplication and sparse matrix-vector multiplications efficiently on the GPU.

How does the cuOpt barrier method compare to open source and commercial CPU solvers? 

Performance of the new cuOpt barrier method was measured over a publicly available test set of 61 linear programs maintained by Hans Mittelmann at Arizona State University. For details, see Benchmarks for Optimization Software.

The linear programs in this test set are large. Figure 2 shows the distribution of these problems in terms of the number of variables and constraints—each dot represents an LP. This test set has about a dozen problems with more than 1 million variables and constraints.

Figure 2. Dimensions of LPs in the Mittelmann test set

Experimental setup

Runtime was compared for the cuOpt barrier method versus several CPU solvers. All solvers were run on an NVIDIA GH200 Grace Hopper machine with 72 3.4GHz CPU cores, 500 GB of RAM, and an NVIDIA H200 GPU with 480 GB of VRAM. Each solver was configured to run the barrier method without crossover; all other settings were taken as default. All problems were run with a time limit of one hour. If a solver failed to solve a problem, the runtime was taken to be one hour. 

How does the cuOpt barrier method compare to an open source CPU solver?

In the test set, cuOpt solves 55/61 problems, and the open source solver solves 48/61 problems. On average (geometric mean) over the test set, cuOpt is faster than the open source solver by 8.81x. 

Figure 3 shows the speedup of the cuOpt barrier method compared to the open source solver for each of the 61 problems in the Mittelmann test set on NVIDIA GH200. Each bar represents the open source solver runtime divided by the cuOpt runtime on a particular model. Bars greater than 1 (in green) represent LPs where cuOpt is faster. Bars less than 1 (in gray) represent LPs where the open source solver is faster. 

Figure 3. cuOpt barrier speedup compared to an open source CPU solver on GH200

How does the cuOpt barrier method compare to commercial CPU solvers?

cuOpt was also measured against two leading commercial solvers. The experimental setup was the same as previously described: all solvers were run using the barrier method without crossover and default settings. 

Commercial CPU solver A

On 31/61 problems in the test set, cuOpt is faster than the commercial solver. On seven problems, cuOpt is more than 5x faster than the commercial solver—with a maximum speedup of 17x. However, on average (geometric mean), commercial solver A is 1.5x faster over the whole test set. This is likely due to the sophisticated presolve methods used by commercial solver A. cuOpt does not have an extensive presolve. Commercial solver A solves 60/61 problems in the test set.

Figure 4. cuOpt barrier speedup compared to commercial CPU solver A on GH200

Commercial CPU solver B

cuOpt does better against another popular commercial CPU solver B. On average (geometric mean) over the test set, cuOpt is 2x faster than commercial solver B.

This commercial solver solved 58/61 problems in the test set.

Figure 5. cuOpt barrier speedup over commercial CPU solver B on GH200

Get started with the NVIDIA cuOpt barrier method

The cuOpt open source solver helps solve large-scale LPs quickly and accurately. You can now run simplex, PDLP, and barrier on your largest and most challenging linear programs.

To get started, visit the cuOpt Barrier Notebook to learn how to install cuOpt 25.10, run the new barrier method, and use important settings like presolve and folding

Download cuOpt 25.10 and run it on your toughest linear programs to see how it compares with other solvers. 

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

NVIDIA cuOpt Optimization Linear Programming GPU Acceleration Barrier Method NVIDIA Applied Mathematics AI Machine Learning High-Performance Computing cuDSS cuSPARSE
相关文章