NVIDIA Blog · October 1, 22:45
RTX PCs Power Local Large Language Models for a Better User Experience

Recently released open-weight large language models (LLMs), such as OpenAI's gpt-oss and Alibaba's Qwen 3, can now run locally on personal computers, delivering high-quality output, particularly for local agentic AI. This progress opens new opportunities for students, hobbyists and developers to explore generative AI applications. NVIDIA RTX PCs accelerate these experiences, significantly improving AI speed and responsiveness. NVIDIA has optimized several leading LLM applications for RTX PCs to extract maximum performance from the Tensor Cores in RTX GPUs. Ollama and LM Studio are convenient tools for getting started with local LLMs: the former offers a simple interface with features such as drag-and-drop PDFs, while the latter supports loading different models and serving them as local API endpoints. NVIDIA has collaborated with Ollama and llama.cpp to further improve performance and user experience on RTX GPUs, including support for new models, optimized compute kernels and improved model scheduling.

🚀 **The rise and advantages of local LLMs**: With open-weight models such as OpenAI's gpt-oss and Alibaba's Qwen 3 now able to run locally on personal computers, users gain greater privacy and control without sacrificing output quality. This makes it easy for students, hobbyists and developers to explore and apply generative AI locally, especially for building local agentic AI.

💻 **Acceleration by NVIDIA RTX PCs**: NVIDIA RTX PCs run optimized LLM applications that take full advantage of the Tensor Cores in RTX GPUs, markedly improving the speed and responsiveness of local AI. Users get a fast, efficient AI experience, which drives broader adoption of local AI applications.

🛠️ **Easy onboarding with Ollama and LM Studio**: Ollama and LM Studio help users get started with local LLMs. Ollama offers an intuitive interface that supports drag-and-drop PDFs, conversational chat and multimodal understanding. LM Studio lets users load a variety of LLMs, interact with them in real time and serve them as local API endpoints for integration into custom projects.

⚡ **Performance optimizations and collaborations**: NVIDIA works closely with projects such as Ollama and llama.cpp to continually optimize LLM performance on RTX GPUs. Recent updates include support for new models (such as Gemma 3 and Nemotron Nano v2), Flash Attention enabled by default, CUDA kernel optimizations and an improved model scheduling system, together raising the efficiency and stability of AI inference.

💡 **AI for learning and productivity**: With applications such as AnythingLLM, users can build personalized AI study assistants that, for example, generate flashcards from lecture notes, explain complex concepts, and create and grade quizzes. These local AI solutions improve learning efficiency while offering a private experience with no usage limits or subscription costs. In addition, Project G-Assist extends AI to controlling gaming PCs and optimizing laptop settings.

Many users want to run large language models (LLMs) locally for more privacy and control, but until recently, this meant a trade-off in output quality. Newly released open-weight models, like OpenAI’s gpt-oss and Alibaba’s Qwen 3, can run directly on PCs, delivering useful high-quality outputs, especially for local agentic AI.

This opens up new opportunities for students, hobbyists and developers to explore generative AI applications locally. NVIDIA RTX PCs accelerate these experiences, delivering fast and snappy AI to users.

Getting Started With Local LLMs Optimized for RTX PCs

NVIDIA has worked to optimize top LLM applications for RTX PCs, extracting maximum performance of Tensor Cores in RTX GPUs.

One of the easiest ways to get started with AI on a PC is with Ollama, an open-source tool that provides a simple interface for running and interacting with LLMs. It supports dragging and dropping PDFs into prompts, conversational chat, and multimodal understanding workflows that include text and images.

It’s easy to use Ollama to generate answers from a simple text prompt.
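For instance, here is a minimal sketch of querying a locally pulled model through the official `ollama` Python client. The model tag `gpt-oss:20b` is an assumption; substitute whatever model you have pulled with `ollama pull`.

```python
# Minimal sketch: chat with a local model served by Ollama.
# Assumes `pip install ollama` and a model pulled via `ollama pull gpt-oss:20b`.
import ollama

response = ollama.chat(
    model="gpt-oss:20b",  # assumed tag; use any model you have pulled
    messages=[{"role": "user", "content": "Summarize what Tensor Cores do."}],
)
print(response["message"]["content"])
```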

NVIDIA has collaborated with Ollama to improve its performance and user experience. The most recent developments include:

- Optimized performance for OpenAI’s gpt-oss-20B model
- Faster performance for Gemma 3 models
- Smarter model scheduling to reduce memory issues and improve multi-GPU efficiency

Ollama is a developer framework that can be used with other applications. For example, AnythingLLM — an open-source app that lets users build their own AI assistants powered by any LLM — can run on top of Ollama and benefit from all of its accelerations.

Enthusiasts can also get started with local LLMs using LM Studio, an app powered by the popular llama.cpp framework. The app provides a user-friendly interface for running models locally, letting users load different LLMs, chat with them in real time and even serve them as local application programming interface endpoints for integration into custom projects.

Example of using LM Studio to generate notes accelerated by NVIDIA RTX.
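Because LM Studio exposes an OpenAI-compatible local server (by default at `http://localhost:1234/v1`), a custom project can call the loaded model with the standard `openai` Python client. A minimal sketch, assuming the server is running with a model already loaded:

```python
# Minimal sketch: call an LLM served by LM Studio's local server.
# Assumes `pip install openai` and LM Studio's server running on its
# default port with a model loaded.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's default endpoint
    api_key="lm-studio",  # any non-empty string; the local server ignores it
)

completion = client.chat.completions.create(
    model="local-model",  # placeholder; LM Studio routes to the loaded model
    messages=[{"role": "user", "content": "Turn these notes into bullet points."}],
)
print(completion.choices[0].message.content)
```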

NVIDIA has worked with llama.cpp to optimize performance on NVIDIA RTX GPUs. The latest updates include:

- Support for the NVIDIA Nemotron Nano v2 9B model
- Flash Attention enabled by default
- CUDA kernel optimizations for faster, more efficient inference
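To illustrate these features from application code, here is a minimal sketch using the community `llama-cpp-python` bindings to run a GGUF model fully offloaded to an RTX GPU with Flash Attention enabled. The model path is a placeholder, and the bindings must be built with CUDA support.

```python
# Minimal sketch: GPU-offloaded inference with the llama-cpp-python bindings.
# Assumes a CUDA-enabled build of llama-cpp-python and a GGUF model file
# on disk; the path below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/nemotron-nano-9b.gguf",  # placeholder path
    n_gpu_layers=-1,   # offload all layers to the RTX GPU
    flash_attn=True,   # use the Flash Attention kernels
)

out = llm("Q: What does Flash Attention speed up?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```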

Learn more about gpt-oss on RTX and how NVIDIA has worked with LM Studio to accelerate LLM performance on RTX PCs.

Creating an AI-Powered Study Buddy With AnythingLLM

In addition to greater privacy and performance, running LLMs locally removes restrictions on how many files can be loaded or how long they stay available, enabling context-aware AI conversations for a longer period of time. This creates more flexibility for building conversational and generative AI-powered assistants.

For students, managing a flood of slides, notes, labs and past exams can be overwhelming. Local LLMs make it possible to create a personal tutor that can adapt to individual learning needs.

The demo below shows how students can use local LLMs to build a generative AI-powered assistant:

AnythingLLM running on an RTX PC transforms study materials into interactive flashcards, creating a personalized AI-powered tutor.

A simple way to do this is with AnythingLLM, which supports document uploads, custom knowledge bases and conversational interfaces. This makes it a flexible tool for anyone who wants to create a customizable AI to help with research, projects or day-to-day tasks. And with RTX acceleration, users can experience even faster responses.

By loading syllabi, assignments and textbooks into AnythingLLM on RTX PCs, students can gain an adaptive, interactive study companion. They can ask the agent, using plain text or speech, to help with tasks like:

- Generating flashcards from lecture slides and notes
- Explaining difficult concepts in plain language
- Creating practice quizzes and grading the answers
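To make the flashcard task concrete, here is a minimal sketch of the kind of request such an assistant sends to a local model, reusing the `ollama` client from earlier. The model tag and prompt format are assumptions; AnythingLLM automates this orchestration, along with document ingestion, for you.

```python
# Minimal sketch: generate study flashcards from lecture notes with a
# local model. AnythingLLM automates this kind of workflow; this only
# illustrates the underlying request. The model tag is an assumption.
import ollama

notes = """Photosynthesis converts light energy into chemical energy.
It takes place in chloroplasts and produces glucose and oxygen."""

prompt = (
    "Create question/answer flashcards from these notes. "
    "Format each as 'Q: ...' on one line and 'A: ...' on the next.\n\n" + notes
)

response = ollama.chat(
    model="gpt-oss:20b",  # assumed; any locally pulled model works
    messages=[{"role": "user", "content": prompt}],
)
print(response["message"]["content"])
```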

Beyond the classroom, hobbyists and professionals can use AnythingLLM to prepare for certifications in new fields of study or for other similar purposes. And running locally on RTX GPUs ensures fast, private responses with no subscription costs or usage limits.

Project G-Assist Can Now Control Laptop Settings

Project G-Assist is an experimental AI assistant that helps users tune, control and optimize their gaming PCs through simple voice or text commands — without needing to dig through menus. Over the next day, a new G-Assist update will roll out via the home page of the NVIDIA App.

Project G-Assist helps users tune, control and optimize their gaming PCs through simple voice or text commands.

Building on its new, more efficient AI model and support for the majority of RTX GPUs released in August, the new G-Assist update adds commands to adjust laptop settings.

Project G-Assist is also extensible. With the G-Assist Plug-In Builder, users can create and customize G-Assist functionality by adding new commands or connecting external tools with easy-to-create plugins. And with the G-Assist Plug-In Hub, users can easily discover and install plug-ins to expand G-Assist capabilities.

Check out NVIDIA’s G-Assist GitHub repository for materials on how to get started, including sample plug-ins, step-by-step instructions and documentation for building custom functionalities.
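For a flavor of what a plug-in involves, here is a deliberately simplified, hypothetical sketch of a command handler. The manifest format and invocation protocol are defined by the sample plug-ins and documentation in the repository above, so treat every name and the JSON-over-stdio framing here as illustrative only.

```python
# Hypothetical sketch of a G-Assist-style command plug-in handler.
# All names and the JSON-over-stdio protocol shown here are illustrative;
# see NVIDIA's G-Assist GitHub repository for the real plug-in interface.
import json
import sys

def set_fan_profile(profile: str) -> dict:
    # Placeholder for a real system call that applies the setting.
    return {"success": True, "message": f"Fan profile set to {profile}"}

COMMANDS = {"set_fan_profile": set_fan_profile}

def main() -> None:
    for line in sys.stdin:  # one JSON command per line (assumed framing)
        request = json.loads(line)
        handler = COMMANDS.get(request.get("command"))
        if handler is None:
            result = {"success": False, "message": "Unknown command"}
        else:
            result = handler(**request.get("params", {}))
        print(json.dumps(result), flush=True)

if __name__ == "__main__":
    main()
```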

#ICYMI — The Latest Advancements in RTX AI PCs

Ollama Gets a Major Performance Boost on RTX

Latest updates include optimized performance for OpenAI’s gpt-oss-20B, faster Gemma 3 models and smarter model scheduling to reduce memory issues and improve multi-GPU efficiency.

Llama.cpp and GGML Optimized for RTX

The latest updates deliver faster, more efficient inference on RTX GPUs, including support for the NVIDIA Nemotron Nano v2 9B model, Flash Attention enabled by default and CUDA kernel optimizations.

Project G-Assist Update Rolls Out 

Download the G-Assist v0.1.18 update via the NVIDIA App. The update features new commands for laptop users and enhanced answer quality.

Windows ML With NVIDIA TensorRT for RTX Now Generally Available

Microsoft released Windows ML with NVIDIA TensorRT for RTX acceleration, delivering up to 50% faster inference, streamlined deployment and support for LLMs, diffusion and other model types on Windows 11 PCs.

NVIDIA Nemotron Powers AI Development 

The NVIDIA Nemotron collection of open models, datasets and techniques is fueling innovation in AI, from generalized reasoning to industry-specific applications.

Plug in to NVIDIA AI PC on Facebook, Instagram, TikTok and X — and stay informed by subscribing to the RTX AI PC newsletter.

Follow NVIDIA Workstation on LinkedIn and X

See notice regarding software product information.
