The NVIDIA DGX Spark is a newly launched desktop-class AI supercomputer retailing for around $4,000, with a powerful hardware configuration: ARM64 architecture, a 20-core CPU, 128GB of memory, and an NVIDIA GB10 (Blackwell architecture) GPU. The author shares first impressions, highlighting the device's appealing design and strong performance. On the software side, however, particularly around configuring CUDA on ARM64 and its compatibility, the author ran into a number of challenges. Although NVIDIA provides Docker containers and official guides, users still need some working knowledge of CUDA and ARM64 systems. The author also describes how Claude Code helped work through development and configuration problems, and how Tailscale enables remote access. The article closes by noting that, with the ecosystem developing rapidly, projects such as Ollama, llama.cpp, LM Studio and vLLM have already begun to support the DGX Spark, which points to its future potential.

🚀 **Powerful hardware and design:** The NVIDIA DGX Spark draws attention with its compact, science-fiction-flavored exterior and powerful internals. It pairs a 20-core ARM64 CPU and 128GB of memory with an NVIDIA GB10 GPU on the Blackwell architecture, reporting 119.68GB of GPU-accessible memory. That makes it a capable desktop machine for both training and running AI models, meeting AI researchers' need for high-performance compute.

⚠️ **The CUDA-on-ARM64 ecosystem challenge:** Despite the strong hardware, bringing NVIDIA's CUDA ecosystem to the ARM64 architecture faces many challenges. Existing AI libraries and tutorials often assume x86 by default, making it complicated and error-prone to install and configure frameworks such as PyTorch. The author gradually worked around these compatibility problems by using NVIDIA's official Docker containers and collaborating with Claude Code, which also underlines how much the field still needs to mature.

🛠️ **The importance of AI development assistants:** AI-assisted tools played a key role in configuring and using the DGX Spark. The author describes in detail how Claude Code was used to install and run AI development tools inside Docker, including setting up user permissions and installing model inference engines. In addition, integrating Tailscale lets users securely access the DGX Spark remotely from anywhere, greatly improving productivity and convenience.

📈 **A fast-moving ecosystem with real potential:** Although the DGX Spark ecosystem is still early, several significant projects (Ollama, llama.cpp, LM Studio, vLLM) have already adapted to and begun supporting the device. Their arrival, in particular the llama.cpp benchmark results on the Spark published by Georgi Gerganov, suggests substantial potential for AI model inference and deployment, with broader support likely in the near future.

🤔 **Where things stand:** The author considers it too early to give a clear recommendation, citing limited personal experience with CUDA, ARM64 and Ubuntu GPU machines. That said, the notable ecosystem improvements over the preceding 24 hours are encouraging, and the author expects a much clearer picture of the device's overall support and market reception within a few weeks.

NVIDIA DGX Spark: great hardware, early days for the ecosystem

14th October 2025

NVIDIA sent me a preview unit of their new DGX Spark desktop “AI supercomputer”. I’ve never had hardware to review before! You can consider this my first ever sponsored post if you like, but they did not pay me any cash and aside from an embargo date they did not request (nor would I grant) any editorial input into what I write about the device.

The device retails for around $4,000. They officially go on sale tomorrow.

First impressions are that this is a snazzy little computer. It’s similar in size to a Mac mini, but with an exciting textured surface that feels refreshingly different and a little bit science fiction.

There is a very powerful machine tucked into that little box. Here are the specs, which I had Claude Code figure out for me by poking around on the device itself:

Hardware Specifications

    Architecture: aarch64 (ARM64)
    CPU: 20 cores
        10x Cortex-X925 (performance cores)
        10x Cortex-A725 (efficiency cores)
    RAM: 119 GB total (112 GB available)—I’m not sure why Claude reported it differently here, the machine is listed as 128GB (likely just a units mismatch: 128 GB decimal is 119.2 GiB, and tools like free report GiB)
    Storage: 3.7 TB (6% used, 3.3 TB available)

GPU Specifications

    Model: NVIDIA GB10 (Blackwell architecture)
    Compute Capability: sm_121 (12.1)
    Memory: 119.68 GB
    Multi-processor Count: 48 streaming multiprocessors
    Architecture: Blackwell

Short version: this is an ARM64 device with 128GB of memory that’s available to both the GPU and the 20 CPU cores at the same time, strapped onto a 4TB NVMe SSD.
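
If you want to sanity-check those numbers on a unit of your own, the standard tools are enough; nothing here is Spark-specific:

free -h          # total and available system RAM
df -h /          # disk size and usage
nvidia-smi --query-gpu=name,memory.total --format=csv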

The Spark is firmly targeted at “AI researchers”. It’s designed for both training and running models.

The tricky bit: CUDA on ARM64

Until now almost all of my own model running experiments have taken place on a Mac. This has gotten far less painful over the past year and a half thanks to the amazing work of the MLX team and community, but it’s still left me deeply frustrated at my lack of access to the NVIDIA CUDA ecosystem. I’ve lost count of the number of libraries and tutorials which expect you to be able to use Hugging Face Transformers or PyTorch with CUDA, and leave you high and dry if you don’t have an NVIDIA GPU to run things on.

Armed (ha) with my new NVIDIA GPU I was excited to dive into this world that had long eluded me... only to find that there was another assumption baked into much of this software: x86 architecture for the rest of the machine.

This resulted in all kinds of unexpected new traps for me to navigate. I eventually managed to get a PyTorch 2.7 wheel for CUDA on ARM, but failed to do so for 2.8. I’m not confident that’s because the wheel itself is unavailable, but I’m finding navigating the PyTorch ARM ecosystem pretty confusing.
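
As a minimal sketch, the 2.7 install can look like this; note that the cu128 index URL is an assumption based on PyTorch's usual CUDA wheel hosting, not a verified Spark recipe:

# assumption: the cu128 wheel index also serves linux_aarch64 CUDA builds for 2.7
pip install "torch==2.7.*" --index-url https://download.pytorch.org/whl/cu128
# confirm PyTorch can actually see the GPU
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"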

NVIDIA are trying to make this easier, with mixed success. A lot of my initial challenges got easier when I found their official Docker container, so now I’m figuring out how best to use Docker with GPUs. Here’s the current incantation that’s been working for me:

docker run -it --gpus=all \
  -v /usr/local/cuda:/usr/local/cuda:ro \
  nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 \
  bash
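
To confirm the GPU is actually visible inside a container before doing anything else, nvidia-smi is the quickest check:

docker run --rm --gpus=all \
  nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 \
  nvidia-smi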

I have not yet got my head around the difference between CUDA 12 and 13. 13 appears to be very new, and a lot of the existing tutorials and libraries appear to expect 12.
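Part of the confusion is that the driver and the toolkit report versions independently; checking both makes it clearer which CUDA you are actually on:

nvcc --version   # the toolkit version installed in this container (13.x here)
nvidia-smi       # the "CUDA Version" shown is the maximum the driver supports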

The missing documentation isn’t missing any more

When I first received this machine around a month ago there was very little in the way of documentation to help get me started. This meant climbing the steep NVIDIA+CUDA learning curve mostly on my own.

This has changed substantially in just the last week. NVIDIA now have extensive guides for getting things working on the Spark and they are a huge breath of fresh air—exactly the information I needed when I started exploring this hardware.

Here’s the getting started guide and the essential collection of playbooks. There’s still a lot I haven’t tried yet just in this official set of guides.

Claude Code for everything

Claude Code was an absolute lifesaver for me while I was trying to figure out how best to use this device. My Ubuntu skills were a little rusty, and I also needed to figure out CUDA drivers and Docker incantations and how to install the right versions of PyTorch. Claude 4.5 Sonnet is much better than me at all of these things.

Since many of my experiments took place in disposable Docker containers I had no qualms at all about running it in YOLO mode:

claude --dangerously-skip-permissions

Claude understandably won’t let you do this as root, even in a Docker container, so I found myself using the following incantation in a fresh nvcr.io/nvidia/cuda:13.0.1-devel-ubuntu24.04 instance pretty often:

apt-get update && apt-get install -y sudo
# pick the first free UID >=1000
U=$(for i in $(seq 1000 65000); do if ! getent passwd $i >/dev/null; then echo $i; break; fi; done)
echo "Chosen UID: $U"
# same for a GID
G=$(for i in $(seq 1000 65000); do if ! getent group $i >/dev/null; then echo $i; break; fi; done)
echo "Chosen GID: $G"
# create user+group
groupadd -g "$G" devgrp
useradd -m -u "$U" -g "$G" -s /bin/bash dev
# enable password-less sudo:
printf 'dev ALL=(ALL) NOPASSWD:ALL\n' > /etc/sudoers.d/90-dev-nopasswd
chmod 0440 /etc/sudoers.d/90-dev-nopasswd
# Install npm
DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get install -y npm
# Install Claude
npm install -g @anthropic-ai/claude-code

Then switch to the dev user and run Claude for the first time:

su - dev
claude --dangerously-skip-permissions

This will provide a URL which you can visit to authenticate with your Anthropic account, confirming by copying back a token and pasting it into the terminal.

Docker tip: you can create a snapshot of the current image (with Claude installed) by running docker ps to get the container ID and then:

docker commit --pause=false <container_id> cc:snapshot

Then later you can start a similar container using:

docker run -it \
  --gpus=all \
  -v /usr/local/cuda:/usr/local/cuda:ro \
  cc:snapshot bash
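
One refinement I'd suggest (my addition, not part of the workflow above): mount a named volume over the dev user's home directory so Claude's login token and any checked-out code survive container restarts:

docker run -it \
  --gpus=all \
  -v /usr/local/cuda:/usr/local/cuda:ro \
  -v cc-home:/home/dev \
  cc:snapshot bash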

Here’s an example of the kinds of prompts I’ve been running in Claude Code inside the container:

I want to run https://huggingface.co/unsloth/Qwen3-4B-GGUF using llama.cpp - figure out how to get llama cpp working on this machine such that it runs with the GPU, then install it in this directory and get that model to work to serve a prompt. Goal is to get this command to run: llama-cli -hf unsloth/Qwen3-4B-GGUF -p "I believe the meaning of life is" -n 128 -no-cnv

That one worked flawlessly—Claude checked out the llama.cpp repo, compiled it for me and iterated on it until it could run that model on the GPU. Here’s a full transcript, converted from Claude’s .jsonl log format to Markdown using a script I vibe coded just now.
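For reference, Claude's successful build followed llama.cpp's documented CUDA flow; a rough sketch using the project's standard flags, not Claude's verbatim commands:

git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON        # enable the CUDA backend
cmake --build build --config Release -j
./build/bin/llama-cli -hf unsloth/Qwen3-4B-GGUF \
  -p "I believe the meaning of life is" -n 128 -no-cnv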

I later told it:

Write out a markdown file with detailed notes on what you did. Start with the shortest form of notes on how to get a successful build, then add a full account of everything you tried, what went wrong and how you fixed it.

Which produced this handy set of notes.

Tailscale was made for this

Having a machine like this on my local network is neat, but what’s even neater is being able to access it from anywhere else in the world, from both my phone and my laptop.

Tailscale is perfect for this. I installed it on the Spark (using the Ubuntu instructions here), signed in with my SSO account (via Google)... and the Spark showed up in the “Network Devices” panel on my laptop and phone instantly.
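
The setup really is minimal; on Ubuntu it's Tailscale's standard install script plus one login command:

curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up   # prints a login URL to authenticate the device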

I can SSH in from my laptop or using the Termius iPhone app on my phone. I’ve also been running tools like Open WebUI which give me a mobile-friendly web interface for interacting with LLMs on the Spark.
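
Open WebUI itself runs happily in Docker; a minimal sketch based on the project's image (their documented quick start adds a couple more flags, and pointing it at a local inference server is a separate configuration step):

docker run -d -p 3000:8080 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main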

Here comes the ecosystem

The embargo on these devices dropped yesterday afternoon, and it turns out a whole bunch of relevant projects have had preview access similar to mine. This is fantastic news, as many of the things I’ve been trying to figure out myself suddenly got a whole lot easier.

Four particularly notable examples:

    Ollama works out of the box. They actually had a build that worked a few weeks ago, and were the first success I had running an LLM on the machine (see the install sketch after this list).
    llama.cpp creator Georgi Gerganov just published extensive benchmark results from running llama.cpp on a Spark. He’s getting 3,600 tokens/second from a MXFP4 version of GPT-OSS 20B and ~800 tokens/second from GLM-4.5-Air-GGUF.
    LM Studio now have a build for the Spark. I haven’t tried this one yet as I’m currently using my machine exclusively via SSH.
    vLLM—one of the most popular engines for serving production LLMs—had early access and there’s now an official NVIDIA vLLM NGC Container for running their stack.
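
As referenced in the Ollama item above, their standard Linux installer is all it takes; the model tag below is an example from the Ollama library, not a record of what I ran:

curl -fsSL https://ollama.com/install.sh | sh
ollama run gpt-oss:20b "I believe the meaning of life is"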

Should you get one?

It’s a bit too early for me to provide a confident recommendation concerning this machine. As indicated above, I’ve had a tough time figuring out how best to put it to use, largely through my own inexperience with CUDA, ARM64 and Ubuntu GPU machines in general.

The ecosystem improvements in just the past 24 hours have been very reassuring though. I expect it will be clear within a few weeks how well supported this machine is going to be.
