
Why is spawning a new process in Node so slow?


At Val Town we run your code in Deno processes. We recently noticed that, under load, a single Val Town Node server cannot exceed 40 spawns/s. It spends 30% of its time with the main thread blocked on calls to spawn. Why is it so slow? Can we make it any faster?

To simulate this pattern we’ll write an HTTP server that spawns a new process for each request. Like this:

```js
import { spawn } from "node:child_process";
import http from "node:http";

http
  .createServer((req, res) => spawn("echo", ["hi"]).stdout.pipe(res))
  .listen(8001);
```

We’ll write similar implementations in Go (here) and Rust (here) and run this example with Node, Deno, and Bun.

I am running all of these on a Hetzner CCX33 with 8 vCPUs and 32 GB of RAM. I am benchmarking with bombardier running on the same machine. The command I’ll run to benchmark each server is `bombardier -c 30 -n 10000 http://localhost:8001`: 10,000 total requests over 30 connections. I prewarm each server before running the benchmark. I’m using Go v1.22.2, Rust v1.77.2, Node v22.3.0, Bun 1.1.20, and Deno 1.44.2.

Here are the results:

| Language/Runtime | Req/s | Command |
| --- | --- | --- |
| Node | 651 | `node baseline.js` |
| Deno | 2,290 | `deno run --allow-all baseline.js` |
| Bun | 2,208 | `bun run baseline.js` |
| Go | 5,227 | `go run go/main.go` |
| Rust (tokio) | 5,466 | `cd rust && cargo run --release` |

Ok, so Node is slow. Deno and Bun have figured out how to make this faster, and the compiled, thread-pool languages are much faster again.

Node’s spawn performance does seem to be notably bad. This thread was an interesting read, and while in my testing things have improved since the time of that post, Node still spends an awful lot of time blocking the main thread for each spawn call.

Switching to Bun or Deno would improve this a lot. That is great to know, but let’s try and improve things with Node.

The simplest thing we can do is spawn more processes and run an HTTP server per process, using Node’s cluster module. Like so:

```js
import { spawn } from "node:child_process";
import http from "node:http";
import cluster from "node:cluster";
import { availableParallelism } from "node:os";

if (cluster.isPrimary) {
  // Fork one worker process per available CPU.
  for (let i = 0; i < availableParallelism(); i++) cluster.fork();
} else {
  http
    .createServer((req, res) => spawn("echo", ["hi"]).stdout.pipe(res))
    .listen(8001);
}
```

Node shares the network socket between processes here, so all of our processes can listen on :8001 and they’ll be routed requests round-robin.

The main issue with this approach for me is that each HTTP server is isolated in its own process. This can complicate things if you manage any kind of in-memory caching or global state that needs to be shared between these processes. I’d ideally find a way to keep the single-threaded execution model of JavaScript and still make spawns fast.

Here are the results:

| Language/Runtime | Req/s | Command |
| --- | --- | --- |
| Node | 1,766 | `node cluster.js` |
| Deno | 2,133 | `deno run --allow-all cluster.js` |
| Bun | n/a | “node:cluster is not yet implemented in Bun” |

Super weird. Deno is slower, Bun doesn’t work just yet, and Node has improved a lot, but I would have expected it to be even faster.

Nice to know there is some speedup here. We’ll move on from it for now.

If the spawn calls are blocking the main thread, let’s move them to worker threads.

Here’s our worker-threads/worker.js code. We listen for messages with a command and an id. We run it and post the result back. We’re using execFile here for convenience, but it is just an abstraction on top of spawn.

```js
import { execFile } from "node:child_process";
import { parentPort } from "node:worker_threads";

parentPort.on("message", (message) => {
  const [id, cmd, ...args] = message;
  execFile(cmd, args, (_error, stdout, _stderr) => {
    parentPort.postMessage([id, stdout]);
  });
});
```

And here’s our worker-threads/index.js. We create 8 worker threads. When we want to handle a request we send a message to a thread to make the spawn call and send back the output. Once we get the response back, we respond to the HTTP request.

```js
import assert from "node:assert";
import http from "node:http";
import { EventEmitter } from "node:events";
import { Worker } from "node:worker_threads";

const newWorker = () => {
  const worker = new Worker("./worker-threads/worker.js");
  const ee = new EventEmitter();
  // Emit messages from the worker to the EventEmitter by id.
  worker.on("message", ([id, msg]) => ee.emit(id, msg));
  return { worker, ee };
};

// Spawn 8 worker threads.
const workers = Array.from({ length: 8 }, newWorker);
const randomWorker = () => workers[Math.floor(Math.random() * workers.length)];

const spawnInWorker = async () => {
  const worker = randomWorker();
  const id = Math.random();
  // Send and wait for our response.
  worker.worker.postMessage([id, "echo", "hi"]);
  return new Promise((resolve) => {
    worker.ee.once(id, (msg) => resolve(msg));
  });
};

http
  .createServer(async (_, res) => {
    let resp = await spawnInWorker();
    assert.equal(resp, "hi\n"); // no cheating!
    res.end(resp);
  })
  .listen(8001);
```

Results!

| Language/Runtime | Req/s | Command |
| --- | --- | --- |
| Node | 426 | `node worker-threads/index.js` |
| Deno | 3,601 | `deno run --allow-all worker-threads/index.js` |
| Bun | 2,898 | `bun run worker-threads/index.js` |

Node is slower! Ok, so presumably we are not bypassing Node’s bottleneck by using threads. So we’re doing the same work with the added overhead of coordinating with the worker threads. Bummer.

Deno loves this, and Bun likes it a little more. Generally, it’s nice to see that Bun and Deno don’t see much of an improvement here. They’re already doing a good job of keeping the syscall overhead off of the execution thread.

Onward.

If threads are not going to work, let’s use child processes to do the work. We’re spawning processes to spawn processes, but we’ll spawn a small number of worker processes from the main thread and distribute work between them. This way we only pay the spawn cost on startup in the main thread.

This is quite easy. We simply swap out the worker threads for processes spawned by child_process.fork and change how we send and receive messages.

```diff
$ git diff --unified=1 --no-index ./worker-threads/ ./child-process/
diff --git a/./worker-threads/index.js b/./child-process/index.js
index 52a93fe..0ed206e 100644
--- a/./worker-threads/index.js
+++ b/./child-process/index.js
@@ -3,6 +3,6 @@ import http from "node:http";
 import { EventEmitter } from "node:events";
-import { Worker } from "node:worker_threads";
+import { fork } from "node:child_process";
 const newWorker = () => {
-  const worker = new Worker("./worker-threads/worker.js");
+  const worker = fork("./child-process/worker.js");
   const ee = new EventEmitter();
@@ -21,3 +21,3 @@ const spawnInWorker = async () => {
   // Send and wait for our response.
-  worker.worker.postMessage([id, "echo", "hi"]);
+  worker.worker.send([id, "echo", "hi"]);
   return new Promise((resolve) => {
diff --git a/./worker-threads/worker.js b/./child-process/worker.js
index 5f025ca..9b3fcf5 100644
--- a/./worker-threads/worker.js
+++ b/./child-process/worker.js
 import { execFile } from "node:child_process";
-import { parentPort } from "node:worker_threads";
-parentPort.on("message", (message) => {
+process.on("message", (message) => {
   const [id, cmd, ...args] = message;
@@ -7,3 +6,3 @@ parentPort.on("message", (message) => {
   execFile(cmd, args, (_error, stdout, _stderr) => {
-    parentPort.postMessage([id, stdout]);
+    process.send([id, stdout]);
```

Nice. And the results:

| Language/Runtime | Req/s | Command |
| --- | --- | --- |
| Node | 2,209 | `node child-process/index.js` |
| Deno | 3,800 | `deno run --allow-all child-process/index.js` |
| Bun | 3,871 | `bun run child-process/index.js` |

Good speedups all around. I am very curious what the bottleneck is that is preventing Deno and Bun from getting to Rust/Go speeds. Please let me know if you have suggestions for how to dig into that!

One fun thing here is that we can mix Node and Bun. Bun implements the Node IPC protocol, so we can configure Node to spawn Bun child processes. Let’s try that.

Update the fork arguments to use the bun binary instead of Node.

```js
const worker = fork("./child-process/worker.js", {
  execPath: "/home/maxm/.bun/bin/bun",
});
```

| Language/Runtime | Req/s | Command |
| --- | --- | --- |
| Node + Bun | 3,853 | `node child-process/index.js` |

Hah, cool. I get to use Node on the main thread and leverage Bun’s performance.

Logs. The previous implementations assume there will be minimal log output, but what if there’s a lot? We could send the logs using process.send, but that will be quite expensive if our output bytes are serialized to JSON.

I spent a lot of time in this rabbit hole. Here’s a rough summary of the things I tried:

- Passing file descriptors between processes, like passing the stdout/stderr back up to the parent process. I tried this a few different ways but couldn’t get it working so that we’d always capture all the bytes written.
- Just using process.send. This works, but is only performant if you use `serialization: "advanced"` so that you can send bytes without serialization. This doesn’t work in Deno and Bun.
- Creating a pair of abstract sockets for each spawn call and sending the logs over the socket. This spends too much time setting up the sockets to be worth it.

Also, abstract sockets are crazy. I’m familiar with Unix domain sockets, where you have a file called (e.g.) something.sock and you can listen on it and connect to it just like a network address. Turns out, if you use a Unix socket and the filename starts with a null byte, like \0foo, the socket will not exist on the filesystem and it’ll be automatically removed when no longer used. Weird! Cool!
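To make that concrete, here’s a minimal sketch (mine, not from the post) of an abstract socket in Node. The `net` module accepts these names directly; note the abstract namespace is Linux-only:

```js
import net from "node:net";

// Linux-only: the leading null byte puts this socket in the abstract
// namespace, so no file appears on the filesystem and the name is
// reclaimed automatically when the last reference to it closes.
const name = "\0demo-abstract-socket";

const server = net.createServer((conn) => conn.pipe(conn)); // echo server
server.listen(name, () => {
  const client = net.connect(name, () => client.end("hello"));
  client.on("data", (data) => console.log(data.toString())); // "hello"
  client.on("end", () => server.close());
});
```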

After all this testing I have two approaches that work pretty well.

1. Set up a pool of processes with `.fork()` and also set up a separate abstract socket for each one to send logs.
2. Simply use `process.send`, but with `serialization: "advanced"`.

Let’s see how those work out.

We’ll need something that outputs a lot of logs, so I grabbed the main.c file from SQLite’s source. This is a 163 KB file. We’ll run the command `cat main.c` to print it out.

Here’s our baseline.js again with that update:

```js
import { spawn } from "node:child_process";
import http from "node:http";

http
  .createServer((_, res) => spawn("cat", ["main.c"]).stdout.pipe(res))
  .listen(8001);
```

I’ve updated the Go and Rust code as well. Let’s see how they do:

| Language/Runtime | Req/s | Command |
| --- | --- | --- |
| Node | 374 | `node baseline.js` |
| Deno | 667 | `deno run --allow-all baseline.js` |
| Bun | 1,374 | `bun run baseline.js` |
| Go | 2,757 | `go run go/main.go` |
| Rust (tokio) | 3,535 | `cd rust && cargo run --release` |

Fascinating. It’s cool to see Bun and Rust pull ahead here compared to the previous benchmarks. Node is still very slow, and Deno is surprisingly unhappy with this workload.

Next let’s try my abstract socket communication channel implementation. It’s getting quite complex so I won’t post it here, but you can take a look here.
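The rough shape of the worker side is something like this sketch (hypothetical names, not the linked code; a real version also has to frame each chunk with the spawn id so the parent can demultiplex, and handle stderr and backpressure):

```js
import net from "node:net";
import { spawn } from "node:child_process";

// Hypothetical sketch: each forked worker listens on its own abstract
// socket and streams raw spawn output over it, keeping bulk log bytes
// off the serialized IPC channel.
const addr = `\0logs-${process.pid}`;

net
  .createServer((conn) => {
    // The parent connects once; spawn requests still arrive over IPC.
    process.on("message", ([id, cmd, args]) => {
      const cp = spawn(cmd, args);
      cp.stdout.on("data", (chunk) => conn.write(chunk));
      cp.on("close", (code) => process.send([id, "exit", code]));
    });
  })
  .listen(addr, () => process.send(["listening", addr]));
```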

| Language/Runtime | Req/s | Command |
| --- | --- | --- |
| Node | 1,336 | `node child-process-comm-channel/index.js` |
| Node + Bun | 2,635 | `node child-process-comm-channel/index.js` |
| Deno | 862 | `deno run --allow-all child-process-comm-channel/index.js` |
| Bun | 1,833 | `bun child-process-comm-channel/index.js` |

Haha. I had seen some random benchmark results where Node+Bun was faster than Bun alone, but it never netted out in the final runs.

The Deno results are quite perplexing. In implementing this example I had a “bug” where I was buffering the response as a string. Here’s the diff of me fixing it:

```diff
@@ -88,9 +88,8 @@ const spawnInWorker = async (res) => {
   worker.child.send([id, "spawn", ["cat", ["main.c"]]]);
   worker.ee.on(id, (msg, data) => {
     if (msg == MessageType.STDOUT) {
-      resp += data.toString();
+      res.write(data);
     }
     if (msg == MessageType.STDOUT_CLOSE) {
```

This ‘fix’ makes Deno a lot slower, but Node and Bun a lot faster! I wonder if that’s because one has a faster toString() implementation or higher overhead for res.write?

| Language/Runtime | Req/s | Command |
| --- | --- | --- |
| Deno + string buffer | 1,453 | `deno run --allow-all child-process-comm-channel/index.js` |

Weird!

Finally, here is the process.send implementation. It is fast and also incredibly simple to implement. I am a little unexcited about this solution because it is slower than I’d like, doesn’t support Deno and Bun, and there’s very little space to improve things. However, this implementation is deeply practical and easy to understand, which is beautiful. Here’s the source of worker.js; the rest is here.

```js
import { spawn } from "node:child_process";
import process from "node:process";

process.on("message", (message) => {
  const [id, cmd, ...args] = message;
  const cp = spawn(cmd, args);
  cp.stdout.on("data", (data) => process.send([id, "stdout", data]));
  cp.stderr.on("data", (data) => process.send([id, "stderr", data]));
  cp.on("close", (code, signal) => process.send([id, "exit", code, signal]));
});
```

| Language/Runtime | Req/s | Command |
| --- | --- | --- |
| Node | 1,179 | `node child-process-send-logs/index.js` |

Very nice, probably the practical choice if you are only targeting Node.

A quick note on load balancing between processes. Both Go and Rust have complicated schedulers that distribute work efficiently. So far, when picking a worker I’ve been grabbing a random one:

```js
const workers = await Promise.all(Array.from({ length: 8 }, newWorker));
const randomWorker = () => workers[Math.floor(Math.random() * workers.length)];
```

However, we can also implement round-robin and least-connections style load balancing. See a wonderful writeup on those here.

```js
let count = 0;
const pickWorkerInOrder = () => workers[(count += 1) % workers.length];

const pickWorkerWithLeastRequests = () =>
  workers.reduce((selectedWorker, worker) =>
    worker.requests < selectedWorker.requests ? worker : selectedWorker
  );
```

Sadly, I didn’t see consistent performance improvements with these approaches. They all perform about the same. Maybe more typical workloads, where the spawn calls are not entirely uniform, would benefit more from these changes.

It seems possible, given all of these findings, to implement a child_process library that implements the same API surface as node:child_process but farms the spawn calls out to a process pool. Maybe I will write that, or maybe you will. Please let me know if there’s interest.
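To sketch what that could look like (hypothetical, reusing the worker protocol from the process.send section above; a real library would expose proper Readable streams and cover the rest of the child_process surface):

```js
import { fork } from "node:child_process";
import { EventEmitter } from "node:events";

// Hypothetical pool-backed spawn: the same call shape as
// child_process.spawn, but the actual syscall happens in a pre-forked
// worker (the worker.js from the process.send section).
const pool = Array.from({ length: 8 }, () =>
  fork("./child-process-send-logs/worker.js", { serialization: "advanced" })
);
let count = 0;
let nextId = 0;

export function spawn(cmd, args = []) {
  const worker = pool[(count += 1) % pool.length];
  const id = nextId++;

  const child = new EventEmitter();
  child.stdout = new EventEmitter(); // a real library: Readable streams
  child.stderr = new EventEmitter();

  const onMessage = ([msgId, type, ...rest]) => {
    if (msgId !== id) return;
    if (type === "stdout") child.stdout.emit("data", rest[0]);
    if (type === "stderr") child.stderr.emit("data", rest[0]);
    if (type === "exit") {
      worker.off("message", onMessage);
      child.emit("close", ...rest); // code, signal
    }
  };
  worker.on("message", onMessage);
  worker.send([id, cmd, ...args]);
  return child;
}
```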

We’re sadly at the limits of my knowledge/experimentation, but I wonder what could unlock more performance.

It was really fun to see what improved performance and what didn’t, and the random moments where Deno/Bun/Node were affected differently.

Using Node and Bun together is a fun pattern and it’s nice to see it lead to such a speedup. Please support Node’s IPC, Deno!

Let me know if there’s anything else I should experiment with here! See you next time :)
