Performance and tuning
How to measure, gate, and tune Buntime performance. The local harness (bun run perf in apps/runtime/) covers the worker pool + routing; Rancher/k3s environments are in dated reports (referenced at the bottom). For the pool model itself, see worker-pool.
Local harness
Section titled “Local harness”Runs inside apps/runtime itself. Uses temporary apps generated in .perf-fixtures/ and the real routing/pool path — useful as a baseline before external tests.
cd apps/runtimebun run perf # full run (all scenarios, default gates)bun run perf:smoke # short direct-mode, no thresholdsbun run perf:ci # short direct-mode with perf/thresholds.jsonbun run perf:gate # full run with perf/thresholds.jsonPERF_MODE=http (default) starts Bun.serve on a local port and drives requests via fetch — includes HTTP parsing, Hono routing, pool dispatch, body cloning, and response transfer. PERF_MODE=direct calls app.fetch in memory, isolating socket/client overhead.
Scenarios
Section titled “Scenarios”| Scenario | Measures | Good for detecting |
|---|---|---|
warm-noop | Latency/throughput of an already-warm persistent worker | Regression in routing, pool hit, plugin hooks |
echo-1kb | 1 KiB POST body: clone, transfer to worker, response | Serialization/IPC costs in the pool |
slow-50ms | Concurrency while worker processes 50 ms | Backpressure, pool fairness |
ephemeral-noop | Cold start with ttl: 0 (new worker per request) | Spawn cost; most sensitive to CPU |
Harness variables
Section titled “Harness variables”| Var | Default | Description |
|---|---|---|
PERF_MODE | http | http (Bun.serve) or direct (app.fetch) |
PERF_DURATION_MS | 10000 | Measured duration per scenario |
PERF_WARMUP_MS | 2000 | Warmup per scenario |
PERF_CONCURRENCY | 50 | Concurrent loops |
PERF_POOL_SIZE | 100 | LRU pool size |
PERF_CLIENT_TIMEOUT_MS | 10000 | Client timeout |
PERF_PORT | random | Fixed port in PERF_MODE=http |
PERF_SCENARIOS | all | CSV: warm-noop,echo-1kb |
PERF_JSON | unset | 1 for machine-readable JSON |
PERF_OUTPUT_FILE | unset | JSON report output path |
PERF_GATE_FILE | unset | JSON with thresholds (maxErrors, minRps, maxP95Ms, maxP99Ms, maxAvgMs) |
PERF_KEEP_FIXTURES | unset | 1 keeps generated apps in .perf-fixtures/ |
Examples:
PERF_DURATION_MS=30000 PERF_CONCURRENCY=200 PERF_POOL_SIZE=500 bun run perfPERF_MODE=direct PERF_SCENARIOS=warm-noop PERF_DURATION_MS=15000 bun run perfPERF_SCENARIOS=ephemeral-noop PERF_CONCURRENCY=5 bun run perfPERF_GATE_FILE=perf/thresholds.json PERF_OUTPUT_FILE=perf-results.json bun run perfRuntime tuning vars
Section titled “Runtime tuning vars”These directly affect the numbers the harness measures and production behavior.
| Var | Default | Effect |
|---|---|---|
RUNTIME_EPHEMERAL_CONCURRENCY | 2 | Maximum parallel ttl: 0 requests. Excess goes into the queue |
RUNTIME_EPHEMERAL_QUEUE_LIMIT | 100 | Queue depth before returning 503 |
RUNTIME_WORKER_CONFIG_CACHE_TTL_MS | 1000 | Cache TTL for worker manifest/config. 0 disables |
RUNTIME_WORKER_RESOLVER_CACHE_TTL_MS | 1000 | Cache TTL for app directory lookup. 0 disables |
Recommendations:
ttl: 0apps (functions/serverless): keep boot cheap. IncreaseRUNTIME_EPHEMERAL_CONCURRENCYonly with spare CPU — each request pays a full spawn.ttl > 0apps (HTTP services): TTL is sliding — each request renews it. The LRU pool evicts the oldest on fill.- Cache TTL = 0 only in dev when app files change constantly; in production,
1000 msis safe. - See also storage about the in-memory caches.
perf/thresholds.json is intentionally conservative — it works as a smoke gate on a small runner. It checks:
maxErrors(default0even without a gate file)minRpsmaxP95Ms,maxP99MsmaxAvgMs(optional)
Tighten gradually after collecting repeated samples on the same runner or Rancher/k3s environment. Do not aim for “production capacity” in the local harness — it measures regression, not capacity.
Reading the results
Section titled “Reading the results”The report table shows:
| Metric | Notes |
|---|---|
| Requests, RPS | Throughput per scenario |
| Errors | Timeouts, non-2xx/3xx — treated as gate failures |
p50, p95, p99 (ms) | Latency. P99 tends to regress first |
| Pool active workers | Should converge to PERF_POOL_SIZE in warm scenarios |
| Pool hit rate | Expected >90% in warm-noop; ~0% in ephemeral-noop |
| Worker creations / failures | Spike in ephemeral-noop; should be stable in warm |
| Heap usage | For detecting leaks between runs |
Exit code 1 on any error or gate violation — intentionally “fail loud” in CI.
Rancher/k3s environments
Section titled “Rancher/k3s environments”Dated reports (do not copy content — these are snapshots, read as supplementary material):
| Report | Focus |
|---|---|
2026-05-01-performance-rancher-pod-load.md | k6 against GET /_/api/health on the pod (Ingress + TLS + Traefik); pod impact (CPU/mem) |
2026-05-01-performance-rancher-worker-routes.md | k6 against temporary worker routes (perf-noop, perf-echo, perf-slow, perf-ephemeral) installed on Rancher |
Both run against https://buntime.home, namespace zomme, with lab TLS (--insecure-skip-tls-verify). The second covers warm + 1 KiB POST + slow 50 ms + cold churn (ttl: 0) with the gateway rate limit (100 req/min) active.
Checklist before comparing runs
Section titled “Checklist before comparing runs”bun testandbun run lint:typespassing on both sides.- Comparing
httpanddirectseparates runtime overhead from socket/client overhead. - Watch
warm-noopp95/p99 + pool hit rate after any change to routing, config loading, worker resolution, or plugin hooks. - Track
ephemeral-noopseparately — it is cold-start churn, not steady-state. - Use the same runner (laptop, CI runner, Rancher pod) between runs: comparing runs across different machines is not apples-to-apples.
What to measure after each change
Section titled “What to measure after each change”Mapping of “change type → sensitive scenario”:
| Changed… | Run… | Check first |
|---|---|---|
| Routing / Hono middleware | warm-noop | p95/p99, RPS, pool hit rate |
Plugin onRequest hook | warm-noop | p95/p99 (additive overhead per hook) |
| Worker pool LRU / lifecycle | warm-noop + ephemeral-noop | Worker creations, pool size, hit rate |
| IPC / wrapper.ts (worker) | echo-1kb | p95/p99 and throughput on large request body |
| Config loading / app.yaml parsing | ephemeral-noop | Cold-start RPS |
| Worker resolver (app lookup) | ephemeral-noop | Cold-start RPS, failures |
Known harness limitations
Section titled “Known harness limitations”.perf-fixtures/is deleted by default between runs; usePERF_KEEP_FIXTURES=1to inspect generated manifests.- The harness does not exercise Ingress, TLS, Traefik, NetworkPolicy, or K8s scheduling — for that, run k6 on Rancher (links above).
- On laptops with thermal throttling, long runs (
PERF_DURATION_MS > 60000) tend to degrade — use short runs for regression and long runs for capacity. PERF_MODE=directdeliberately ignores HTTP server overhead. Do not use it alone as a baseline — always compare withhttp.
Cross-references
Section titled “Cross-references”- worker-pool — LRU model, sliding TTL, ephemeral queue.
- storage — tunable in-memory caches, file stores.
- All runtime env vars in one place — see the deployment configuration reference.