claudberghini — Claude Code at Lamborghini speed

1.0 The speed.

Taalas bakes AI models directly into custom silicon ("the model is the computer") and claims ~1,000× better efficiency than GPUs. Their public demo serves Llama 3.1 8B etched into a chip. We benchmarked the same task, "write a basic HTML page," across engines:

~8 ms vs ~9 seconds.

A full HTML page takes ~8 ms of inference on claudberghini. Claude Opus spends ~9 seconds on the same task: about 1,000× slower, the same order of magnitude as the efficiency gain Taalas claims for its silicon.
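
As a quick sanity check on that ratio, here is the arithmetic using only the two rounded timings quoted above. The variable names are ours, for illustration.

```python
# Rounded timings quoted in the text, in milliseconds.
claudberghini_ms = 8      # ~8 ms for the HTML page on silicon
claude_opus_ms = 9_000    # ~9 seconds on Claude Opus

slowdown = claude_opus_ms / claudberghini_ms
print(f"{slowdown:.0f}x")  # prints "1125x"
```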

2.0 The story.

We couldn't get an API key — the Taalas demo is a chat box on a website, no key on offer. So we opened the browser dev tools, watched it talk, and rebuilt the API from the network traffic: POST /api/chat, raw-text streaming, a <|stats|> trailer carrying the token-rate telemetry. We didn't get access — we made access.
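
The response shape described above (raw streamed text followed by a `<|stats|>` trailer) can be separated with a few lines of string handling. This is a minimal sketch, assuming the marker appears at most once and that everything after it is telemetry. The trailer's internal format is not described here, so it is returned as an opaque string; `split_stats` and the sample chunks are hypothetical.

```python
from typing import Optional

STATS_MARKER = "<|stats|>"  # trailer marker named in the text


def split_stats(body: str) -> tuple[str, Optional[str]]:
    """Split a fully received response into (model_text, raw_stats).

    raw_stats is None when no trailer is present. Its format is not
    assumed here; the caller gets the raw string.
    """
    text, marker, trailer = body.partition(STATS_MARKER)
    return text, (trailer.strip() if marker else None)


# Made-up chunks: the marker may arrive split across two reads,
# so the chunks are joined before searching for it.
chunks = ["<html>", "</html>", "<|st", "ats|>made-up-telemetry"]
text, stats = split_stats("".join(chunks))
print(text)   # prints "<html></html>"
print(stats)  # prints "made-up-telemetry"
```

Joining before splitting is the simplest way to stay correct when a chunk boundary falls inside the marker; a truly incremental parser would need to hold back the last few characters of each chunk instead.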

Then the hard part. An 8B model is a mediocre tool-follower: it rambles, invents filenames, botches JSON, and once tried to sudo rm a system file when we just said "hi." So we built an eval harness over real Claude Code agent loops and tuned iteratively against it: a hill-climbing workflow proposed prompts, scored each, and looped until ten rounds passed with no gain.
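
The tuning loop can be sketched roughly as follows. This is an illustration of hill climbing with a "ten rounds without gain" stopping rule, not the project's harness: `hill_climb`, `propose`, and `score` are hypothetical names, and the toy proposer and scorer at the bottom stand in for an LLM-driven prompt rewriter and the real eval.

```python
from typing import Callable


def hill_climb(
    start: str,
    propose: Callable[[str], str],
    score: Callable[[str], float],
    patience: int = 10,
) -> tuple[str, float]:
    """Keep the best-scoring prompt; stop after `patience` rounds with no gain."""
    best, best_score = start, score(start)
    stale = 0
    while stale < patience:
        candidate = propose(best)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
            stale = 0  # a gain resets the patience counter
        else:
            stale += 1
    return best, best_score


# Toy stand-ins: "improve" by appending a character, score capped at 5.
result = hill_climb("ab", lambda p: p + "!", lambda p: min(len(p), 5))
print(result)  # prints ('ab!!!', 5)
```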

What moved the needle.

Trimming ~60 tools down to the coding set · re-sampling on a botched tool call · grounding answers in real tool output · a compact eval-tuned prompt · and a guard against destructive commands. The core ops — read, edit, create, grep — land 5/5.
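
One way to picture "grounding answers in real tool output" is to score each candidate answer by how much of it is backed by what the tools returned. The sketch below uses plain word overlap as the support measure, which is our assumption; the source does not say how support is computed. `support` and `pick_grounded` are hypothetical names.

```python
import re


def _words(text: str) -> set[str]:
    # Keep path-like tokens such as "src/app.js" in one piece.
    return set(re.findall(r"[a-z0-9_./-]+", text.lower()))


def support(answer: str, tool_output: str) -> float:
    """Fraction of the answer's words that also appear in the tool output."""
    answer_words = _words(answer)
    if not answer_words:
        return 0.0
    return len(answer_words & _words(tool_output)) / len(answer_words)


def pick_grounded(candidates: list[str], tool_output: str) -> str:
    """Return the candidate best supported by the tool output (first wins ties)."""
    return max(candidates, key=lambda c: support(c, tool_output))


tool_output = "src/app.js\nsrc/index.html"
candidates = [
    "The project has main.py and utils.py",           # invented filenames
    "The project has src/app.js and src/index.html",  # matches the listing
]
print(pick_grounded(candidates, tool_output))
# prints "The project has src/app.js and src/index.html"
```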

3.0 How it works.

Claude Code talks to the proxy (via deep-claude, which isolates your real Anthropic login). The proxy turns a fast-but-weak model into a coding agent:
1. swaps Claude Code's 120 KB system prompt for a compact, tuned one,
2. injects a <tool_call> format and parses the model's text back into tool_use blocks,
3. best-of-N: re-samples until a valid tool call parses,
4. grounded best-of-N: picks the answer most supported by tool output, and
5. guards against destructive shell commands.
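
Steps 2 and 3 above can be sketched together. The payload inside `<tool_call>` is assumed here to be JSON with `name` and `input` keys; the source does not specify the injected format. The output mirrors the `tool_use` content-block shape of the Anthropic Messages API. `parse_tool_call`, `sample_tool_call`, and the sample drafts are hypothetical.

```python
import json
import re
import uuid
from typing import Callable, Optional

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_tool_call(text: str) -> Optional[dict]:
    """Return a tool_use-style block, or None if the text holds no valid call."""
    match = TOOL_CALL_RE.search(text)
    if match is None:
        return None
    try:
        payload = json.loads(match.group(1))  # assumed JSON payload
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict) or not isinstance(payload.get("name"), str):
        return None
    tool_input = payload.get("input", {})
    if not isinstance(tool_input, dict):
        return None
    return {
        "type": "tool_use",
        "id": f"toolu_{uuid.uuid4().hex}",
        "name": payload["name"],
        "input": tool_input,
    }


def sample_tool_call(generate: Callable[[], str], attempts: int = 5) -> Optional[dict]:
    """Best-of-N: re-sample until a tool call parses, up to `attempts` draws."""
    for _ in range(attempts):
        block = parse_tool_call(generate())
        if block is not None:
            return block
    return None  # nothing parsed; the caller falls back to plain text


# Made-up model drafts: the first has botched JSON, the second is valid.
drafts = iter([
    '<tool_call>{"name": "Read", oops</tool_call>',
    '<tool_call>{"name": "Read", "input": {"file_path": "a.txt"}}</tool_call>',
])
block = sample_tool_call(lambda: next(drafts))
print(block["name"], block["input"])  # prints "Read {'file_path': 'a.txt'}"
```

Returning `None` rather than raising keeps a failed draw cheap, which matters when the model is fast enough that re-sampling costs milliseconds.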

Backend-switchable.

Set BACKEND=openrouter to route to meta-llama/llama-3.1-8b-instruct — the same model on a billed API, which is how we tuned without hammering the demo.
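
A backend switch of this kind might look like the sketch below. `resolve_backend` is a hypothetical helper, not the proxy's code. The OpenRouter URL is its public chat-completions endpoint; no URL is guessed for the Taalas demo, which is read from `CLAUDBERGHINI_API_URL`.

```python
import os
from typing import Mapping

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
OPENROUTER_MODEL = "meta-llama/llama-3.1-8b-instruct"


def resolve_backend(env: Mapping[str, str]) -> dict:
    """Map the BACKEND setting to a target URL and model name."""
    backend = env.get("BACKEND", "claudberghini")
    if backend == "openrouter":
        return {"backend": backend, "url": OPENROUTER_URL, "model": OPENROUTER_MODEL}
    if backend == "claudberghini":
        # The demo endpoint is not hard-coded here; it must be configured.
        return {"backend": backend, "url": env.get("CLAUDBERGHINI_API_URL"), "model": None}
    raise ValueError(f"unknown BACKEND: {backend!r}")


target = resolve_backend({"BACKEND": "openrouter"})
print(target["model"])  # prints "meta-llama/llama-3.1-8b-instruct"

# In real use the live environment would be passed instead:
# target = resolve_backend(os.environ)
```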

4.0 The setup.

You'll need Claude Code, deep-claude, and Node 18+. Clone, build, and register the endpoint:

git clone https://github.com/dennisonbertram/claudberghini
cd claudberghini
npm install && npm run build

# register the endpoint with deep-claude (one time)
deep-claude endpoints add claudberghini http://localhost:3000

5.0 The run.

One launcher does everything — it auto-starts the proxy and opens a clean Claude Code session on the silicon. Run it from the project directory you want to work in.

$ cd your-project
$ claudberghini
…a clean Claude Code session, on silicon @ ~14,500 tok/s…

$ claudberghini -p "create an index.html with a Hello World heading"
…done in ~1.5s…

6.0 The quality & the guardrails.

On the real Claude Code agent loop, the four core operations (read, edit, create, grep) land 5/5. Harder multi-step and multi-file tasks score ≈0.70 on the Taalas demo's quantized instance (1.0 on OpenRouter's). It's an 8B model: superb for focused file ops, weaker on tangled logic.

Safe by default.

The proxy refuses destructive or privileged shell calls (sudo, rm -rf /, curl…|sh, writes to /etc/*) and never forces a tool call. Run it on code you trust, like any agent.
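
A guard like this is often a deny-list checked before any shell command runs. The patterns below are our own illustration, modelled on the four examples in the text; they are not the proxy's actual rules, and `is_destructive` is a hypothetical name.

```python
import re

# Illustrative deny-list; each pattern covers one example from the text.
DENY_PATTERNS = [
    re.compile(r"\bsudo\b"),                                       # privileged execution
    re.compile(r"\brm\s+-[a-zA-Z]*[rR][a-zA-Z]*\s+/\*?(\s|$)"),    # rm -rf /
    re.compile(r"\bcurl\b[^|]*\|\s*(ba)?sh\b"),                    # download piped into a shell
    re.compile(r"(>|\btee\b[^|;&]*\s)\s*/etc/"),                   # writes under /etc
]


def is_destructive(command: str) -> bool:
    """True if the command matches any deny-list pattern."""
    return any(pattern.search(command) for pattern in DENY_PATTERNS)


print(is_destructive("sudo rm /etc/hosts"))  # prints "True"
print(is_destructive("rm -rf ./build"))      # prints "False"
```

Pattern matching on command strings is easy to evade with quoting or indirection, so a list like this is a seatbelt rather than a sandbox, which is why the advice to run on code you trust still applies.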

The reference.

Override anything through the environment.

Run
claudberghini [args…]        Clean Claude Code session on Taalas silicon. Args pass through to `claude`.
claudberghini -p "task"      One-shot print mode.
./eval/real-path-eval.sh N   Score the four core tasks, N runs each.
./eval/speed-benchmark.sh    Decode rate + end-to-end timing.

Configure (env)
BACKEND                  `claudberghini` (default) or `openrouter`.
TOOL_SAMPLE_ATTEMPTS     Best-of-N draws for a valid tool call (default 5).
ANSWER_SAMPLE_ATTEMPTS   Grounded answer candidates (default 3).
CLAUDBERGHINI_API_URL    The Taalas demo endpoint.
MAX_SYSTEM_BYTES         Trim the prompt to the ~24 KB ceiling.
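
For a sense of how the numeric settings might be consumed, here is a small sketch. `int_from_env` and `trim_to_bytes` are hypothetical helpers. The defaults of 5 and 3 come from the table above; treating the ~24 KB ceiling as exactly 24 × 1024 bytes is our assumption.

```python
from typing import Mapping


def int_from_env(env: Mapping[str, str], name: str, default: int) -> int:
    """Read a positive integer setting, falling back to the default."""
    raw = env.get(name, "").strip()
    if not raw:
        return default
    value = int(raw)
    if value < 1:
        raise ValueError(f"{name} must be >= 1, got {value}")
    return value


def trim_to_bytes(prompt: str, max_bytes: int) -> str:
    """Cut a prompt to at most max_bytes of UTF-8 without splitting a character."""
    encoded = prompt.encode("utf-8")
    if len(encoded) <= max_bytes:
        return prompt
    # errors="ignore" drops a multi-byte character cut in half at the end.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


env = {"TOOL_SAMPLE_ATTEMPTS": "8"}  # stand-in for os.environ
tool_attempts = int_from_env(env, "TOOL_SAMPLE_ATTEMPTS", 5)
answer_attempts = int_from_env(env, "ANSWER_SAMPLE_ATTEMPTS", 3)
max_bytes = int_from_env(env, "MAX_SYSTEM_BYTES", 24 * 1024)  # assumed default

print(tool_attempts, answer_attempts, max_bytes)  # prints "8 3 24576"
print(trim_to_bytes("é" * 10, 5))                 # prints "éé"
```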

The same harness. A silicon engine.