Route C is live — describe a trading bot in plain English
Type a strategy. Claude picks a template, fills the parameters, and the matching-engine shadow validator runs over 30 days of real ticks before you deploy. No code. No template-fit guessing. The wedge into LLM-authored trading bots, with the safety rail that makes the wedge actually safe.
The shape of the wedge
The hardest sentence to say credibly in trading-bot land is also the most valuable one:
Describe a trading strategy in plain English. We'll generate it, validate it against 30 days of real price ticks, and put it on the leaderboard.
That's Route C, live now at /build/describe. The reason it's a hard sentence to say credibly is that LLMs are very good at generating plausible-looking code or config and very bad at guaranteeing it works. “Generate me a trading bot” can produce a beautifully-formatted config that explodes the moment it touches a real matching engine. Most of the “LLM trading bot” demos you've seen on Twitter are exactly this: the generation step works, the safety rail is missing.
We built the safety rail first, on purpose, before wiring the LLM to it. The shadow-mode matching-engine validator (the post on why rolling P/L is the wrong filter touches on the same theme) replays a candidate strategy through the same code path the live engine uses, against 30 days of real ticks, and catches six classes of failure modes a candle-only backtester can't see: position-state assumptions, timeframe re-evaluation drift, edge-window coverage gaps, sizing drift, liquidity-vs-edge mismatches, and stale-data buffers.
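For intuition, here is a minimal sketch of the shadow-validation loop: replay every tick through an engine-backed shadow interface and collect its rejection codes. All names below (shadowValidate, ShadowEngine, the specific issue codes) are illustrative assumptions, not the platform's actual API; the real validator shares code with the live engine rather than mocking it.

```typescript
// Hypothetical sketch of shadow validation; type, function, and code names
// are illustrative, not the platform's real identifiers.
type IssueCode =
  | "POSITION_STATE_ASSUMPTION"
  | "TIMEFRAME_REEVAL_DRIFT"
  | "EDGE_WINDOW_GAP"
  | "SIZING_DRIFT"
  | "LIQUIDITY_EDGE_MISMATCH"
  | "STALE_DATA_BUFFER";

interface Issue { code: IssueCode; severity: "warn" | "block"; detail: string }
interface Tick { symbol: string; ts: number; price: number; size: number }
interface OrderIntent { side: "buy" | "sell"; qty: number; reason: string }

interface Strategy {
  evaluate(tick: Tick, positionQty: number): OrderIntent[];
}

interface ShadowEngine {
  // Same code path as the live engine, but orders never reach the book.
  positionQty(): number;
  shadowSubmit(intent: OrderIntent, tick: Tick): { ok: true } | ({ ok: false } & Issue);
}

function shadowValidate(strategy: Strategy, ticks: Tick[], engine: ShadowEngine): Issue[] {
  const issues: Issue[] = [];
  for (const tick of ticks) {                    // ~30 days of real price ticks
    for (const intent of strategy.evaluate(tick, engine.positionQty())) {
      const result = engine.shadowSubmit(intent, tick);
      if (!result.ok) issues.push(result);       // same rejection codes as live
    }
  }
  return issues;
}
```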
Route C drops onto that. The LLM is constrained to produce template-conformant parameter blobs (or honestly report “no template fits” — we track those signals to design the next archetype). Whatever it produces, the validator runs. Generations that pass deploy; generations that fail get the same explanation any manually-tuned strategy would. The platform behaves the same way whether you authored your bot in code, in a template form, or in plain English.
What it actually does
From /build/describe (Crab+ tier-gated):
- You type a strategy in plain English. Examples that work:
- “Buy when RSI is oversold and the trend is up, exit on take profit at 3.5%, stop at 2%, 1h timeframe.”
- “Trade ETH on a 4h timeframe; enter on breakouts above the 20-bar high; cautious sizing.”
- “Mean reversion BTC, 14-period RSI, standard 70/30 thresholds, 2x leverage.”
- Claude picks the closest-fitting template (trend-follow, mean-revert, or breakout — the three Phase 1 archetypes), fills the parameters to match your intent, and surfaces concerns it has about the configuration in plain English. You see the chosen template, every parameter, and the rationale.
- If your description doesn't fit any existing template — pairs trades, market-making, stat-arb, multi-leg setups — Claude calls a gap-signal tool instead of fabricating a forced fit. We log the prompt, the reason, and the suggested archetype. Patterns in those logs are the signal for “ship a fourth template” — not designer guesswork.
- Refine in plain English. “Tighten the stop” · “Use ETH instead” · “Make it more cautious in chop”. Each refinement carries the prior parameters AND any validator catches forward as context, so the LLM revises with full awareness of what was wrong (see the sketch after this list).
- The matching-engine shadow validator runs automatically over 30 days of real ticks the moment the generation lands. If anything is caught — position-state assumptions, sizing drift, liquidity vs edge, stale-data, timeframe re-eval, edge-window coverage — you see the issues inline, with severity, and the next refinement you type carries them forward as context for the LLM to address. If nothing is caught, the Deploy button hands off to your bot's admin page, where the tier-aware hosting flag (Shrimp → browser-eval, Crab+ → server-side) is resolved.
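For the curious, here is roughly what a refinement turn carries: the prior parameters, the user's edit instruction, and the last round of validator catches, assembled into a fresh prompt with no stored conversation behind it. Field and function names below are invented for illustration.

```typescript
// Rough shape of a refinement turn (names invented). Each refinement is a
// fresh LLM call: prior params + edit instruction + last validator catches.
interface ValidatorCatch {
  code: string;
  severity: "warn" | "block";
  detail: string;
}

interface RefinementContext {
  template: "trend_follow" | "mean_revert" | "breakout";
  priorParams: Record<string, unknown>;  // what the previous generation produced
  editInstruction: string;               // e.g. "Tighten the stop"
  validatorCatches: ValidatorCatch[];    // carried forward so the model can address them
}

function buildRefinementPrompt(ctx: RefinementContext): string {
  const catches = ctx.validatorCatches.length
    ? ctx.validatorCatches
        .map((c) => `- [${c.severity}] ${c.code}: ${c.detail}`)
        .join("\n")
    : "none";
  return [
    `Current template: ${ctx.template}`,
    `Current parameters:\n${JSON.stringify(ctx.priorParams, null, 2)}`,
    `Validator issues to address:\n${catches}`,
    `Edit instruction: ${ctx.editInstruction}`,
  ].join("\n\n");
}
```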
Why this is different from every “AI trading bot” you've seen
The market is saturated with “tell the AI what you want and it builds a bot” pitches. Most fall into one of three failure modes:
- Generate-and-deploy with no validation. The LLM produces a config; the platform deploys it. When it explodes on the first vol spike, the user blames themselves. The platform got the conversion and didn't pay the failure tax.
- Backtest-only validation. Some platforms do replay the strategy against historical candles before deploying. Better, but candle replays don't catch the failure modes that bite in live trading: scale-into-winner signals that trigger POSITION_ADD_UNSUPPORTED, stops that get eaten by slippage on illiquid pairs, MAs that read from stale buffers across feed gaps. The bot “backtests fine” and dies in production.
- No honest gap signal. The LLM is asked to generate something for any description, even ones that don't fit the platform's primitives. So it fabricates. The user gets a bot that looks like what they asked for but works nothing like it.
Route C addresses all three by construction:
- Validation = live engine, not candles. The validator is the matching engine, run in shadow mode against real ticks. If your bot's logic isn't compatible with the engine's position-state machine, the validator catches it before deploy. Same code path; same rejection codes.
- Tool-use, not free-form JSON. The LLM's output is constrained to template Zod schemas via Anthropic's tool API. The model cannot produce a parameter outside its schema — generations are schema-conformant by construction, with a Zod parse on the server as belt-and-braces for cross-field refines (e.g. fast_ma < slow_ma); a sketch of the pattern follows this list.
- Honest gap reporting. The model has an explicit report_template_gap tool. When no template fits, it calls that instead of forcing a fit. We log the prompt, the reason, the suggested archetype. The platform tracks these as the design signal for what template to ship next — not designer intuition, actual user demand.
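To make the second and third points concrete, here is a minimal sketch of the pattern: Zod-defined template parameters exposed as Anthropic tools, plus the explicit gap tool. The schema fields, the configure_trend_follow tool name, and the model string are assumptions for illustration; report_template_gap and the fast_ma < slow_ma refine come from the text above, and the real schemas and prompts live server-side.

```typescript
// Sketch of constrained generation via tool-use (field names and model string
// are illustrative, not the production schema).
import Anthropic from "@anthropic-ai/sdk";
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

// One archetype's parameters, with a cross-field refine the JSON Schema handed
// to the model can't express on its own -- hence the server-side re-parse.
const trendFollowParams = z
  .object({
    symbol: z.string(),
    timeframe: z.enum(["15m", "1h", "4h"]),
    fast_ma: z.number().int().min(2),
    slow_ma: z.number().int().min(5),
    take_profit_pct: z.number().positive(),
    stop_loss_pct: z.number().positive(),
  })
  .refine((p) => p.fast_ma < p.slow_ma, { message: "fast_ma must be < slow_ma" });

const gapReport = z.object({
  reason: z.string(),
  suggested_archetype: z.string(),
});

const client = new Anthropic();

async function generate(description: string) {
  const response = await client.messages.create({
    model: "claude-sonnet-4-20250514", // hypothetical model choice
    max_tokens: 1024,
    tools: [
      {
        name: "configure_trend_follow", // hypothetical tool name
        description: "Fill trend-follow template parameters for the described strategy.",
        input_schema: zodToJsonSchema(trendFollowParams) as any,
      },
      {
        name: "report_template_gap",
        description: "Call this when no existing template fits the description.",
        input_schema: zodToJsonSchema(gapReport) as any,
      },
      // ...mean_revert and breakout tools omitted
    ],
    tool_choice: { type: "any" }, // the model must call a tool; no free-form JSON
    messages: [{ role: "user", content: description }],
  });

  const toolUse = response.content.find((b) => b.type === "tool_use");
  if (!toolUse || toolUse.type !== "tool_use") throw new Error("no tool call in response");

  if (toolUse.name === "report_template_gap") {
    return { kind: "gap" as const, ...gapReport.parse(toolUse.input) };
  }
  // Belt-and-braces: re-parse on the server so cross-field rules hold.
  return { kind: "params" as const, params: trendFollowParams.parse(toolUse.input) };
}
```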
The integrity story, restated through Route C
Route C is the same platform commitment we keep making in different forms. The policy post committed to public, transparent, identical-for-everyone incident handling. The rolling-P/L essay committed to bracketed tournaments as the structural filter for “is this trader actually skilled.” Route C commits to the same thing one layer up: LLM-authored bots are subject to the same validator + the same leaderboard + the same tournament rules as everyone else. No special tier for AI bots, no “our LLM bots are ranked separately,” no “authored by Claude” getting a free pass.
We tag them clearly on their profile so copy-traders know — but the leaderboard treats them identically. If Claude can author a bot that wins a Crab cup, it competes for the same prize pool. If Claude fabricates a strategy that the validator catches, the user sees the catch and iterates — same way they would with a manually-tuned strategy.
That symmetry matters. The whole point of having both routes (templates AND LLM-authored) is to give users the choice without making the choice asymmetric. Route A is for users who know what they want; Route C is for users who know what they want but don't want to type it into a form.
How to use it
Tier: Crab+. Route C costs us money per call (LLM tokens), and we wanted the cost to be earned, not handed out. Win Shrimp Week 1 with any bot — manual or templated — and Route C unlocks for that agent. Promotion is the consent moment.
Quota: 10 generations per Crab user per 24h, 50 per Fish user. Refinements count. The platform-wide daily cap is a separate guard against runaway spend (configurable; default $20/day across all users). Both limits are visible in your account; the Crab quota is generous enough that ordinary use never brushes it.
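A rough sketch of how the two guards compose; the limits mirror the numbers above, and the names and tier keys are invented for illustration.

```typescript
// Hypothetical quota guard; limits mirror the blog numbers, names are made up.
const PER_USER_DAILY = { crab: 10, fish: 50 } as const;
const PLATFORM_DAILY_SPEND_USD = 20; // configurable default

function canGenerate(opts: {
  tier: keyof typeof PER_USER_DAILY;
  userCallsLast24h: number;        // refinements count toward this
  platformSpendTodayUsd: number;   // all users, all calls
}): { ok: boolean; reason?: string } {
  if (opts.userCallsLast24h >= PER_USER_DAILY[opts.tier])
    return { ok: false, reason: "per-user daily generation quota reached" };
  if (opts.platformSpendTodayUsd >= PLATFORM_DAILY_SPEND_USD)
    return { ok: false, reason: "platform-wide daily LLM spend cap reached" };
  return { ok: true };
}
```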
Output: An agent_strategy_params row tagged source='llm' with full author metadata: model, prompt hash, tokens, flagged concerns, edit instruction, generation counter. Visible on the agent profile so any copy-trader knows what they're looking at. The audit chain is the same one a manual strategy gets; there's no “LLM-only” corner of the database.
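Roughly the shape of that metadata, sketched as a TypeScript interface. Column names and the token-count split are assumptions, not the actual schema.

```typescript
// Illustrative shape of the stored generation metadata (field names assumed).
interface LlmAuthoredStrategyRow {
  source: "llm";                   // vs. "manual" or "template"
  model: string;                   // which Claude model produced it
  prompt_hash: string;             // hash of the user's description
  input_tokens: number;            // token-count split is an assumption
  output_tokens: number;
  flagged_concerns: string[];      // concerns the model surfaced about its own config
  edit_instruction: string | null; // set on refinement turns
  generation_counter: number;      // how many refinements led to this row
}
```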
Tier gate UX: users below Crab see Route C with an “unlocked at Crab” pill on the homepage card. We wanted the locked state to be honest about why it's locked, not silently disabled.
What's NOT in v1
Route C v1 is deliberately scoped. Things that aren't there yet:
- Custom template generation. When the LLM hits a gap, it logs the signal and tells the user to try rephrasing. v1 doesn't generate a fresh template definition (Zod schema + evaluator function) on the fly. v2 lights this up — that's where Route C transitions from “param tuner” to “real strategy author.”
- Multi-turn chat history. Each refinement is a fresh call with prior params + edit instruction + validator catches as context. No DB-stored conversation. Acceptable for v1; the “what changed” diff is implicit in the rationale text.
- Tool-use chaining. The LLM doesn't call the validator as a tool inside its own generation loop. The user triggers validation explicitly, and the LLM only sees validator output on the next refinement turn. v2 might let the LLM iterate against the validator before responding, but that's extra latency + cost, and we want to ship the simple version first.
- “Explain why this won” mode. A separate analysis surface that uses the heavier Opus model to break down a deployed bot's performance is a future feature. Route C v1 is strictly authoring; explanation is a separate tool.
All of these have rough scopes in docs/route-c-scoping.md; Pitlog will cover them when they ship.
What we want to see
The metric that matters for Route C is tournament winners that came through Route C. Generations that ship is one number; generations that survive Crab Week 2 is another; generations that earn copy-tradable status is the real one. Route C is in the same bracket as every other authoring route — it earns its place on the leaderboard or it doesn't.
The secondary metric is the gap-signal table. Patterns there will tell us which template to ship next. If 50 users describe pairs trades and we don't have a pairs template, that's the signal. We'll write it up when the table starts clustering — until then, Route C is generating honest data on what users actually want from a no-code trading-bot platform.
If you want to try it, /build/describe is the entry point. Pick a Crab+ agent, type a strategy, see what the platform makes of it. The validator will be honest about what works and what doesn't. The leaderboard will be honest about whether the result was actually any good.
That's the deal we keep making. Route C is the LLM-authoring layer on top of it.
