Why rolling P/L is the wrong filter for who to copy
Every copy-trading platform shows you who's up the most this week. That's a lottery winner, not a skilled trader. Bracketed tournaments with promotion and relegation are a structurally better filter — here's why.
The leaderboard is a story we tell ourselves
Every major copy-trading platform — Binance Copy, Bitget, Bybit, OKX — ranks traders by some flavour of rolling return. 7-day P/L. 30-day. 90-day. Sometimes ROI, sometimes total profit, sometimes a normalised score. The chrome varies. The underlying question doesn't: who's up the most lately?
That's not the question a copy-trader is actually asking. The real question is something like: if I mirror this trader's positions for the next six months, what's the distribution of outcomes I'm signing up for? Rolling P/L is a single sample from the past. The thing you care about is a distribution over the future. Those are different objects, and treating one as a proxy for the other is the original sin of copy-trading product design.
This post is the long version of an opinion that lives in the BotPit one-liner: paper trade, win crypto, get copied. The whole product is built around the bet that bracketed tournaments are a structurally better filter than rolling P/L. Below is why we believe that.
The math of survivorship
Imagine a thousand traders, none of whom have any skill whatsoever. Each one places one trade per week. Half the time they make 5%, half the time they lose 5%. Pure coin flips. After one week, ~500 are up 5%. After two weeks, ~250 are up roughly 10%. After three weeks, ~125 are up roughly 15%. After four weeks, ~63 are up roughly 20%.
Sort that pool by rolling 4-week return. The top of the leaderboard is dominated by the 63 traders who just flipped four heads in a row. They have an audited +20% trailing return. Their charts are gorgeous. Their strategy descriptions sound disciplined in retrospect.
They are indistinguishable, by rolling P/L, from a trader with genuine skill running the same period. Because rolling P/L is the wrong instrument for the measurement. It can't separate skill from a bunch of coin flips landing the same way. And the survivorship effect compounds: the bad coin-flippers wash out quietly, the good ones rise to the top, and the platform presents them to you like a recommendation engine that knows what it's doing.
This is the central problem. Every copy-trading platform you've heard of is, at the leaderboard layer, a survivorship machine pretending to be a skill detector.
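The coin-flip arithmetic is easy to check with a quick Monte Carlo. A minimal sketch, using the same illustrative ±5%-per-week numbers as above (not real trading data):

```python
import random

random.seed(0)

N_TRADERS = 1000
WEEKS = 4

def simulate_trader():
    """Four weekly coin-flip trades: +5% or -5%, zero skill."""
    equity = 1.0
    for _ in range(WEEKS):
        equity *= 1.05 if random.random() < 0.5 else 0.95
    return equity - 1.0  # trailing 4-week return

returns = sorted((simulate_trader() for _ in range(N_TRADERS)), reverse=True)

# Four heads in a row compounds to 1.05**4 - 1 ≈ +21.6%;
# roughly 1000 / 2**4 ≈ 63 traders land there by luck alone.
four_heads = [r for r in returns if r > 0.20]
print(f"top of leaderboard: {returns[0]:+.1%}")
print(f"traders up ~21.6% on pure luck: {len(four_heads)}")
```

Sorting that list is exactly what a rolling-P/L leaderboard does: the top slots are occupied by whoever just flipped the most heads.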
Variance is not skill (and rolling P/L can't tell)
The survivorship problem is bad enough on its own. It gets worse once you add leverage and position sizing into the mix.
Consider two traders. Trader A runs a 0.3-Sharpe long-only strategy with 5% position sizing, no leverage. Slow, careful, edge measured in basis points per trade, equity curve that climbs at maybe 1% per week with a tight 3% drawdown ceiling. Trader B runs a 5x-leveraged momentum YOLO with 50% position sizing, no risk management beyond “cut at the daily low.” Equity curve looks like an EKG. Negative expected value, in the long run.
Now run them both for one week during a strong directional week in BTC. Trader A is up 1.2%. Trader B, having correctly guessed the direction with 5x leverage, is up 42%.
Sort the leaderboard by 7-day return. Trader B is at the top, by a 35x margin. Trader B has the “copy now” button. Trader A is buried in the long tail, indistinguishable from the rest of the careful crowd.
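The compounding math is worth making concrete. A tiny sketch in the spirit of the two traders above — the return numbers are hypothetical (Trader B's loss side in particular is invented for illustration):

```python
# Hypothetical weekly returns; illustrative only, not real track records.
A_WEEK = 0.012               # Trader A: small, steady edge
B_WIN, B_LOSS = 0.42, -0.35  # Trader B: 5x-levered directional coin flip

# One good week: B tops the 7-day leaderboard by a wide margin.
week1 = sorted([("A", A_WEEK), ("B", B_WIN)], key=lambda t: t[1], reverse=True)
print("week-1 leaderboard:", [name for name, _ in week1])

# But one win plus one loss compounds below breakeven: variance drag.
b_two_weeks = (1 + B_WIN) * (1 + B_LOSS) - 1  # 1.42 * 0.65 ≈ -7.7%
a_two_weeks = (1 + A_WEEK) ** 2 - 1           # ≈ +2.4%
print(f"B over a win+loss pair: {b_two_weeks:+.1%}")
print(f"A over two weeks:       {a_two_weeks:+.1%}")
```

The single-week sort and the compounded outcome point in opposite directions — and the leaderboard only ever shows you the first one.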
The leaderboard is selecting on variance, not skill. Higher leverage, looser sizing, and weaker risk management all push you up the rankings during the weeks when your direction is right. When the regime flips, those same traders are first to blow up — but they're already gone from the leaderboard by then, replaced by the next cohort of high-variance bets that happened to clip the new direction. You're always seeing the survivors of this week's coin flips.
This isn't a hypothetical. It's why the published “copy this trader” programs at every major venue are dominated by survivorship + variance winners, and why the average outcome for users who copied a top-10 trader is meaningfully worse than the published track record of that trader at the moment they chose to copy. The selection process is broken at the ranking step, before the copy mechanism even fires.
Your real question, restated
Strip away the chrome and the question a copy-trader is implicitly asking is roughly:
Would I want this trader managing some fraction of my money for the next six months across a variety of market regimes?
That decomposes into:
- Have they shown a repeatable edge — not a single great run?
- Do they hold up in different regimes, not just the one they've been benefitting from?
- Is their risk-adjusted return survivable — i.e., can I hold through their worst expected drawdown without panicking out?
- When their P/L turns negative, do they tighten up or martingale?
Rolling P/L answers none of these. It conflates them all into a single scalar dominated by the most recent few weeks of luck. To actually answer the underlying question you'd need to put traders through a structured test that isolates skill from variance, and then see how they perform when conditions change. That's roughly the design of every other competitive filter in the world that produces a credible ranking — chess Elo, poker tournaments, athletic brackets, professional sports leagues, prop-firm evaluation programs. None of them ranks competitors by “most points scored in the past 4 weeks.” All of them use some form of repeated, structured test with promotion and relegation.
Trading is the only competitive activity where a 4-week variance-weighted point total is somehow considered evidence of skill. Why?
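For contrast, here is the kind of update those structured filters actually run — a minimal Elo step, in which a rating only moves by how much a result beat expectation, so one lucky week can never vault a competitor to the top:

```python
def elo_update(rating_a, rating_b, score_a, k=32):
    """One Elo update. score_a is 1.0 for a win, 0.5 draw, 0.0 loss.
    The rating moves by k * (actual result - expected result)."""
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta

# An even-rated win moves 16 points; a heavy favourite gains almost nothing.
print(elo_update(1500, 1500, 1.0))
print(elo_update(2000, 1500, 1.0))
```

Climbing an Elo ladder requires beating progressively stronger opposition repeatedly — structurally the same demand a promotion/relegation bracket makes.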
What a bracket actually filters for
A bracketed tournament with equal starting capital, equal time windows, equal asset universe, and tier-based promotion and relegation is a different instrument. It answers a closer-to-the-real-question by construction.
Specifically, a properly designed bracket:
- Removes capital noise. Everyone starts with the same paper account. A trader's position size is bounded by a sizing rule applied to a fixed equity number, not by the size of their personal bankroll. Two traders running the same strategy logic produce the same trades — so you're comparing strategies, not balance sheets.
- Forces repetition. A trader who clears Shrimp doesn't get a copy button — they get promoted to Crab, where the test starts over against a tougher field. Three coin flips in a row at Shrimp doesn't buy you a Fish badge. The bracket structure is designed so that getting to the top is a multi-week story, not a single-week accident.
- Taxes variance-chasing. A trader who blows up at Crab gets relegated. The YOLO strategy that landed them at the top of Shrimp last week now costs them their tier this week. The bracket directly punishes the high-variance strategies that rolling P/L silently rewards.
- Equalises the universe. Same asset list, same fee structure, same data feed. A trader who dominated last quarter on a niche low-liquidity coin can't fall back on it here. The strategy has to clear in the standard arena.
- Time-bounds the test. Tournaments end. Equity is settled. Records are immutable. Traders can't hide a bad week by staying in cash and refusing to print until the next good setup arrives — the clock keeps moving and zero P/L finishes mid-pack.
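The promotion/relegation mechanics above reduce to something like the following sketch. The tier names come from this post; the thresholds and settlement logic are hypothetical, not BotPit's actual implementation:

```python
from dataclasses import dataclass

TIERS = ["Shrimp", "Crab", "Fish"]  # tier ladder from the post

@dataclass
class Result:
    bot: str
    tier: str
    rank: int         # 1 = top of cohort
    cohort_size: int

def settle(result: Result, promote_top: int = 1, relegate_bottom: int = 1) -> str:
    """Move a bot up or down one tier based on its cohort finish (illustrative rule)."""
    i = TIERS.index(result.tier)
    if result.rank <= promote_top and i + 1 < len(TIERS):
        return TIERS[i + 1]                      # promoted
    if result.rank > result.cohort_size - relegate_bottom and i > 0:
        return TIERS[i - 1]                      # relegated
    return result.tier                           # holds tier

print(settle(Result("stuber", "Shrimp", 1, 12)))  # promoted to Crab
print(settle(Result("yolo", "Crab", 12, 12)))     # relegated to Shrimp
```

The point of the sketch: the only way to reach Fish is to win at Shrimp, then win again against a Crab field — a multi-week record by construction, exactly what a rolling-P/L sort never requires.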
None of this individually is novel. Trading firms have done variant versions of this internally for decades — prop firms, evaluation programs, allocation tournaments. The novel piece is doing it publicly, on paper, with the winning records becoming directly copy-tradable. That's the bridge between “structured skill test” and “copy-trading filter” that nobody's built yet.
The Stuber, framed two ways
Concrete example. The Stuber bot — our pole-position trend-follower — finished Shrimp Week 1 with +11.7% across 18 fills, top of the leaderboard, promoted to Crab uncontested. (Full recap has the trade-by-trade detail.)
Two ways to read that:
Rolling P/L view: +11.7% in a week, ranked #1, copy now. If you'd encountered The Stuber on a generic copy-trading leaderboard, that's the pitch. Maybe you'd size your copy at the trader's average position size; maybe you'd run it 1:1. Either way the platform's filter has handed you a strong-looking recommendation. Whether to trust it is your problem.
Bracket view: Won a small Shrimp field at trend-follow during a directional week — promoted to Crab where it has not yet been tested. Watch how it handles regime change in Crab Week 1 before copying. That's a more honest read of the same data. The bot showed an edge in a specific environment. It hasn't shown that edge transferring to a tougher field, and it hasn't shown what it does when the trend it was riding stops. Those are exactly the questions the next bracket will answer.
Same data. Same trades. Same fills. Different question implied by the leaderboard structure — and a different answer surfaces. The bracket-framed leaderboard pushes the user toward asking the right question; the rolling P/L leaderboard pushes them away from it.
(Crab Week 1 also threw The Stuber a 37-minute price-feed outage that orphaned a winning take-profit, which the platform backfilled at the verified TP price. That's its own policy story. It also makes the point: a bracket-based record only means something if the platform protects the operator's record from platform-induced noise. Otherwise the filter degrades.)
What a bracket can't tell you (honestly)
A bracket isn't a magic skill detector. It's a better filter, not a perfect one. Worth being honest about what it doesn't solve:
- Regime fit can still mask as skill. A bot that wins Shrimp, Crab, and Fish during a year-long uptrend may genuinely have an edge — or may just be a long-only strategy enjoying long-only weather. We're mitigating this by varying tournament conditions (asset mix, regime, length) over time, but the user still has to read the conditions, not just the rank.
- Week-1 noise is real. A first-week winner with a thin field and a directional week is still mostly a sample of one. Our convention is to flag uncontested promotions as such on the agent profile, and to require a minimum number of contested tournaments before a bot becomes copy-tradable. Single wins don't unlock copy.
- Insufficient competition is a real failure mode. If a tournament settles with three bots in the field and one of them was ours, the “winner” hasn't been meaningfully tested. We have an insufficient_competition rule that voids tier-up when a tournament didn't draw a real field. Better to delay a promotion than to let an unearned one through.
- Strategy stuffing is a real attack. Run 100 random parameter variants in parallel; the survivors that win brackets aren't skilled, they're statistical artifacts. Same survivorship problem as rolling P/L, just one layer up the stack. We address this through anti-stuffing limits on operator entries and through paper-account-based fairness (no multi-account farming), but it's an arms race that will need ongoing investment.
- Past performance is still past performance. A clean bracketed record is better evidence than rolling P/L, but it's still a backwards-looking signal. Markets change. Strategies decay. Operators burn out, change parameters, abandon their bots. The bracket gives you a stronger prior, not a guarantee. Position sizing on the copier's side still matters.
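A guard like the insufficient_competition rule can be sketched in a few lines. The rule name comes from this post; the field-size and house-share thresholds below are invented for illustration:

```python
# Hypothetical thresholds -- not BotPit's actual policy numbers.
MIN_FIELD = 5          # assumed minimum cohort size for a contested tournament
MAX_HOUSE_SHARE = 0.4  # assumed cap on platform-owned entrants in the field

def tier_up_allowed(field_size: int, house_bots: int) -> bool:
    """Void promotion when a tournament didn't draw a real field."""
    if field_size < MIN_FIELD:
        return False  # too few competitors to mean anything
    if house_bots / field_size > MAX_HOUSE_SHARE:
        return False  # field padded with the platform's own bots
    return True

print(tier_up_allowed(field_size=3, house_bots=1))   # thin field: promotion voided
print(tier_up_allowed(field_size=12, house_bots=2))  # real field: promotion stands
```

Whatever the real thresholds are, the shape matters: the check runs before promotion, so an unearned tier-up never enters the record in the first place.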
The argument is not “brackets are perfect.” The argument is brackets are a strictly better filter than rolling P/L for the question copy-traders are actually asking, and that the difference is large enough to be the basis of a product.
The deal we're offering
Here's what BotPit is, restated through this frame:
- A public arena where any operator can deploy a bot — rule-based template, AI-authored, or hand-coded — with equal paper capital, equal time, equal asset universe.
- A tiered bracket structure (Shrimp → Crab → Fish) where bots earn promotion by winning their cohort and earn relegation by failing to keep up. The bracket repeats. Single weeks don't define a record.
- Copy-trading on a partner exchange unlocked only at Crab+ tier, only after contested promotion, only on bots whose record clears the platform's integrity policy (here's what that policy looks like in practice). BotPit never holds user funds — see /docs/no-custody.
- A public Pitlog where every meaningful platform incident gets written up — because the bracket only filters accurately if the platform doesn't silently corrupt the records.
The tagline is “paper trade, win crypto, get copied.” The thesis underneath the tagline is rolling P/L is the wrong filter and we're going to replace it with one that survives contact with how competitive ranking actually works in every other domain.
What you can do
If you build trading bots — rule-based, AI-driven, or hand-coded — and you want a public arena where the record you accumulate actually matters, build one in 60 seconds. Pick a template, tune the knobs, ship to Shrimp. The ladder is open.
If you're a copy-trader who's tired of riding rolling-P/L survivorship into bad outcomes, watch the leaderboard for a few weeks. See what wins, what gets relegated, what survives the next regime. Form your own view of what the bracket-based record actually means before any real money is on the line.
And if you want to follow how the platform itself behaves when things go wrong — which is the load-bearing part of the “the record means something” promise — the Pitlog is where every incident, fix, and policy choice lives in public.
The leaderboard is a story we tell ourselves about who can trade. We're trying to make the story true.
