← All work

TL;DR

ShareShark (a real-money, dual-currency sweepstakes prediction platform) prices off a model that only sees fresh market data during trading hours, but users could place entries overnight and on weekends, when the model is effectively blind. If material news broke during a blind window, a sophisticated user could take the other side of a stale, mispriced price. To defend that, I built a suite of seven cooperating Claude-powered agents that watch company news, the macro tape, and the order flow 24/7, coordinating without a central controller, treating every model output as untrusted, and rolled out in shadow mode before they ever touched a live price. The interesting part isn't any single LLM call; it's the systems design around the LLM.


The problem: a blind model is an exploitable model

On a prediction platform the price is a probability, and that price is only as good as the data behind it. The pricing model refreshes from live options/market data during market hours, but entries stay open overnight, on weekends, and on holidays, when the model can't see anything new. The moment material news breaks in a blind window, the posted line is stale, and the first person to notice can print money off it. The whole agent suite exists to shrink that exploitable window.

The system: seven agents, three jobs

I used AI only where it earns its place: several agents are plain statistical engines with an LLM review gate, not LLM-first.

The signature design: cheap triage, smart analysis

Almost every LLM agent runs the same cost-optimized pipeline: - Tier 1, Haiku triage: one cheap batched call screens many candidates and returns just flags/indices. - Tier 2, Sonnet analysis: expensive deep reasoning runs only on the handful Haiku flagged, returning a structured JSON decision.

The subtle part is the safety net. When entries are closed (market open, no exploitation risk), most flagged articles are auto-ignored after the cheap pass, unless Haiku tags an article extreme, or its headline matches a curated EXTREME_KEYWORDS list (death, fraud, delisting, halt, M&A…). That keeps the news monitor's design cost near $2/month without ever creating a blind spot on the dangerous tail.

No central brain: separation of concerns

The hard part of a multi-agent system is stopping agents from fighting. An overnight war headline could trigger both News Guardian (pause the stock) and the Adjustment Agent (macro overlay), double-counting the same risk. I solved it without a central coordinator: each agent has an explicit scope encoded as hard "this is not your job" rules (News Guardian = company-specific only; Adjustment = macro/sector), and downstream agents are fed upstream agents' recent decisions per ticker so they skip anything already handled. Clean division of labor, fewer redundant data calls, no conflicting actions.

The insight: tell the model exactly what it can't see

Rather than reacting to "futures are +0.5% vs. prior close," the system reasons about what the pricing model could last observe. A staleness-tier model encodes the data-freshness regime as a function of time-of-day / weekend / holiday, feeds that to Claude as context, and scales rule-based overlays by an absorption factor. It persists a baseline snapshot between runs so an adjustment measures the move since the model went blind, not the cumulative move it already priced in. It also encodes that US futures don't trade Fri 5pm–Sun 6pm, so a weekend futures % move isn't new information. (That's the kind of detail most people get wrong.)

Treating the LLM as an untrusted financial actuator

Claude's JSON recommendations move the prices users play against, so its output is validated like hostile input: layered JSON extraction (whole-string → code-fence → brace-matching), enum validation, range clamping (e.g. multipliers bounded to [0.50, 1.10]), an invariant that the two sides of a market can never both exceed 1.0 (which would let users profit risk-free either way), and safe defaults (MANUAL_REVIEW) on any parse failure. The agent can be wrong; it can never produce an unsafe or unparseable action.

Shipping it safely

Every agent ran in shadow mode (log + Discord alert, no action) for 1–2 weeks, reviewed against a full decision audit trail (every decision persisted, including the IGNOREs, so I could tune prompts and study false positives), then a single flag flipped it live. Heartbeats plus a watchdog fire an emergency alert if an agent goes silent, and alerts are environment-tagged across staging and prod.

What this demonstrates

Scale & shape (design / shadow-validated)

14,600 lines of Python across 77 files; 7 agents; 2 Claude models (Haiku 4.5 triage, Sonnet 4.6 analysis) plus the hosted web-search tool. News Guardian runs every 5 minutes, 24/7; 61–71 tickers scanned per macro run; 11 futures/indices monitored.

Tech stack

Python 3.12 · Django / DRF · Celery + Beat / Redis · Anthropic Claude SDK (Haiku + Sonnet, web-search tool) · Marketaux / StockNewsAPI · yfinance · exchange_calendars (NYSE) · Discord alerting · AWS EC2 (staging + prod).

Honest notes

ShareShark ran a year of free-to-play and a small real-money soft launch (~50 users), with these agents validated in shadow mode, so this is design-and-shadow-validation work, not "battle-tested at scale." Cost figures are design targets from code, not billed actuals. There is no realized P&L here: these are risk-protection systems, and the platform's pricing margin and risk caps are design parameters, not earnings. Heavy lifting is shared by the Anthropic SDK + hosted web-search, the news/market-data vendors, Celery, and Django; the contribution is the orchestration, prompts, guardrails, risk modeling, and operational hardening. Some roadmap items (real-time price nudge at placement, graph-based collusion detection, formal VaR) are scoped, not shipped.