~/darrells.ai
Live · Agentic AI · 2026 · Architect & sole engineer

PredMark

A prediction-market trading platform that hunts mispricing across 100K+ markets on Kalshi and Polymarket. Its embedded analyst mines news and social buzz and scores every opportunity against cited evidence — wrapped in a measurement layer obsessed with one question: is the edge real, or just luck?

Agentic AI · Prediction Markets · Measurement · Guardrails · RAG
~101K markets watched · 0 trades the LLM decided · 40+ co-pilot tools · 867 commits

the problem

I wanted to know whether patient, signal-driven trading has a measurable edge in a huge market, and to be honest with myself about whether any edge was real or just luck.

What I was aiming for

PredMark began as a personal experiment with a real question behind it: can patient, signal-driven trading find genuine edge in a market far too large for any one person to watch by hand? A hundred thousand prediction markets move every day. I wanted a research partner that could actually keep up — reading the news and the crowd across all of them, scoring what matters against real evidence, and surfacing the handful of mispricings worth a closer look.

But the harder, more interesting goal was honesty. It's easy to make money for a week and convince yourself you're a genius. I wanted a platform that would tell me, unsentimentally, when I was just getting lucky — so the thing it ultimately optimizes for isn't profit, it's the truth about whether the edge is real.

The thesis: no LLM in the trade loop

PredMark watches roughly 101,000 active markets across Kalshi and Polymarket, looking for informational edge — news, social buzz, and velocity signals that suggest a price is wrong. When it finds one, deterministic quantitative gates take the position and hold toward resolution.
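To make "deterministic quantitative gates" concrete, here is a minimal sketch of what such a gate could look like. The field names, thresholds, and the `passes_gate` function are hypothetical illustrations, not PredMark's actual rules. The point is only that the decision is a pure function of numbers, so the same inputs always give the same answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical shape of a candidate market; all fields are assumptions."""
    market_id: str
    market_price: float    # current YES price, in [0, 1]
    estimated_prob: float  # probability implied by the signal stack, in [0, 1]
    evidence_count: int    # number of cited evidence items behind the score
    volume_usd: float      # recent traded volume


@dataclass(frozen=True)
class GateConfig:
    """Illustrative thresholds only; the real values are not in the write-up."""
    min_edge: float = 0.08
    min_evidence: int = 2
    min_volume_usd: float = 1_000.0


def passes_gate(c: Candidate, cfg: GateConfig = GateConfig()) -> bool:
    """Pure, deterministic check: no model call, no randomness, no hidden state."""
    edge = abs(c.estimated_prob - c.market_price)
    return (
        edge >= cfg.min_edge
        and c.evidence_count >= cfg.min_evidence
        and c.volume_usd >= cfg.min_volume_usd
    )
```

Because the gate takes only numeric inputs, it can be unit-tested and replayed against historical data without calling any model.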

AI is everywhere except the trigger. Claude Haiku scores each candidate market with forced evidence citation (so it can't hand-wave), Claude Sonnet writes the daily report, and the PredMark co-pilot answers my questions through ~40 tools. But a hard architectural rule, born from an early recursive-feedback failure, governs all of it: an LLM never decides a trade, and LLM output never feeds back as LLM input without human validation.
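One way "forced evidence citation" could be enforced is to reject any score whose citations do not match the evidence that was actually supplied to the model. The sketch below assumes each evidence item carries an ID and that the model returns the IDs it relied on. `ScoredMarket` and `validate_citations` are hypothetical names, not part of any Anthropic API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredMarket:
    """Hypothetical parsed model output for one market."""
    market_id: str
    score: float                # model's score, in [0, 1]
    cited_ids: tuple[str, ...]  # evidence IDs the model claims to rely on


def validate_citations(scored: ScoredMarket, evidence_ids: set[str]) -> bool:
    """Accept a score only if it cites something and every citation is real.

    `evidence_ids` is the set of IDs that were actually sent to the model,
    so an invented ID or an empty citation list fails the check.
    """
    if not scored.cited_ids:
        return False
    return all(cid in evidence_ids for cid in scored.cited_ids)
```

A score that fails this check would simply be dropped before it reaches any downstream gate, which is what makes the constraint structural rather than a prompt-level request.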

Measurement over vibes

The part I'm proudest of isn't the trading — it's the measurement layer built to keep me honest. It distinguishes signal accuracy from execution quality, so a profitable-but-wrong trade gets labeled “lucky,” not “good.” It's explicitly designed to catch “windfall masquerading as edge.”
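The split between signal accuracy and execution quality can be pictured as a two-by-two table. The sketch below is an assumption about how such labels might be assigned. Only the "lucky" and "good" labels come from the description above; `EXECUTION_MISS` and `BAD` are hypothetical names for the other two cells.

```python
from enum import Enum


class Outcome(Enum):
    GOOD = "good"                      # signal right, trade profitable
    LUCKY = "lucky"                    # signal wrong, trade profitable anyway
    EXECUTION_MISS = "execution_miss"  # signal right, trade lost money (hypothetical label)
    BAD = "bad"                        # signal wrong, trade lost money (hypothetical label)


def categorize(signal_correct: bool, pnl: float) -> Outcome:
    """Label a closed trade by whether the signal was right AND whether it paid."""
    profitable = pnl > 0
    if signal_correct:
        return Outcome.GOOD if profitable else Outcome.EXECUTION_MISS
    return Outcome.LUCKY if profitable else Outcome.BAD
```

Under this scheme a strategy whose profit comes mostly from `LUCKY` trades would show up as windfall rather than edge, whatever its headline P&L.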

Phase gates act as commitment devices: a frozen strategy stays frozen, so sunk cost can't keep a loser alive. And every agent turn — every message, tool call, and result — is persisted and replayable against live database state, so the AI's reasoning is auditable rather than opaque.

  • Evidence-cited market scoring (anti-hallucination by construction)
  • Decision-loop categorization: signal accuracy vs. execution quality
  • In-sample / out-of-sample backtest harness with a 70/30 split
  • Boot safeguard auto-pauses all live strategies on every deploy
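As an illustration of the 70/30 in-sample / out-of-sample split, here is a minimal sketch. It assumes the records are already sorted oldest-first and that the split is chronological, which the list above does not state. `split_in_out_of_sample` is a hypothetical helper name.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def split_in_out_of_sample(
    records: Sequence[T], in_sample_pct: int = 70
) -> tuple[list[T], list[T]]:
    """Split time-ordered records into (in_sample, out_of_sample).

    Assumes `records` is sorted oldest-first, so the out-of-sample part is
    strictly later than anything used to tune the strategy.
    Integer percent avoids floating-point surprises at the cut point.
    """
    if not 0 < in_sample_pct < 100:
        raise ValueError("in_sample_pct must be between 1 and 99")
    cut = len(records) * in_sample_pct // 100
    return list(records[:cut]), list(records[cut:])
```

Tuning only on the first slice and reporting only on the second is what keeps a backtest from grading its own homework.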

How it works

Each cycle runs market-scan triggers (base scan, social-buzz overlay, big-probability-move trigger, news overlay), enriches candidates through a shared Tavily + Haiku ranker, and feeds the signal stream to paper fleets that evaluate variants. Promising strategies graduate from paper to small-size live calibration. A Tokyo-region proxy bridges Polymarket's geoblocking; pgvector powers semantic search over market titles.
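The first step of a cycle, running several triggers and handing one deduplicated candidate set to the shared ranker, could be sketched roughly as follows. The trigger names mirror the ones listed above, but the `collect_candidates` function, the callable signature, and the market IDs are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Iterable

# Hypothetical trigger shape: a zero-argument callable returning market IDs.
Trigger = Callable[[], Iterable[str]]


def collect_candidates(triggers: dict[str, Trigger]) -> dict[str, list[str]]:
    """Run every trigger once and map market_id -> names of triggers that fired.

    A market flagged by several triggers appears once, so the downstream
    ranker sees each candidate a single time along with why it was flagged.
    """
    fired: dict[str, list[str]] = defaultdict(list)
    for name, trigger in triggers.items():
        for market_id in trigger():
            if name not in fired[market_id]:
                fired[market_id].append(name)
    return dict(fired)


# Usage with stubbed triggers (the IDs are made up):
candidates = collect_candidates({
    "base_scan": lambda: ["KX-1", "PM-2"],
    "social_buzz": lambda: ["PM-9"],
    "news_overlay": lambda: ["KX-1"],
})
```

Keeping the trigger names attached to each candidate also makes it possible to attribute later results back to the trigger that surfaced the market.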

What I'm really after

Honestly, the trading is almost beside the point. What I've really been building is a discipline for thinking clearly under uncertainty — an AI that does the relentless research a solo operator never could, paired with a measurement layer that won't let me fool myself.

If PredMark ever proves there's real edge here, wonderful. But even if it doesn't, it will have done the more valuable thing: forced me to be honest about it. That's the standard I want every system I build to hold itself to.

The PredMark co-pilot calls its tools, then flags the highest-P&L strategy as “windfall, not edge”: the strategy was directionally wrong 94% of the time. Measurement over vibes.
Per-strategy performance and sector-level P&L attribution.
The full paper-trade ledger, filterable by strategy and outcome.

built with

Python · FastAPI · React · Postgres · pgvector · Anthropic Claude · Tavily · Railway