BracketGenius
Prompt-engineered NCAA tournament prediction system that hit 73% game accuracy across 63 games of the 2026 tournament, within 2 percentage points of the published academic benchmark.
The problem
March Madness is one of the most-studied prediction problems in sports analytics — and one of the hardest. The single-elimination format, neutral courts, and three-week window are designed to produce variance; even the best academic models top out around 75% game accuracy long-term. Most public-facing prediction tools either rely on a single metric or are black-box ensembles that can't explain a given pick. I wanted to test two things at once: (1) how far prompt engineering and LLM-driven reasoning could extend a domain-specific prediction formula, and (2) whether an explainable system — every pick traceable to specific weighted factors — could match opaque ensemble models. The shaping constraint was ground-truth evaluation: every prediction had to be made before tip-off and results tracked round by round.
The solution
Two deterministic formulas, with an LLM reasoning layer on top.
The Game Formula is a 13-factor weighted composite: KenPom AdjEM, Seed base, ShotQuality, Kill-shot margin, Turnover margin, Guard BPR, Free-throw composite, Close-game win %, Coaching score, Trap score, Defensive rebound rank, Opponent 3PT defense, and Injury adjustment. The composite differential is converted to a win probability via a Bradley-Terry-style logistic:
P(A wins) = 1 / (1 + 10^(-(CompA - CompB)/15))
The Champion Formula is a 3-gate Bayesian filter. Teams must clear thresholds for offensive efficiency, defensive efficiency, and strength of schedule before being scored against 14 enhancement factors. Of 81 teams, only 50 cleared all three gates.
The Prompt Engineering Layer sits on top of both. The formulas themselves are deterministic; weights, edge cases, and matchup explanations were developed through iterative prompt sessions with Claude. The LLM acted as an iteration partner for tuning a deterministic system and as a narrator for the Head-to-Head explanations, never as the predictor itself.
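As a minimal sketch of the Game Formula's logistic conversion, the snippet below turns two composite scores into a win probability. The function name, the `scale` parameter, and the example composite values are illustrative assumptions; only the formula and the divisor of 15 come from the description above.

```python
def win_probability(comp_a: float, comp_b: float, scale: float = 15.0) -> float:
    """P(A wins) = 1 / (1 + 10^(-(comp_a - comp_b) / scale))."""
    return 1.0 / (1.0 + 10.0 ** (-(comp_a - comp_b) / scale))


# Hypothetical composite scores, chosen only to show the shape of the curve.
even = win_probability(80.0, 80.0)      # equal composites
favored = win_probability(87.5, 80.0)   # 7.5-point composite edge
big_edge = win_probability(95.0, 80.0)  # 15-point composite edge

print(f"{even:.3f} {favored:.3f} {big_edge:.3f}")  # prints "0.500 0.760 0.909"
```

With a divisor of 15, a 15-point composite edge corresponds to 10-to-1 odds, and the two sides of any matchup always sum to 1.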
Benefits
- 73.0% game accuracy across 63 games — within 2 points of the Sokol LRMC academic ceiling (~75%)
- Round of 64: 78.1% (25/32); Elite 8: 75%; Final Four: 100%; Championship correct (Michigan over UConn)
- Michigan (eventual champion) rated #5 of 81 teams pre-tournament; UConn (runner-up) at #9
- Iowa's Bennett Stirtz flagged as the upset threat by a custom March Marksman metric — the 9-seed Iowa over 8-seed Clemson pick hit, and Stirtz then took down defending champ Florida in R2
- 9-of-9 champions placed in the formula's top 3 across a 10-year pre-tournament backtest
- Every prediction traces back to specific factor weights — explainability is the feature, not a marketing line
Challenges & what I'd improve
Single-tournament sample (n=63): the 73% figure is real, but it is one data point. Factor weights came from prompt iteration against past tournaments, not regression on a held-out set, so they are easy to overfit to recent memory. With no recent-form weighting, three of four R1 upset misses traced back to conference-tournament momentum the static database couldn't see. The Bayesian posterior is a defensible heuristic but hasn't been calibrated against Monte Carlo simulations. Next iteration: add a recent-form factor, calibrate the posterior against 10k simulations, automate the KenPom/ShotQuality refresh, and repeat the live test on the 2027 tournament.
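To put the n=63 caveat in numbers, here is a rough sketch of a 95% Wilson score interval for the observed accuracy. It assumes 46 correct picks (the integer count implied by 73.0% of 63, not a figure stated directly) and treats games as independent trials, which bracket games are not strictly; the function name is illustrative.

```python
import math


def wilson_interval(correct: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = correct / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Assumption: 46 of 63 correct, inferred from the reported 73.0%.
low, high = wilson_interval(46, 63)
print(f"95% interval: {low:.3f} to {high:.3f}")
```

Under these assumptions the interval runs from roughly 61% to 82%, so a single tournament cannot separate this system from the ~75% benchmark in either direction.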