EVIDENCE FILE

PREDICTION TRACK RECORD

Every prediction. Every outcome. Including the ones we got wrong.

2,748 predictions logged since March 2026 - across sports, macro, geopolitics, crypto, and real estate. No backfilled data. No cherry-picked wins. Every run is on the record.

2,748 PREDICTIONS LOGGED · 63 RESOLVED (TIME-GATED) · 0.233 BRIER - LIVE · REBUILDING SAMPLE - METHODOLOGY V2

METHODOLOGY CALIBRATION · REBUILT, TIME-GATED, LEAK-AUDITED

Every question below was submitted to the swarm with today_override clamped to a date at least 30 days before resolution, so the model could not see the outcome in its training data. Sampled across 8 real-estate sub-domains. Audit status: passed (2026-05-25; 0/19 anachronisms). See the full anachronism scan →

LIVE METHODOLOGY BRIER

0.168

n=69 · 95% CI [0.080, 0.256]
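The page does not say how its confidence intervals are computed. As a minimal sketch, assuming a normal approximation on the mean of per-question Brier scores, an interval of this kind could be derived as follows. The function name and the sample scores are hypothetical.

```python
import math
import statistics

def brier_mean_ci(per_question_brier, z=1.96):
    """Mean Brier score with a normal-approximation confidence interval.

    Assumption: scores are treated as independent draws, and z=1.96
    gives an approximate 95% interval. The page's actual method is unknown.
    """
    n = len(per_question_brier)
    if n < 2:
        raise ValueError("need at least two scores to estimate spread")
    mean = statistics.fmean(per_question_brier)
    # Standard error of the mean from the sample standard deviation
    sem = statistics.stdev(per_question_brier) / math.sqrt(n)
    return mean, (mean - z * sem, mean + z * sem)

# Hypothetical per-question scores, for illustration only
mean, (low, high) = brier_mean_ci([0.0, 0.25, 0.5])
print(f"Brier {mean:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

With small samples the interval is wide, which is why the resolved count matters as much as the point estimate.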

VS. PHASE 1 BASELINE

39%

better Brier · phase1 n=111, Brier 0.277

RE SUB-DOMAINS COVERED

cap rates · permits · rents · REITs · SFH · regional · mortgage rates · news

METHODOLOGY	N	BRIER (EXT)	BRIER (MED)
Phase 1 - baseline (no v2 archetypes)	111	0.277	0.241
Phase 2D - v2 archetypes + skill weighting (within-domain)	21	0.194	0.196
Phase 2D - v2 archetypes + skill weighting (cross-domain) ← HEADLINE	69	0.168	0.175

PER-SUB-DOMAIN · PHASE2D_XDOMAIN

SUB-DOMAIN	N	BRIER (EXT)	AVG P(YES)	ACTUAL YES RATE
Construction Permits	2	0.012	0.46	0.50
Commercial RE Cap Rates	4	0.030	0.34	0.25
Regional Submarket	4	0.051	0.60	0.75
Single Family Housing	9	0.087	0.23	0.33
Multifamily Rent Vacancy	4	0.096	0.54	0.75
Mortgage Rates Treasury Fed	40	0.210	0.39	0.50
REIT Performance	3	0.247	0.48	0.00
Major News Events	3	0.316	0.52	1.00

Time-gated cross-domain calibration: every question submitted with `today_override` set to a date ≥ 30 days before resolution. Phase 2D adds skill-weighted v2 archetype clusters to the baseline. Phase 2E (supervisor v1) is excluded — known temporal leak.
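The 30-day gate described above can be expressed as a simple date check. This is a sketch under stated assumptions: `today_override` is the name used on this page, while the helper functions and the constant name are hypothetical.

```python
from datetime import date, timedelta

# From the page: the override date is at least 30 days before resolution
MIN_LEAD_DAYS = 30

def latest_allowed_override(resolution_date: date,
                            min_lead_days: int = MIN_LEAD_DAYS) -> date:
    """Latest knowledge-horizon date that still satisfies the gate."""
    return resolution_date - timedelta(days=min_lead_days)

def is_time_gated(today_override: date, resolution_date: date,
                  min_lead_days: int = MIN_LEAD_DAYS) -> bool:
    """True if the override sits at least min_lead_days before resolution."""
    return today_override <= latest_allowed_override(resolution_date, min_lead_days)

# A question resolving on 2026-07-01 needs an override on or before 2026-06-01
print(latest_allowed_override(date(2026, 7, 1)))
print(is_time_gated(date(2026, 6, 2), date(2026, 7, 1)))
```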

WHERE THE SWARM HAS AN EDGE

Segments where the swarm scores below 0.25 (the coin-flip baseline), meaning it is meaningfully calibrated there. Lower is better; 0.0 is perfect.

HEADLINE BRIER (TIME-GATED)

0.233

63 resolved · Brier 0.233 · 95% CI [0.128, 0.337]

TOTAL PREDICTIONS LOGGED

2,748

63 time-gated resolved · 2% honest resolution rate (excludes circular-scored + leakage-risk)

DEPTH FIRST, BREADTH SECOND

Holodeck is built depth-first for real estate, macro, and private markets - the domains with the structured ground-truth data Gray Capital cares about. We've built dedicated archetype clusters for those. They perform.

Crypto, broad sports, and geopolitics use a general-purpose archetype mix — the same 180-expert baseline that runs across all domains. We haven't built specialized clusters there, and the track record on those categories reflects it: a Brier of 0.233 across 63 resolved predictions. Know the difference: depth-built domains (real estate, macro) earn their use; breadth domains are tools for when structured domain data isn't available.

REAL ESTATE PHASE 2D BATCH — ARCHIVED, NOT IN HEADLINE

Methodology: skill-weighted aggregation across v2 archetypes. 5 questions sampled at random per sub-domain, prompted with point-in-time context only. Lower Brier = better; 0.25 = a coin flip. Negative delta = beats the prior (Phase 1) baseline.

Why it’s not in the headline: a May methodology audit (W2.2) found that 404 of 436 historically resolved predictions had resolution_date before submitted_at — meaning the model could have seen the outcome in training data. We now exclude any prediction without positive lead time from the public Brier. This Phase 2D table is real and is shown for completeness, but its 0.165 aggregate has been retired from the headline pending a rebuild under the tighter v2 protocol (run with today_override to clamp the swarm’s knowledge horizon to the question’s vintage). Watch this page — the rebuilt number is landing this week.
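The exclusion rule above (no public scoring without positive lead time) amounts to a filter on two timestamps. A minimal sketch, assuming each prediction is a dict carrying the `submitted_at` and `resolution_date` fields named in the audit; the record layout, the `id` field, and the sample rows are hypothetical.

```python
from datetime import datetime

def has_positive_lead_time(prediction: dict) -> bool:
    """True only if the prediction was submitted before it resolved."""
    return prediction["submitted_at"] < prediction["resolution_date"]

def split_by_lead_time(predictions):
    """Separate predictions that may be scored publicly from those excluded."""
    kept = [p for p in predictions if has_positive_lead_time(p)]
    excluded = [p for p in predictions if not has_positive_lead_time(p)]
    return kept, excluded

# Hypothetical records, for illustration only
sample = [
    {"id": "a", "submitted_at": datetime(2026, 4, 1), "resolution_date": datetime(2026, 5, 15)},
    {"id": "b", "submitted_at": datetime(2026, 5, 20), "resolution_date": datetime(2026, 3, 31)},
]
kept, excluded = split_by_lead_time(sample)
print([p["id"] for p in kept], [p["id"] for p in excluded])
```

Record "b" was submitted after its question had already resolved, so it is the kind of entry the audit flagged.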

SUB-DOMAIN	N	PHASE 1	PHASE 2D	Δ VS BASELINE
Commercial cap rates	5	0.227	0.098	-0.129
Multifamily rent & vacancy	5	0.209	0.133	-0.075
REIT performance	5	0.289	0.242	-0.047
Construction permits	5	0.189	0.176	-0.013
Regional submarkets	5	0.138	0.127	-0.011
Single-family housing	5	0.144	0.140	-0.005
Major news events	5	0.242	0.241	-0.001
OVERALL	35	0.206	0.165	-0.040

BREAKDOWN BY DOMAIN (LIVE)

DOMAIN	TOTAL	RESOLVED	AVG BRIER	AVG PROB
Macro	602	60	0.243	60%
Geopolitics	433	3	0.03	85%
Sports	425	0	-	-
Real Estate	385	0	-	-
General	1	0	-	-
Crypto	355	0	-	-

FEATURED OUTCOMES

Our strongest calls - highest conviction predictions with verified real-world outcomes.

GEOPOLITICS · 🎯 Accurate

Did the Abraham Accords gain a new Arab signatory in Q1 2026?

Swarm called: 5%

✗ NO · Brier 0.003

View full simulation →

CRYPTO · 🎯 Accurate

Did Ethereum outperform Bitcoin in Q1 2026?

Swarm called: 5%

✗ NO · Brier 0.003

View full simulation →

MACRO · 🎯 Accurate

Was the March 2026 CPI above 0.3% month-over-month?

Swarm called: 95%

✓ YES · Brier 0.003

View full simulation →

MACRO · 🎯 Accurate

Was US retail sales growth positive in February 2026?

Swarm called: 95%

✓ YES · Brier 0.003

View full simulation →

MACRO · 🎯 Accurate

Was the March 2026 PPI above 0.2% month-over-month?

Swarm called: 95%

✓ YES · Brier 0.003

View full simulation →

MACRO · 🎯 Accurate

Was the April 2026 Empire State Manufacturing Index below 0?

Swarm called: 95%

✓ YES · Brier 0.003

View full simulation →

SPORTS · 🎯 Accurate

Will the Golden State Warriors qualify for the 2026 NBA Playoffs?

Swarm called: 7%

✗ NO · Brier 0.005

View full simulation →

GEOPOLITICS · 🎯 Accurate

Will the new Pope be elected within 2 weeks of the 2026 conclave opening?

Swarm called: 92%

✓ YES · Brier 0.006

View full simulation →

GEOPOLITICS · 🎯 Accurate

Did the UK experience a general election in Q1 2026?

Swarm called: 8%

✗ NO · Brier 0.006

View full simulation →

ECONOMICS · 🎯 Accurate

Will US GDP growth for Q4 2025 be reported as positive?

Swarm called: 85%

✓ YES · Brier 0.022

View full simulation →

SPORTS · 🎯 Accurate

Did the Los Angeles Angels beat the Toronto Blue Jays on April 22, 2026?

Swarm called: 85%

✓ YES · Brier 0.023

View full simulation →

MACRO · 🎯 Accurate

Did the Fed hold rates at its March 2026 FOMC meeting?

Swarm called: 85%

✓ YES · Brier 0.023

View full simulation →

RECENTLY RESOLVED PREDICTIONS

50 most recent · ordered by resolution date

QUESTION	RESOLVED	DOMAIN	PROBABILITY	OUTCOME	BRIER
Will the unemployment rate fall below 4.0% by June 2026?	2026-07-02	Macro	68%	✗ NO	0.462
Will the S&P 500 be above 5,800 at end of Q2 2026?	2026-07-01	Macro	72%	✓ YES	0.078
Will the S&P 500 recover above 5,500 by June 2026?	2026-07-01	Macro	72%	✓ YES	0.078
Will the University of Michigan consumer sentiment index drop below 60 in Q2 2026?	2026-07-01	Macro	35%	✓ YES	0.423
Will WTI crude oil end Q2 2026 (June 30) below $65/barrel?	2026-07-01	Macro	38%	✗ NO	0.144
Will the breakeven inflation rate (10yr) rise above 3% in Q2 2026?	2026-07-01	Macro	50%	✗ NO	0.250
Will oil (WTI) end Q2 2026 above $75/barrel?	2026-07-01	Macro	68%	✓ YES	0.102
Will US auto sales (SAAR) decline below 14 million annualized in Q2 2026?	2026-07-01	Macro	50%	✓ YES	0.250
Will the 5yr-5yr inflation forward rate exceed 2.5% in Q2 2026?	2026-07-01	Macro	38%	✗ NO	0.144
Will the NY Fed's 1-year inflation expectation rise above 4% in Q2 2026?	2026-07-01	Macro	35%	✗ NO	0.122
Will the VIX end Q2 2026 below 25?	2026-07-01	Macro	68%	✓ YES	0.102
Will the yield curve (3m-10yr) remain inverted through Q2 2026?	2026-07-01	Macro	38%	✗ NO	0.144
Will the EUR/USD end Q2 2026 above 1.12?	2026-07-01	Macro	42%	✓ YES	0.336
Will the Chicago Fed National Activity Index (CFNAI) signal recession (below -0.7) in Q2 2026?	2026-07-01	Macro	45%	✗ NO	0.202
Will the University of Michigan consumer sentiment index drop below 60 in Q2 2026?	2026-07-01	Macro	65%	✓ YES	0.122
Will WTI crude oil end Q2 2026 (June 30) below $65/barrel?	2026-07-01	Macro	58%	✗ NO	0.336
Will the breakeven inflation rate (10yr) rise above 3% in Q2 2026?	2026-07-01	Macro	58%	✗ NO	0.336
Will oil (WTI) end Q2 2026 above $75/barrel?	2026-07-01	Macro	50%	✓ YES	0.250
Will US auto sales (SAAR) decline below 14 million annualized in Q2 2026?	2026-07-01	Macro	72%	✓ YES	0.078
Will the 5yr-5yr inflation forward rate exceed 2.5% in Q2 2026?	2026-07-01	Macro	68%	✗ NO	0.462
Will the NY Fed's 1-year inflation expectation rise above 4% in Q2 2026?	2026-07-01	Macro	65%	✗ NO	0.423
Will the University of Michigan consumer sentiment index drop below 60 in Q2 2026?	2026-07-01	Macro	65%	✓ YES	0.122
Will WTI crude oil end Q2 2026 (June 30) below $65/barrel?	2026-07-01	Macro	65%	✗ NO	0.423
Will the breakeven inflation rate (10yr) rise above 3% in Q2 2026?	2026-07-01	Macro	68%	✗ NO	0.462
Will oil (WTI) end Q2 2026 above $75/barrel?	2026-07-01	Macro	68%	✓ YES	0.102
Will US auto sales (SAAR) decline below 14 million annualized in Q2 2026?	2026-07-01	Macro	72%	✓ YES	0.078
Will the 5yr-5yr inflation forward rate exceed 2.5% in Q2 2026?	2026-07-01	Macro	62%	✗ NO	0.384
Will the NY Fed's 1-year inflation expectation rise above 4% in Q2 2026?	2026-07-01	Macro	62%	✗ NO	0.384
Will the VIX end Q2 2026 below 25?	2026-07-01	Macro	67%	✓ YES	0.109
Will the yield curve (3m-10yr) remain inverted through Q2 2026?	2026-07-01	Macro	68%	✗ NO	0.462
Will the EUR/USD end Q2 2026 above 1.12?	2026-07-01	Macro	52%	✓ YES	0.230
Will the Chicago Fed National Activity Index (CFNAI) signal recession (below -0.7) in Q2 2026?	2026-07-01	Macro	35%	✗ NO	0.122
Will the VIX end Q2 2026 below 25?	2026-07-01	Macro	72%	✓ YES	0.078
Will the yield curve (3m-10yr) remain inverted through Q2 2026?	2026-07-01	Macro	32%	✗ NO	0.102
Will the EUR/USD end Q2 2026 above 1.12?	2026-07-01	Macro	62%	✓ YES	0.144
Will the Chicago Fed National Activity Index (CFNAI) signal recession (below -0.7) in Q2 2026?	2026-07-01	Macro	62%	✗ NO	0.384
Will the US jobs report for May 2026 show above 150,000 payrolls?	2026-06-05	Macro	68%	✓ YES	0.102
Will the US jobs report for May 2026 show above 150,000 payrolls?	2026-06-05	Macro	68%	✓ YES	0.102
Will the US jobs report for May 2026 show above 150,000 payrolls?	2026-06-05	Macro	67%	✓ YES	0.109
Will the ISM manufacturing new orders subindex return above 50 in May 2026?	2026-06-02	Macro	68%	✗ NO	0.462
Will the ISM manufacturing new orders subindex return above 50 in May 2026?	2026-06-02	Macro	38%	✗ NO	0.144
Will the ISM manufacturing new orders subindex return above 50 in May 2026?	2026-06-02	Macro	68%	✗ NO	0.462
Will US gasoline prices (national avg) exceed $4.00/gallon in May 2026?	2026-06-01	Macro	42%	✓ YES	0.336
Will US gasoline prices (national avg) exceed $4.00/gallon in May 2026?	2026-06-01	Macro	62%	✓ YES	0.144
Will US gasoline prices (national avg) exceed $4.00/gallon in May 2026?	2026-06-01	Macro	68%	✓ YES	0.102
Will the Cleveland Fed's nowcast for May 2026 CPI show above 0.25% MoM?	2026-05-15	Macro	58%	✓ YES	0.176
Will the new Pope be elected within 2 weeks of the 2026 conclave opening?	2026-05-15	Geopolitics	92%	✓ YES	0.006
Will the Cleveland Fed's nowcast for May 2026 CPI show above 0.25% MoM?	2026-05-15	Macro	58%	✓ YES	0.176
Will the Cleveland Fed's nowcast for May 2026 CPI show above 0.25% MoM?	2026-05-15	Macro	62%	✓ YES	0.144
Will the new Pope be elected within 2 weeks of the 2026 conclave opening?	2026-05-15	Geopolitics	72%	✓ YES	0.078

WHAT IS A BRIER SCORE?

A Brier score measures the accuracy of probability estimates: it is the mean squared difference between the predicted probability and the actual outcome (1 for YES, 0 for NO). Lower is better. A perfect score is 0.0 (100% confidence on the correct outcome). Always answering 50% scores 0.25. Under 0.15 is excellent; under 0.25 is solid. Sports tend to be harder to call than macro trends.
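For readers who want to reproduce the per-question numbers on this page, here is a minimal sketch of the standard binary Brier score. The function name is our own illustration, not the engine's code.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes):
        raise ValueError("probabilities and outcomes must have the same length")
    if not probabilities:
        raise ValueError("need at least one prediction")
    total = sum((p - o) ** 2 for p, o in zip(probabilities, outcomes))
    return total / len(probabilities)

# A 95% call that resolved YES
print(round(brier_score([0.95], [1]), 4))  # 0.0025
# A 68% call that resolved NO
print(round(brier_score([0.68], [0]), 4))  # 0.4624
# Always answering 50% scores 0.25 whatever happens
print(brier_score([0.5, 0.5], [1, 0]))     # 0.25
```

The first two values match the 95% and 68% rows in the tables on this page to within rounding of the third decimal.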

RELIABILITY DIAGRAM · PHASE2D_XDOMAIN

69 resolved · 10 probability bins · y=x is perfect calibration

When we say “70% likely,” does it actually happen 70% of the time? Each dot below is a bin of resolved predictions; dot size = sample count. Sitting on the diagonal means calibrated. Above the line = under-confident; below = over-confident.

BIN	N	PREDICTED	OBSERVED	CALIBRATION
0.0-0.1	14	0.03	0.14	+11pp (under)
0.1-0.2	14	0.14	0.14	on target
0.2-0.3	7	0.26	0.43	+16pp (under)
0.3-0.4	2	0.34	0.50	+16pp (under)
0.4-0.5	4	0.41	1.00	+59pp (under)
0.5-0.6	5	0.56	0.60	on target
0.6-0.7	6	0.65	0.67	on target
0.7-0.8	5	0.74	0.80	+6pp (under)
0.8-0.9	10	0.84	0.90	+6pp (under)
0.9-1.0	2	0.95	1.00	+5pp (under)

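Honest small-N caveat: with 69 resolved predictions spread over 10 bins, several bins hold only a handful of outcomes (two bins have just 2), so a single result can move a bin a long way. The picture sharpens as more questions resolve. Empty bins are shown intentionally - no cherry-picking.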
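A table like the one above can be built by grouping resolved predictions into equal-width probability bins and comparing the average forecast with the observed YES rate. This is a minimal sketch; the function name, the handling of bin boundaries, and the sample data are assumptions rather than the page's actual pipeline.

```python
def reliability_table(probabilities, outcomes, n_bins=10):
    """Return one row per bin: (low, high, n, mean_predicted, observed_rate).

    Empty bins are kept, with None for the two averages.
    Assumption: a probability of exactly 1.0 falls in the top bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probabilities, outcomes):
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, o))

    rows = []
    for i, members in enumerate(bins):
        low, high = i / n_bins, (i + 1) / n_bins
        if not members:
            rows.append((low, high, 0, None, None))
            continue
        n = len(members)
        mean_predicted = sum(p for p, _ in members) / n
        observed_rate = sum(o for _, o in members) / n
        rows.append((low, high, n, mean_predicted, observed_rate))
    return rows

# Hypothetical forecasts, for illustration only
rows = reliability_table([0.05, 0.05, 0.95], [0, 1, 1])
for low, high, n, predicted, observed in rows:
    if n:
        gap_pp = (observed - predicted) * 100  # positive = under-confident
        print(f"{low:.1f}-{high:.1f}  n={n}  gap={gap_pp:+.0f}pp")
```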

CALIBRATION TEST - 2026 NCAA MEN'S TOURNAMENT

62 games · pre-game predictions only · verified outcomes

74.2% overall accuracy (46/62). Brier score 0.155 - well below the 0.25 coin-flip baseline. The honest test: when the swarm said "70% confident," did it win 70% of the time? Below: every confidence bucket, including the bucket we deliberately said was a coin flip.

SWARM CONFIDENCE	N PICKS	CORRECT	ACTUAL ACCURACY	CALIBRATION
50-55%	6	1	16.7%	said coin flip → 1 of 6 hit (below 50%; n=6 is too small to judge)
55-65%	22	14	63.6%	well-calibrated (predicted ~60%)
65-75%	16	13	81.2%	slightly underconfident
75-85%	7	7	100%	underconfident - every pick hit
85-100%	11	11	100%	high conviction well-justified

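The 50-55% row is the most important. A miscalibrated swarm overclaims certainty. Ours said "these 6 are basically coin flips" - and only 1 of the 6 hit. That is below the 50-55% we predicted, but 6 picks are too few to tell apart from a true coin flip.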
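One way to sanity-check a small bucket such as the 50-55% row (1 correct out of 6) is to ask how often a true coin flip would do that badly or worse. A minimal sketch using the exact binomial tail; the function name is hypothetical and the check is our illustration, not part of the published methodology.

```python
from math import comb

def binomial_cdf(k, n, p):
    """Probability of at most k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Chance of 1 or fewer hits in 6 picks if each pick were a fair coin flip
tail = binomial_cdf(1, 6, 0.5)
print(tail)  # 0.109375, i.e. 7/64
```

A result this extreme turns up about 11% of the time under a fair coin, so six picks alone cannot confirm or rule out coin-flip calibration.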

FEATURED CASE STUDIES - NCAA ELITE EIGHT

March 28, 2026 · submitted before tip-off · pre-game verified

The first two publicly logged, pre-event predictions with verified outcomes. Run through the live engine before game time; both resolved the same night.

QUESTION	OUR CALL	MARKET	OUTCOME	BRIER (US)	BRIER (MKT)	VERDICT
#9 Iowa to upset #3 Illinois (Elite Eight), Mar 28, 2026	33% (67% Illinois - NO upset)	28% Iowa (Vegas moneyline implied)	✗ NO upset (Illinois 73-64)	0.109	0.078	✓ CORRECT (Holodeck called it)
#2 Purdue to upset #1 Arizona (Elite Eight), Mar 28, 2026	31% (69% Arizona - NO upset)	25% Purdue (spread implied)	✗ NO upset (Arizona 79-64)	0.096	0.063	✓ CORRECT (Holodeck called it)

2 predictions - 2 correct (100%) · Avg Holodeck Brier: 0.103 · Avg Market Brier: 0.071 · Markets slightly better calibrated on these two; both made correct directional calls

WHY THIS PAGE EXISTS

THE HONEST ANSWER

We log everything from day one so that when we have hundreds of resolved predictions, you can audit the full history - not a curated highlight reel.

WHAT WE EXPECT TO WIN

Structural breaks. Regime changes. Tail events that prediction markets underprice because they're anchored to recent consensus. That's where synthetic agent swarms earn their edge.

WHAT WE DON'T HAVE YET

Per-archetype calibration on the high-conviction domains.

The swarm has 180 archetype-distinct experts across 23 domain presets. We can already show which domains the swarm is calibrated in. We can't yet show which archetypes within real estate and macro drove each call: we started logging full per-segment data on resolved questions in May 2026, so the sample is still small (12 questions with full segment + outcome data, of 395 total resolved).

The priority is depth on multifamily underwriting and macro scenarios - those are the archetype clusters we're actively expanding. We'll ship per-archetype calibration there first, with ~10 resolved predictions per cluster as the statistical floor. Broader-domain archetype work (crypto, sports, geopolitics) is a future roadmap item, not a current focus.

ETA: real estate + macro per-archetype, late June 2026.

Run your own prediction →

Every public run is logged. This page updates as outcomes resolve.