Holodeck by DarkGray
Holodeck Simulation Engine

The Science of Collective AI Intelligence.

180 archetype-distinct AI experts across 23 domain presets, each sampled across hundreds of independent reasoning runs. Each archetype carries a distinct worldview, risk tolerance, and analytical framework. The swarm deliberates until consensus emerges or, failing that, exposes genuine uncertainty.

The Problem

The problem with single-point forecasting.

Polls
They ask what people think. Not whether people are right. Aggregating confident ignorance at scale.
Expert Panels
Anchoring. Groupthink. Reputational risk aversion. The most credentialed voices carry the most bias.
Prediction Markets
Near-perfect on consensus events. Catastrophically wrong on structural breaks — the events that matter most.
“When the Fed's GDP forecast missed by 4.2 points in Q4 2025, prediction markets were 96% confident in the wrong outcome. Our swarm was at 57% — uncertain because the data was genuinely uncertain.”
The Architecture

How it works.

01
Question Intake
Your question is classified by domain and routed to the right agent cluster. Live market data is fetched and injected as context. Then the swarm begins.
02
Archetype Assignment
The right preset spins up. For a real-estate question, 21 archetypes load — Revenue Manager, Renter Demand-Side Economist, NCREIF Tracker, Distressed Multifamily Buyer, Capital Markets Director, and so on. Each archetype is sampled across dozens of independent draws to surface within-archetype dispersion. The Goldman economist trusts the model. The Fed Watcher parses language. The Contrarian fades consensus.
03
Independent Deliberation
Agents reason in parallel. No agent sees another's estimate before submitting. Deliberation happens across model boundaries — some archetypes run on Claude, some on Gemini, some on Haiku. Model-specific biases cancel in aggregate.
04
Ensemble Aggregation
Raw estimates are aggregated via a trimmed mean (dropping the top and bottom 10%) and confidence-weighted by each archetype's calibration score. Output: P10/P50/P90 distribution + swarm probability + per-archetype breakdown.
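The aggregation step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Holodeck's implementation: the function names (`trimmed_weighted_probability`, `p10_p50_p90`), the choice to trim by estimate value before applying weights, the percentile method, and the sample numbers are all hypothetical.

```python
from statistics import quantiles


def trimmed_weighted_probability(estimates, weights, trim=0.10):
    """Drop the lowest and highest `trim` fraction of estimates,
    then return the weight-averaged probability of what remains."""
    if len(estimates) != len(weights) or not estimates:
        raise ValueError("estimates and weights must be non-empty and equal length")
    pairs = sorted(zip(estimates, weights))       # order by estimate value
    k = int(len(pairs) * trim)                    # how many to drop at each end
    kept = pairs[k:len(pairs) - k]
    total_weight = sum(w for _, w in kept)
    if total_weight <= 0:
        raise ValueError("kept weights must sum to a positive number")
    return sum(p * w for p, w in kept) / total_weight


def p10_p50_p90(estimates):
    """Return the 10th, 50th, and 90th percentiles of the raw estimates."""
    deciles = quantiles(estimates, n=10, method="inclusive")
    return deciles[0], deciles[4], deciles[8]


# Hypothetical swarm of ten agents with equal calibration weights.
estimates = [0.05, 0.40, 0.45, 0.50, 0.50, 0.55, 0.55, 0.60, 0.65, 0.95]
weights = [1.0] * len(estimates)

swarm_probability = trimmed_weighted_probability(estimates, weights)
p10, p50, p90 = p10_p50_p90(estimates)
print(f"swarm probability: {swarm_probability:.3f}")
print(f"P10/P50/P90: {p10:.3f} / {p50:.3f} / {p90:.3f}")
```

In this sample the two outliers (0.05 and 0.95) are dropped before averaging, so a single extreme agent cannot move the swarm probability on its own.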
The Science

The math.

$$P(\mathrm{YES} \mid Q) = \frac{\sum_{i=1}^{N} w_i \, p_i}{\sum_{i=1}^{N} w_i}$$
where:
$p_i$ = agent $i$'s probability estimate
$w_i$ = archetype calibration weight
$i = 1, \dots, N$ (typically $N$ = 30,000–80,000 per swarm at expansion=100)
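As a worked illustration of the weighted average, take a toy swarm of three agents. The estimates and weights below are hypothetical and chosen only to make the arithmetic easy to follow.

```latex
% Hypothetical inputs: p = (0.60, 0.50, 0.70), w = (1, 2, 1)
\[
P(\mathrm{YES} \mid Q)
  = \frac{1(0.60) + 2(0.50) + 1(0.70)}{1 + 2 + 1}
  = \frac{2.30}{4}
  = 0.575
\]
```

The second agent carries twice the weight of the others, so the result sits closer to its 0.50 estimate than the unweighted mean of 0.60 would.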
Brier — Real Estate (cross-domain, time-gated)
0.168
69 questions · 8 sub-domains · 39% better than phase-1
Cross-domain holdout, anachronism-audit passed (0/19 leaks)
Brier — Macro Predictions
0.18
90 resolved · Fed, FOMC, rates, FRED data
Pre-time-gating audit; refresh pending
Brier — NCAA Backtest
0.148
62 games · 2026 Men's Tournament
76.7% accuracy on pre-game picks
Archetype Variance
29 pp
range across archetype configs
Highest-leverage parameter
Independence Protocol
100%
agents see zero peer estimates before submitting
BRIER SCORE: Measures forecast accuracy as the mean of (predicted probability − actual outcome)² across questions. Lower is better. A score of 0 is perfect. A constant 50% forecast (a coin flip) scores 0.25.
Holodeck is built depth-first for real estate, macro, and private markets, the domains with structured ground-truth data. The swarm has dedicated archetype clusters for those, and they perform: real estate 0.168 (69 cross-domain time-gated questions across 8 sub-domains), macro 0.18 (90 resolved, pre-time-gating audit), NCAA backtest 0.148 (62 games). Everything else (broad sports, crypto, geopolitics) runs on a general-purpose archetype mix that's still in development. The track-record page breaks this out by domain.
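The Brier score definition translates directly into code. The sketch below assumes binary outcomes coded as 1 (YES) and 0 (NO). The function name `brier_score` is illustrative, not part of any Holodeck API.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("forecasts and outcomes must be non-empty and equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# A forecaster who always says 50% scores 0.25 whatever happens.
print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 0, 1]))

# A forecaster who is certain and right every time scores 0.
print(brier_score([1.0, 0.0], [1, 0]))
```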
The Results

Real questions. Resolved outcomes.

We ran our swarm against resolved Kalshi prediction markets. Three questions. Real outcomes. Here's what happened.

QUESTION | OUTCOME | KALSHI | HOLODECK | VERDICT
CPI > 0.8% Mar 2026 | ✓ YES | 73% | 44.5% | Markets win
CPI > 0.9% Mar 2026 | ✗ NO | 33% | 43.6% | Markets win
GDP > 1.5% Q4 2025 | ✗ NO | 96% | 57.7% | 🎯 Swarm wins
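The verdicts for these three questions can be checked by computing each forecaster's squared error per question, which is the single-question term of the Brier score. The sketch below copies the probabilities and outcomes from the rows above. The helper names are illustrative, and three questions are far too few to rank forecasters overall.

```python
# (question, outcome as 1 = YES / 0 = NO, Kalshi probability, Holodeck probability)
ROWS = [
    ("CPI > 0.8% Mar 2026", 1, 0.73, 0.445),
    ("CPI > 0.9% Mar 2026", 0, 0.33, 0.436),
    ("GDP > 1.5% Q4 2025", 0, 0.96, 0.577),
]


def squared_error(probability, outcome):
    """Single-question Brier term: lower means the forecast was closer."""
    return (probability - outcome) ** 2


def score_rows(rows):
    """Return (question, Kalshi error, Holodeck error, closer forecaster) per row."""
    results = []
    for question, outcome, kalshi, holodeck in rows:
        kalshi_error = squared_error(kalshi, outcome)
        holodeck_error = squared_error(holodeck, outcome)
        closer = "Kalshi" if kalshi_error < holodeck_error else "Holodeck"
        results.append((question, kalshi_error, holodeck_error, closer))
    return results


for question, kalshi_error, holodeck_error, closer in score_rows(ROWS):
    print(f"{question}: Kalshi {kalshi_error:.4f}, Holodeck {holodeck_error:.4f}, closer: {closer}")
```

On this arithmetic the closer forecaster in each row matches the Verdict column: the market on both CPI questions, the swarm on GDP.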
Brier Score — where Holodeck is built deep: 0.168 on real estate (69 cross-domain time-gated questions across 8 sub-domains, 39% better than the phase-1 baseline), 0.18 on macro (90 resolved, pre-time-gating audit), 0.148 on the 2026 NCAA Tournament backtest (62 games). Aggregate across all domains reflects the mix of depth-first domains (real estate, macro) with general-purpose extension into sports, crypto, geopolitics (where dedicated archetype clusters are a roadmap item, not a current focus).
When markets beat us
Consensus-range data releases with rich derivative pricing signals. When the crowd has been right before and the instruments are deep, markets are hard to beat.
When we beat markets
Structural breaks, tail events, and discontinuities. The events that matter most — and that the market is systematically overconfident about.
The Agents

Archetype gallery.

Each agent is a behavioral specification, not a persona. We define how it reasons, not what it believes.

Contrarian PM
Inverts the base case. Systematically assumes consensus is priced wrong. Looks for the crowded trade and fades it.
MACRO MARKETS
Goldman-Style Economist
Constructs a probability from factor analysis. Trusts the model. Weights data over narrative and committee over instinct.
MACRO MARKETS
Fed Watcher
Parses language for signal. Counts dot plot shifts. Lives in the delta between statement and subtext. Tracks tone before tone is consensus.
MACRO MARKETS
Credit & Fixed Income
Reads the yield curve as a probability distribution. Skeptical of growth stories without spread confirmation.
MACRO MARKETS
Behavioral Researcher
Models the forecasters, not the forecast. Looks for availability bias, anchoring, and herding in the prior estimates.
MACRO MARKETS
Diplomatic Analyst
Weights back-channel signals over public statements. Tracks second-order incentives. Reads what governments don't say.
GEOPOLITICS
Hawk Analyst
Historical pattern matching. Takes adversaries' stated intentions at face value. Skeptical of deterrence narratives without credible commitment.
GEOPOLITICS
Sabermetrician
xFIP over ERA. Barrel rate over batting average. Model over gut. Trusts sample sizes that others dismiss as small.
SPORTS
Momentum Trader
Trend is the signal. Ignores valuation until price confirms. Gets in early, sizes up on continuation, cuts fast when momentum breaks.
MACRO MARKETS
Dove Analyst
Assumes institutions prefer stability. Reads labor data as the binding constraint. Sees optionality in delay and risk in overreach.
MACRO MARKETS
Geopolitical Risk Desk
Prices tail risk that consensus ignores. Runs scenario trees on escalation paths. Assigns non-zero probability to the outcomes that end careers.
GEOPOLITICS
Election Modeler
Structural models over polls. Weights fundamentals — economy, incumbency, turnout — and discounts narrative drift. Treats uncertainty as a distribution, not a coin flip.
GEOPOLITICS
Vegas Sharp
Line movement over consensus. Fades public money. Knows where the vig is and prices it out. Treats closing line value as the only honest signal.
SPORTS
Injury & Roster Scout
Practice reports, travel schedules, back-to-backs. Catches what the model misses because the model doesn't watch warmups.
SPORTS
Cap Rate Analyst
Prices the going-in yield and stress-tests the exit. Anchors to comparable transactions. Skeptical of pro forma assumptions without market validation.
REAL ESTATE
Supply & Demand Modeler
Tracks permit pulls, absorption rates, and population flows. Forecasts rent trajectory from fundamentals, not comps.
REAL ESTATE
On-Chain Analyst
Wallet flows don't lie. Exchange reserves, miner activity, and dormancy cycles as leading indicators. Data-native in a narrative-driven market.
CRYPTO
Macro-Crypto Bridge
Treats BTC as a macro instrument. Correlates crypto price action to liquidity cycles, DXY, and risk-on/risk-off regimes.
CRYPTO
180 archetype-distinct experts across 23 domain presets, each sampled hundreds of times per question. Each preset has its own archetype library, calibrated against resolved outcomes.
Twelve Prediction Environments

Every domain has its own specialist library.

Macro Markets
Fed rate decisions, GDP forecasts, inflation paths, yield curve regimes.
8 specialist archetypes
🌍
Geopolitics
Treaty outcomes, conflict escalation, election scenarios, sanctions regimes.
8 specialist archetypes
Sports
MLB, NBA, NFL, soccer and beyond. Game outcomes, player props, season trajectories. Vegas line arbitrage.
6 specialist archetypes
🏢
Real Estate
Cap rate cycles, market rent trajectories, vacancy inflections, deal risk.
7 specialist archetypes
Crypto
Market structure, protocol risk, regulatory outcomes, cycle timing.
5 specialist archetypes
📈
Equities
Earnings surprises, revenue guidance, sector rotation, and single-stock event risk.
7 specialist archetypes
⚖️
Policy & Regulation
Antitrust outcomes, legislative timelines, agency rulemaking, and enforcement probability.
6 specialist archetypes
🛢️
Commodities
Oil supply shocks, agricultural cycles, metals demand, and energy transition inflection points.
5 specialist archetypes
💊
Healthcare & Biotech
FDA approval probability, clinical trial outcomes, patent cliff risk, and biosimilar entry timing.
6 specialist archetypes
🤖
Technology
Product launch success, platform adoption curves, competitive displacement, and AI capability milestones.
6 specialist archetypes
🏦
Corporate Events
M&A completion probability, activist outcomes, restructuring scenarios, and earnings beats.
5 specialist archetypes
🌐
Emerging Markets
Currency crises, sovereign debt risk, political transition, and capital flow regime shifts.
5 specialist archetypes
The Paper
WORKING PAPER — 2026
“Synthetic Agent Swarms as Calibrated Probability Estimators: Evidence from Geopolitical and Macroeconomic Forecasting”

We demonstrate that archetype specification — not model selection — is the primary driver of swarm output variance, with a 29pp range across configurations. Swarms achieve lower Brier scores than Kalshi markets on structural break events by a 25% margin.

Pre-print coming to arXiv — Q2 2026
What's Under The Hood

The public demo is one layer.

The full engine runs 22,000+ lines of research infrastructure across 12 prediction domains. What you see in the demo is one question. The full system runs entire market sweeps.

🎲
Simulation Engine
Monte Carlo Risk Analysis
N-path scenario modeling with P10/P50/P90 percentile distributions across correlated variables.
Composer
Hierarchical multi-stage simulation chains with inter-stage context propagation.
Batch Processing
1,000+ parallel simulations via Anthropic Batch API — full market sweeps overnight.
Sensitivity Analysis
Identify which input variables move the output most. Know where to focus.
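To make the Monte Carlo item in the list above concrete, here is a minimal sketch of N-path simulation over two correlated variables with a P10/P50/P90 readout. Everything in it is an assumption for illustration: the two standard-normal shocks, the correlation parameter `rho`, the outcome defined as their sum, and the function names. It does not describe the engine's actual model.

```python
import math
import random
from statistics import quantiles


def simulate_paths(n_paths, rho, seed=0):
    """Simulate the sum of two standard-normal shocks with correlation `rho`."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_paths):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = z1
        # Mix z1 into the second shock so that corr(x, y) equals rho.
        y = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        outcomes.append(x + y)
    return outcomes


def p10_p50_p90(outcomes):
    """Return the 10th, 50th, and 90th percentiles of the simulated outcomes."""
    deciles = quantiles(outcomes, n=10)
    return deciles[0], deciles[4], deciles[8]


for rho in (-0.8, 0.0, 0.8):
    p10, p50, p90 = p10_p50_p90(simulate_paths(5000, rho))
    print(f"rho={rho:+.1f}: P10={p10:.2f}  P50={p50:.2f}  P90={p90:.2f}")
```

The gap between P10 and P90 widens as the correlation rises, because positively correlated shocks reinforce each other instead of offsetting. That is the reason for modeling the variables jointly and not one at a time.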
⚔️
Deliberation Protocols
Adversarial Deliberation
Agents argue opposing positions before consensus. Forces the swarm to steelman the other side.
Bridge Interface
Multi-stage reasoning across time horizons — near-term and structural views integrated.
Timeline Projection
Staged probability evolution over event sequences. Watch confidence shift as events resolve.
Multi-Model Routing
Route different archetypes to different LLM backends. Model-specific biases cancel.
📊
Market Intelligence
Kalshi Integration
Live prediction market data ingestion + trading signal generation against swarm estimates.
Polymarket Integration
Market consensus injected as agent context. Know what the crowd thinks before the swarm does.
Brier Score Calibration
Ongoing accuracy tracking against resolved outcomes. Every prediction is scored.
Model Benchmarking
6 frontier LLMs evaluated on identical questions. We know which model is best per domain.
🏢
Domain Engines
GrayPrice
Multifamily rental pricing via property-specific swarm simulation. Live in production.
Real Estate IC Stress Testing
Investment committee scenario modeling. Run 500 futures on a deal before you approve it.
Deal Analysis
PE deal underwriting with swarm-generated risk scenarios across macro, local, and operational factors.
Portfolio Optimization
Multi-asset simulation with correlated risk factors. Portfolio-level stress testing.
🔬
Research Infrastructure
12 Prediction Domains, 302+ Tests
Full integration test suite. Every protocol tested against known outcomes before deployment.
Archetype Library
Calibration weights per domain, per archetype. Updated against resolved outcomes.
Agent Interview Protocol
Interrogate individual agents post-simulation. Understand why the swarm landed where it did.
Export & Reporting
LP-ready simulation reports. Full methodology, per-agent breakdown, confidence intervals.

The public demo runs a subset of the archetype library. Enterprise gets the full engine: custom domains, private deployment, dedicated calibration. Research partnerships available.

Start Predicting

Holodeck is live.

Free tier available. Enterprise plans for teams and institutions.
