May 30, 2026 · 4 min read · Agentic AI · Quant · Multi-Agent · Investing · LLMs

QUORUM: An AI Investment Committee That Argues Before It Decides — and Never Makes Up a Number

Most "AI investing" demos fail the same way: a confident model recalls a wrong figure. QUORUM splits the system in two — Python computes every number deterministically, the LLM only argues and narrates. Six agents debate from real market data and converge on a documented allocation.

There is one failure mode that quietly ruins almost every "AI invests for you" demo: the model states a number with total confidence, and the number is wrong. A made-up P/E. A misremembered drawdown. A return it hallucinated. In investing, a confidently wrong number is worse than no answer at all. QUORUM is built from the ground up to make that failure structurally impossible.

Try the live demo →

What QUORUM is

QUORUM is a simulated investment committee of specialized AI agents that argue from real market data, debate across structured rounds, and converge on a documented allocation — with a human holding the final gate. It is decision-support, not financial advice, and no real capital is ever traded. The portfolio is paper-only. The value is in the agentic engineering and the transparency of the reasoning, not in any return claim.

The committee has six seats:

A Bull and a Bear who research independently, argue their case, then rebut each other.
A Macro Strategist who adds regime context.
A Quant / Risk Officer who computes the downside — and can veto.
A Portfolio Manager who synthesizes the debate into actual weights.
A Critic who stress-tests the decision for groupthink.

The debate loop

The committee runs a cyclic debate rather than a single pass:

Research briefs → Bull & Bear argue independently, then rebut → Macro adds regime context → Risk Officer computes downside and can veto → PM synthesizes weights → Critic stress-tests for groupthink → loop or converge.

The independence is genuine, and it matters. The Bull and the Bear build their cases separately before they ever see each other's arguments, so the disagreement is real, not two faces of the same prompt nodding along. That tension is the whole point: a committee that always agrees is just one opinion wearing six hats.
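A minimal sketch of that loop, under stated assumptions: this is not QUORUM's actual code. The `Decision` class, the `run_committee` function, the callable signatures, the `max_rounds` cap, and the placeholder tickers `AAA` and `BBB` are all hypothetical. The point it illustrates is the ordering: both opening cases are built from the same frozen transcript, so neither side sees the other's argument until the rebuttal phase.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Transcript = Tuple[str, ...]  # immutable view of the debate so far


@dataclass
class Decision:
    """Outcome of one committee session (hypothetical structure)."""
    weights: Dict[str, float]
    rounds: int
    converged: bool
    transcript: List[str] = field(default_factory=list)


def run_committee(
    bull: Callable[[Transcript], str],
    bear: Callable[[Transcript], str],
    macro: Callable[[Transcript], str],
    risk: Callable[[Transcript], Set[str]],
    pm: Callable[[Transcript, Set[str]], Dict[str, float]],
    critic_objects: Callable[[Transcript, Dict[str, float]], bool],
    max_rounds: int = 3,
) -> Decision:
    log: List[str] = []
    weights: Dict[str, float] = {}
    for round_no in range(1, max_rounds + 1):
        # Opening cases: both sides read the same frozen transcript, so
        # neither sees the other's argument for this round.
        before = tuple(log)
        bull_case, bear_case = bull(before), bear(before)
        log += [f"bull: {bull_case}", f"bear: {bear_case}"]
        # Rebuttals: each side now sees both opening cases.
        after = tuple(log)
        log += [f"bull rebuttal: {bull(after)}", f"bear rebuttal: {bear(after)}"]
        log.append(f"macro: {macro(tuple(log))}")
        # The risk seat returns the set of tickers it vetoes.
        vetoed = risk(tuple(log))
        log.append(f"risk veto: {sorted(vetoed)}")
        weights = pm(tuple(log), vetoed)
        # Converge as soon as the critic has no objection; otherwise loop.
        if not critic_objects(tuple(log), weights):
            return Decision(weights, round_no, True, log)
    return Decision(weights, max_rounds, False, log)
```

Returning `converged=False` when the round cap is hit, rather than raising, leaves room for the human gate to see an unresolved debate instead of a silent failure.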

The determinism boundary

Here is the architectural choice that makes QUORUM trustworthy. Every number comes from Python. The LLM never computes and never recalls a figure.

The tools layer fetches and computes deterministically: prices via yfinance, fundamentals from SEC EDGAR, news, macro series from FRED, and risk metrics (volatility, beta, VaR, drawdown) in NumPy/SciPy, all behind a point-in-time snapshot cache. A grounding guardrail rejects any unsourced number before it can enter the debate. The language model's only job is to interpret and narrate the figures the tools hand it.
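To make "computed deterministically" concrete, here is a small sketch of the four risk metrics named above. QUORUM does this in NumPy/SciPy; this version swaps in the Python standard library so it runs with no dependencies, and every function name, the 252-day annualization factor, and the simple historical (empirical quantile) VaR method are my assumptions for illustration.

```python
import math
import statistics
from typing import List, Sequence


def daily_returns(prices: Sequence[float]) -> List[float]:
    """Simple period-over-period returns from a price series."""
    return [prices[i] / prices[i - 1] - 1.0 for i in range(1, len(prices))]


def annualized_vol(returns: Sequence[float], periods: int = 252) -> float:
    """Sample standard deviation scaled by sqrt(periods per year)."""
    return statistics.stdev(returns) * math.sqrt(periods)


def historical_var(returns: Sequence[float], alpha: float = 0.05) -> float:
    """Historical VaR: the loss at the alpha tail of observed returns,
    reported as a positive number when that tail return is negative."""
    ordered = sorted(returns)
    idx = min(int(alpha * len(ordered)), len(ordered) - 1)
    return -ordered[idx]


def max_drawdown(prices: Sequence[float]) -> float:
    """Worst peak-to-trough decline, as a non-positive fraction."""
    peak, worst = prices[0], 0.0
    for price in prices:
        peak = max(peak, price)
        worst = min(worst, price / peak - 1.0)
    return worst


def beta(asset: Sequence[float], market: Sequence[float]) -> float:
    """Sample covariance of asset and market returns over market variance."""
    mean_a, mean_m = statistics.fmean(asset), statistics.fmean(market)
    cov = sum((a - mean_a) * (m - mean_m) for a, m in zip(asset, market))
    cov /= len(market) - 1
    return cov / statistics.variance(market)
```

Given the same price snapshot, each function returns the same value on every run, which is the property the snapshot cache is there to protect.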

This kills the number-one failure of AI-investing demos: confident wrong numbers. Prices, ratios, vol, VaR, weights — all of it is validated Python. The LLM argues; it does not calculate.
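One way a grounding guardrail like this could work, as a hedged sketch rather than QUORUM's actual implementation: every figure the tools produce is registered with its source, and any number in an agent's text that does not match a registered figure is rejected. The `Figure` class, `unsourced_numbers`, `admit`, and the tolerance are hypothetical names and choices. The regex is deliberately naive: it would also flag dates or the "10" in "10-K", so a real guardrail needs a stricter contract than free-text scanning.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Figure:
    """A number produced by the tools layer, tagged with its origin."""
    name: str
    value: float
    source: str  # e.g. "yfinance", "SEC EDGAR", "FRED", "computed"


# Optionally signed integers or decimals that do not start mid-word.
NUMBER = re.compile(r"(?<![\w.])-?\d+(?:\.\d+)?")


def unsourced_numbers(text: str, figures: List[Figure], tol: float = 1e-9) -> List[str]:
    """Return every numeric token in `text` that matches no registered figure."""
    allowed = [f.value for f in figures]
    bad = []
    for token in NUMBER.findall(text):
        value = float(token)
        if not any(abs(value - a) <= tol for a in allowed):
            bad.append(token)
    return bad


def admit(text: str, figures: List[Figure]) -> str:
    """Let a statement into the debate only if all its numbers are sourced."""
    bad = unsourced_numbers(text, figures)
    if bad:
        raise ValueError(f"unsourced numbers rejected: {bad}")
    return text
```

Rejecting by raising, instead of silently stripping the number, means the offending statement never reaches the transcript at all.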

It is the same principle behind my other agentic systems — RegRadar verifying every legal citation against pinned source text, Recoupe computing recoverable amounts deterministically. Separate the verifiable math from the generative language, every time.

Honest by construction

The backtest is deliberately unflattering to itself: a $10k portfolio, point-in-time with no lookahead, trading costs included, benchmarked against SPY. Results are reported as "directionally reasonable," never "beats the market," because backtests are small-sample and regime-dependent and I would rather under-claim than oversell. The system also flags its own low-confidence decisions instead of hiding them.
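The no-lookahead rule is easy to state and easy to break, so here is a minimal sketch of it. This is an illustration under assumptions, not the QUORUM backtester: the `backtest` function, the single-asset-versus-cash setup, the `cost_bps` value, and the choice to charge costs on the change in target weight (ignoring intraday weight drift) are all mine. Only the $10k starting capital comes from the description above.

```python
from typing import Callable, List, Sequence


def backtest(
    prices: Sequence[float],
    decide: Callable[[Sequence[float]], float],
    start_cash: float = 10_000.0,
    cost_bps: float = 5.0,
) -> List[float]:
    """Return the equity curve for a strategy holding `weight` in one asset.

    `decide` receives only the prices known at decision time and returns a
    target weight between 0 and 1; the remainder sits in cash.
    """
    equity = start_cash
    weight = 0.0
    curve = [equity]
    for t in range(len(prices) - 1):
        # Point-in-time: the strategy sees prices up to and including day t,
        # never day t + 1.
        target = decide(prices[: t + 1])
        # Trading cost is charged on the fraction of the portfolio traded.
        turnover = abs(target - weight)
        equity -= equity * turnover * cost_bps / 10_000.0
        weight = target
        # The position decided on day t earns the return from t to t + 1.
        day_return = prices[t + 1] / prices[t] - 1.0
        equity *= 1.0 + weight * day_return
        curve.append(equity)
    return curve
```

A buy-and-hold benchmark is the same function with `decide=lambda history: 1.0` run over the benchmark's price series, so strategy and benchmark pay costs under identical rules.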

A live paper portfolio advances daily via a scheduled GitHub Actions job, so there is an actual track record accumulating over time rather than a one-shot demo.

Runs with zero keys

QUORUM degrades gracefully all the way down. With no API keys at all, the committee runs on deterministic, evidence-grounded agent logic over free data (yfinance + SEC EDGAR) and completes end-to-end. Drop in a Gemini, Groq, OpenAI, or Anthropic key and the model router upgrades the debate prose to real LLM reasoning — but the numbers stay deterministic either way. The router fails over across providers so a single rate limit never takes the system down.
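The failover-then-fallback behaviour can be sketched in a few lines. Everything named here is hypothetical (`ProviderError`, `narrate`, the provider tuples), and real provider SDKs raise their own exception types that a router would have to translate. The sketch only shows the control flow: try each configured provider in order, and if none is configured or all fail, fall back to the deterministic narrator.

```python
from typing import Callable, List, Tuple


class ProviderError(Exception):
    """Stand-in for a rate limit or outage from an LLM provider (hypothetical)."""


Provider = Tuple[str, Callable[[str], str]]


def narrate(
    prompt: str,
    providers: List[Provider],
    fallback: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (who_answered, prose), failing over across providers in order."""
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError:
            # One provider being down or rate limited is not fatal.
            continue
    # Zero keys configured, or every provider failed: deterministic prose.
    return "deterministic", fallback(prompt)
```

With an empty `providers` list this is the zero-key mode: the same call site, the same return shape, only the prose source changes. The numbers never pass through this function at all.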

The backend is FastAPI streaming the live debate to a Next.js "Committee Room" over Server-Sent Events, with SQLite persistence underneath. You can watch the committee convene, with the Bull on the left and the Bear on the right, and every figure appears as a sourced chip you can trace back to its origin.
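For readers who have not used Server-Sent Events, the wire format is plain text: an optional `event:` line, a `data:` line, and a blank line to end the frame. The sketch below formats debate turns that way using only the standard library. The event names, the `sse_frame` and `stream_debate` helpers, and the payload fields are assumptions for illustration, not QUORUM's actual schema.

```python
import json
from typing import Dict, Iterable, Iterator


def sse_frame(event: str, data: Dict) -> str:
    """Format one Server-Sent Events frame with a JSON payload."""
    # json.dumps emits a single line by default, so one data: line suffices.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def stream_debate(turns: Iterable[Dict]) -> Iterator[str]:
    """Yield one frame per debate turn, then a final 'done' frame."""
    for turn in turns:
        yield sse_frame("turn", turn)
    yield sse_frame("done", {})
```

In a FastAPI app, a generator like this could be returned through `StreamingResponse` with `media_type="text/event-stream"`. Carrying each figure's source inside the turn payload is what would let the front end render it as a traceable chip.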

What this taught me

Genuine disagreement is a feature you have to engineer. Independent research before rebuttal is what makes a multi-agent debate more than theater.
The LLM is the narrator, not the strategy. The moment you let it compute, you have reintroduced the exact failure you were trying to avoid.
Honesty is an architecture, not a disclaimer. Point-in-time backtests, sourced numbers, flagged low-confidence calls, and a paper-only design build trust into the system rather than apologizing for it afterward.

The pattern generalizes well past investing: any domain where an AI has to reason over numbers that must be right is a domain where you split the system in two — deterministic computation underneath, language on top. QUORUM is that idea applied to the hardest possible audience: the markets.

Try the live demo → · Source on GitHub →

Let's talk

I'm pivoting from manufacturing AI to finance — open to roles, mentorship, and collaborators in fintech, quant, and bank AI.
