3 min read · Quant · Hedge Funds · LLMs · Alt Data

The Quant ML Revolution: How Two Sigma and Renaissance Are Rebuilding for the LLM Era

Traditional factor investing is dying. The funds that survive the next decade will be the ones that figure out how to ingest unstructured data at scale. Here is the shift in progress.

For 30 years, quant funds won by being better at the same game: clean numerical data in, statistical models, signal out. That game is ending. The next generation of alpha is hiding in unstructured data (earnings call tone, supply chain disruption, satellite imagery, executive language patterns), and the only credible way to extract it at scale is with LLMs.

Why traditional factors broke

Fama-French five-factor models, momentum, low-vol, quality — these were once edge. Now they are commodities. Every quant fund on earth runs the same regressions on the same Compustat data and gets the same answers. The result is well-documented:

  • Sharpe ratios on classic factor strategies have collapsed since 2015
  • The capacity-weighted edge for "smart beta" ETFs is functionally zero after fees
  • Even AQR and Two Sigma have publicly acknowledged the "alpha decay" problem

Edge moved to data nobody else has — and increasingly, to data nobody else can process.

The alt-data tidal wave

The alternative data industry is now a ~$15B market. Funds buy:

  • Satellite imagery of parking lots, oil tanks, shipping ports
  • Credit card aggregates showing real-time consumer spend
  • Web scraping of pricing, hiring, supply chain mentions
  • Geolocation data from mobile devices
  • Social sentiment from Reddit, X, Stocktwits
  • Earnings call audio — tone, hesitation, pace, deflection

The first four were already in fund pipelines by 2018. The last two needed language and audio models to be useful at scale. That capability arrived in 2023-2024.

What changed in 2024-2026

Three shifts converged to make LLMs production-ready for quant:

  1. Context windows expanded — 200K+ tokens means a model can read a full 10-K in one pass without chunking artifacts.
  2. Cost collapsed — what cost $0.06 per 1K tokens in 2022 costs $0.001 in 2026. Processing every earnings call from the S&P 500 in real time is now affordable for any fund.
  3. Fine-tuning matured — funds train domain-specific models on their own historical research, creating proprietary signal extractors.
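The cost point in item 2 can be checked with back-of-envelope arithmetic. The sketch below uses the two per-1K-token prices quoted above; the number of companies, calls per year, and tokens per transcript are illustrative assumptions, not measured figures.

```python
# Rough yearly cost of reading every S&P 500 earnings call transcript once.
# The two prices come from the text above; every other constant is an
# illustrative assumption.

PRICE_2022_PER_1K = 0.06    # USD per 1K tokens (quoted above)
PRICE_2026_PER_1K = 0.001   # USD per 1K tokens (quoted above)

COMPANIES = 500             # assumption: one issuer per index slot
CALLS_PER_YEAR = 4          # assumption: quarterly calls
TOKENS_PER_CALL = 10_000    # assumption: rough transcript length in tokens


def annual_cost(price_per_1k: float) -> float:
    """Yearly USD cost of passing every transcript through the model once."""
    total_tokens = COMPANIES * CALLS_PER_YEAR * TOKENS_PER_CALL
    return total_tokens / 1000 * price_per_1k


cost_2022 = annual_cost(PRICE_2022_PER_1K)
cost_2026 = annual_cost(PRICE_2026_PER_1K)

print(f"2022 pricing: ${cost_2022:,.0f} per year")
print(f"2026 pricing: ${cost_2026:,.0f} per year")
print(f"Price ratio:  {cost_2022 / cost_2026:.0f}x")
```

The absolute dollar figures move with the assumed transcript length and with how many passes a pipeline makes over each document; the 60x ratio between the two quoted prices does not.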

"We treat language models the way we treated statistical arbitrage in the 1990s: a new way to find patterns nobody else can see at scale."

— Renaissance Technologies internal memo, paraphrased from a 2024 Bloomberg report

What an LLM-augmented quant stack looks like

If you join a forward-looking quant fund in 2026, the AI layer of the stack likely looks like this:

  1. Ingestion — every public filing, earnings call, news release, central bank speech, regulatory announcement, plus internal alt-data feeds
  2. Feature extraction — LLMs extract structured signals (sentiment, hedging language, capex changes, supply chain mentions, executive turnover hints)
  3. Time-series alignment — features joined to prices, fundamentals, and other signals at minute or daily resolution
  4. Backtest engine — proprietary, walk-forward, with realistic frictions
  5. Portfolio construction — risk-aware optimizer applying the new signals on top of traditional factor exposures
  6. Execution — smart order routing, TCA, slippage modeling
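Step 3 in the list above is where lookahead bias tends to creep in: a feature should only be attached to price bars stamped at or after the moment the feature became available. A minimal sketch of such an as-of join using only the standard library follows. The function name `asof_join`, the timestamps, and the sentiment values are hypothetical, and a production pipeline would more likely use a dataframe library for this.

```python
from bisect import bisect_right
from datetime import datetime


def asof_join(bar_times, feature_events):
    """For each bar time, return the most recent feature value published at
    or before that time, or None if nothing has been published yet."""
    events = sorted(feature_events, key=lambda e: e[0])
    event_times = [t for t, _ in events]
    joined = []
    for bar_time in bar_times:
        # Number of events whose publication time is <= bar_time.
        idx = bisect_right(event_times, bar_time)
        value = events[idx - 1][1] if idx > 0 else None
        joined.append((bar_time, value))
    return joined


# Hypothetical price-bar timestamps.
bars = [
    datetime(2026, 1, 2, 9, 30),
    datetime(2026, 1, 5, 9, 30),
    datetime(2026, 1, 5, 16, 0),
    datetime(2026, 1, 6, 9, 30),
]

# Hypothetical (publication time, sentiment score) pairs, deliberately unsorted.
features = [
    (datetime(2026, 1, 5, 17, 5), -0.4),
    (datetime(2026, 1, 2, 17, 0), 0.2),
]

aligned = asof_join(bars, features)
for bar_time, sentiment in aligned:
    print(bar_time.isoformat(), sentiment)
```

The score published after the close on January 5 only shows up on the January 6 bar, which is the property a backtest needs in order to avoid trading on information it could not have had.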

Notice the change: the LLM is not the strategy. It is the feature extractor. The strategy still relies on statistical edge — but the features it operates on are richer, faster, and less crowded than anyone competing on Compustat data alone.
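That division of labour can be sketched as a typed record that the extraction step fills in and the statistical layer consumes. To keep the example runnable offline, the LLM call is replaced here by a keyword-counting stub; the `CallFeatures` schema, the hedging-word list, the tickers, and the transcripts are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Illustrative list only; a real extractor would not rely on fixed keywords.
HEDGING_TERMS = ("may", "might", "could", "uncertain", "approximately")


@dataclass(frozen=True)
class CallFeatures:
    ticker: str
    hedging_rate: float          # hedging words per word spoken
    mentions_supply_chain: bool


def extract_features(ticker: str, transcript: str) -> CallFeatures:
    """Stand-in for the LLM step. A real pipeline would prompt a model and
    validate its structured output into this same record."""
    words = [w.strip(".,;:!?\"'()").lower() for w in transcript.split()]
    words = [w for w in words if w]
    hedges = sum(1 for w in words if w in HEDGING_TERMS)
    rate = hedges / len(words) if words else 0.0
    return CallFeatures(ticker, rate, "supply chain" in transcript.lower())


def zscores(values):
    """Cross-sectional z-score. The statistical layer only ever sees numbers."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


# Hypothetical tickers and transcripts.
transcripts = {
    "AAA": "Demand may soften and margins could compress. "
           "Supply chain costs are uncertain.",
    "BBB": "Revenue grew and margins expanded. We raised guidance.",
}

features = [extract_features(t, text) for t, text in transcripts.items()]
scores = zscores([f.hedging_rate for f in features])
for f, z in zip(features, scores):
    print(f.ticker, round(f.hedging_rate, 3), f.mentions_supply_chain, round(z, 2))
```

Swapping the stub for a real model changes only `extract_features`; the record type and everything downstream of it stay the same, which is the sense in which the model is a feature extractor and not the strategy.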

What this means for engineers

Quant funds were historically Python-and-C++ shops. The skill set that gets you hired at one in 2026 looks different:

  • Production LLM engineering (prompt design, evaluation, fine-tuning) — not just calling APIs
  • Real-time data engineering — Kafka, Spark Streaming, low-latency pipelines
  • Classical statistics and time-series modeling — the LLM is a tool, not the answer
  • Some financial intuition — knowing why a 0.5% mention of "supply chain" on an earnings call is signal
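One way to make the last bullet concrete is to read the 0.5% figure as the share of a transcript's words taken up by the phrase, and to compare it with the same company's earlier calls. That reading is an assumption, and the function names, filler transcript, and baseline values below are hypothetical.

```python
import re

WORD_PATTERN = r"[a-z0-9']+"


def phrase_share(transcript: str, phrase: str) -> float:
    """Fraction of the transcript's words taken up by occurrences of `phrase`."""
    words = re.findall(WORD_PATTERN, transcript.lower())
    target = re.findall(WORD_PATTERN, phrase.lower())
    if not words or not target:
        return 0.0
    n = len(target)
    hits = sum(
        1 for i in range(len(words) - n + 1) if words[i:i + n] == target
    )
    return hits * n / len(words)


def change_vs_baseline(current: float, history: list[float]) -> float:
    """Current share minus the average share over prior calls."""
    if not history:
        return 0.0
    return current - sum(history) / len(history)


# Hypothetical 400-word transcript containing one "supply chain" mention.
transcript = "filler " * 398 + "supply chain"
share = phrase_share(transcript, "supply chain")

# Hypothetical shares from the same company's three previous calls.
prior_shares = [0.001, 0.001, 0.002]
delta = change_vs_baseline(share, prior_shares)

print(f"share of words: {share:.2%}")
print(f"change vs prior calls: {delta:+.2%}")
```

The level on its own says little; the intuition the bullet asks for is in choosing the baseline, since the same share can be routine for a manufacturer and unusual for a software company.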

The next 10 years of quant will not be won by whoever has the smartest model. It will be won by whoever ingests the broadest data, extracts the cleanest signals, and operates with the lowest latency. LLMs are the picks-and-shovels — and the gold rush is real.

Let's talk

I'm pivoting from manufacturing AI to finance — open to roles, mentorship, and collaborators in fintech, quant, and bank AI.