5 min read · Agentic AI · Optimization · Energy · OR-Tools · Governance

AEOLUS: Everyone Predicts Failures — The Money Is in Deciding When to Act

Failure prediction is commoditised. The hard, valuable problem in industrial AI is the closed loop after the prediction: deciding when to act so you lose the least revenue, under real constraints, with a governance trail an operator can deploy. Here is how I built that for a wind fleet.

Walk any "AI for renewables" expo and you will see the same product fifty times: a model that predicts a turbine will fail. That is table stakes now — scikit-learn, a SCADA feed, and a weekend will get you a credible anomaly score. The part nobody demos is the part that actually saves money: the closed loop after the prediction. Knowing a bearing will fail in three weeks is worth nothing until you decide when to send a crew — and that decision is an economics-and-constraints problem, not a prediction problem. That gap is what I built AEOLUS to close.

Try the live demo →

The wedge

AEOLUS is a multi-agent operations brain that runs a wind fleet the way a great operations director would. It detects a turbine degrading, root-causes it, prices acting now versus later against the live electricity market and weather, schedules the cheapest safe maintenance window with a real solver, drafts the work order, and routes it to a human for one-click approval — with a governance trail a regulated operator could actually deploy.

Prediction is commoditised. Economically optimal autonomous scheduling against a live market, with an immutable governance trail, is not.

The whole thing runs on real, openly licensed European wind-farm data and free APIs, at $0 to operate.

Five layers

AEOLUS is built as five honest layers, not one prompt doing everything:

  1. Data plane — real Kelmarsh SCADA telemetry (6 Senvion MM92 turbines, 10-minute data, 2016, CC-BY-4.0), the German day-ahead market from energy-charts, and hub-height weather from Open-Meteo.
  2. Lakehouse — a Bronze → Silver → Gold parquet medallion, with an ISA-95 asset registry. The fleet is scaled to 20 turbines by adding derived turbines (real power curves + live weather, clearly labelled synthetic) so the ops centre runs at realistic scale.
  3. Perception — normality models, anomaly + prognosis (a health score and a lead-time), a power-curve/generation forecast, and SHAP-style attribution. Crucially, these are trained on the clean baseline only, so the residuals and the prognosis are genuinely learned — not read off the injected scenario.
  4. Cognition — a LangGraph mesh: Orchestrator → Diagnostician → Market → Scheduler/Optimizer → Work-order, with RAG over the O&M manuals.
  5. Governance & action — an OPA-style policy gate, a digital-twin simulation pre-check, human approval, an immutable hash-chained audit log, and a fleet-wide kill switch.
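The point in layer 3 about training on the clean baseline only can be sketched in a few lines. This is a minimal illustration, not the AEOLUS models: the 1 m/s binning, the median power curve, and the names `fit_power_curve` and `residuals` are assumptions made for the example, and the numbers are hypothetical.

```python
from collections import defaultdict
from statistics import median


def fit_power_curve(baseline):
    """Fit expected power per 1 m/s wind bin from clean (wind_ms, power_kw) pairs."""
    bins = defaultdict(list)
    for wind_ms, power_kw in baseline:
        bins[int(wind_ms)].append(power_kw)
    return {b: median(vals) for b, vals in bins.items()}


def residuals(curve, observations):
    """Observed minus expected power; bins never seen in the baseline give None."""
    out = []
    for wind_ms, power_kw in observations:
        expected = curve.get(int(wind_ms))
        out.append(None if expected is None else power_kw - expected)
    return out


# Hypothetical numbers, for illustration only.
clean = [(5.2, 300.0), (5.7, 320.0), (5.9, 310.0), (8.1, 1100.0), (8.4, 1150.0)]
curve = fit_power_curve(clean)
new_obs = [(5.5, 250.0), (8.3, 1125.0), (11.0, 1900.0)]
print(residuals(curve, new_obs))  # [-60.0, 0.0, None]
```

Because the curve never sees the degraded period, a persistent negative residual is evidence from the data itself rather than an echo of whatever scenario was injected later.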

The optimisation core is the differentiator

The Scheduler chooses a maintenance window start to minimise total cost over a rolling hourly horizon:

minimise C(t) = LostRevenue(t) + RiskCost(t)

where LostRevenue(t) is the price-weighted expected generation lost during the downtime of a window starting at t, and RiskCost(t) is P(failure at t) × the cost of an unplanned failure. The minimisation is subject to three hard constraints: a skilled crew must be available, wind speed throughout the window must stay under the safe-climb envelope (12 m/s), and no firm grid-dispatch commitment can be breached.
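To make the objective concrete, here is a minimal sketch of how C(t) could be tabulated per candidate start hour. It is an illustration under assumptions, not the AEOLUS code: the function name `window_costs`, the input layout (one value per hour), the reading of `p_fail[t]` as the probability that the turbine has failed by hour t, and all the numbers are hypothetical.

```python
def window_costs(price, gen_mwh, wind_ms, p_fail, crew_ok, committed,
                 duration, unplanned_cost, max_wind=12.0):
    """Return C(t) per start hour t, or None where a hard constraint fails."""
    costs = []
    for t in range(len(price) - duration + 1):
        hours = range(t, t + duration)
        feasible = (
            all(crew_ok[h] for h in hours)                  # crew available
            and all(wind_ms[h] < max_wind for h in hours)   # safe-climb envelope
            and not any(committed[h] for h in hours)        # no dispatch breach
        )
        if not feasible:
            costs.append(None)
            continue
        lost_revenue = sum(price[h] * gen_mwh[h] for h in hours)  # EUR
        risk_cost = p_fail[t] * unplanned_cost                    # EUR
        costs.append(lost_revenue + risk_cost)
    return costs


# Hypothetical 5-hour horizon, 2-hour job.
costs = window_costs(
    price=[50, 40, 60, 30, 20],           # EUR/MWh
    gen_mwh=[1.0, 2.0, 1.0, 2.0, 1.0],    # expected MWh per hour
    wind_ms=[5, 13, 6, 7, 8],
    p_fail=[0.0, 0.1, 0.2, 0.3, 0.4],
    crew_ok=[True] * 5,
    committed=[False] * 5,
    duration=2,
    unplanned_cost=1000,
)
feasible_starts = [t for t, c in enumerate(costs) if c is not None]
best_start = min(feasible_starts, key=lambda t: costs[t])
print(best_start)  # 2
```

Hours 0 and 1 are excluded because the 13 m/s reading falls inside their windows. A plain `min` is enough for one incident in isolation; it stops being enough once several incidents share crews.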

This is a CP-SAT model in Google OR-Tools, not a hand-rolled argmin. Multiple incidents compete for shared crews through no-overlap constraints on optional intervals, and a naïve "fix-it-now" baseline is solved alongside the optimised schedule. The difference between the two is the revenue-protected figure on the dashboard counter.

The seniority signal here is the division of labour: the LLM reasons and explains; OR-Tools does the optimisation. I never ask a language model to do the math it is bad at. The Scheduler agent reads the solver's answer and narrates the rationale. This is the same determinism boundary that runs through everything I build: QUORUM computes every market number in Python, and RegRadar verifies every citation against a pinned source. Let the model reason; never let it compute.

The honest result

Here is the kind of finding I think separates a real build from a demo. When I looked at the numbers, I found that the safe-climb constraint excludes the windiest, and therefore highest-generation, windows. So among the windows you can actually send a crew up in, the lost-revenue spread is modest. The headline "value protected" counter would have looked more impressive if I had quietly ignored that.

Instead the counter reports two honest levers: the generation revenue protected by picking the cheapest safe window, and the unplanned-failure cost avoided by acting on the prognosis at all (a planned intervention now vs. an expected run-to-failure event later). The second lever is where the dominant value actually is — and saying so is the point. A model that flatters itself is not one an operator will trust.
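The two levers can be written out as plain arithmetic. The sketch below is one plausible reading, not the formula behind the dashboard: the function name `value_protected`, the subtraction of a planned-intervention cost, and every number are assumptions chosen only to show how the second lever can dominate the first.

```python
def value_protected(baseline_lost_rev, optimal_lost_rev,
                    p_failure, unplanned_cost, planned_cost):
    """Split protected value into the two levers (all amounts in EUR)."""
    # Lever 1: cheaper safe window versus the fix-it-now baseline.
    generation_protected = baseline_lost_rev - optimal_lost_rev
    # Lever 2: expected run-to-failure cost versus a planned intervention.
    failure_cost_avoided = p_failure * unplanned_cost - planned_cost
    return generation_protected, failure_cost_avoided


# Hypothetical figures, for illustration only.
gen_lever, failure_lever = value_protected(
    baseline_lost_rev=1400, optimal_lost_rev=1100,
    p_failure=0.6, unplanned_cost=50_000, planned_cost=8_000,
)
print(gen_lever, round(failure_lever))  # 300 22000
```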

Why governance is a first-class layer

A wind operator cannot deploy an agent that just acts. Every action in AEOLUS passes a policy gate and a digital-twin simulation, then waits for one-click human approval. The full reasoning log is hash-chained and verifiable — you can confirm the chain has not been tampered with — and there is a fleet-wide kill switch that halts all agent action at once. Rejections are logged as a training signal.
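A hash-chained log of this kind can be sketched with the standard library alone. This is a minimal illustration of the technique, assuming SHA-256 over a JSON serialisation; the entry layout and the names `append_entry` and `verify_chain` are hypothetical, not the AEOLUS schema.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, payload):
    """Hash the payload together with the previous entry's hash."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(log, payload):
    """Append a record whose hash commits to everything before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev_hash, "payload": payload,
             "hash": _digest(prev_hash, payload)}
    log.append(entry)
    return entry


def verify_chain(log):
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, {"agent": "Diagnostician", "note": "anomaly flagged"})
append_entry(log, {"agent": "Scheduler", "note": "window proposed"})
intact = verify_chain(log)
log[0]["payload"]["note"] = "edited after the fact"
tampered = verify_chain(log)
print(intact, tampered)  # True False
```

The chain only proves internal consistency. Anchoring the latest hash somewhere the agents cannot write to is what makes a wholesale rewrite detectable.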

This is not decoration. It is the difference between a clever notebook and something a regulated operator could sign off. The same instinct shows up across my projects: an auditable trail in Recoupe, a human-approval gate in RegRadar. Autonomy is only deployable when every step is gated and auditable.
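The gating order described in this section can be sketched as a small state machine. It is a simplified illustration, not the AEOLUS implementation: the `Governor` class, the status strings, and the example policy rule are all hypothetical, and the policy and simulation checks are stand-in callables.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Governor:
    policy_ok: Callable[[dict], bool]   # stand-in for the policy gate
    twin_ok: Callable[[dict], bool]     # stand-in for the digital-twin pre-check
    kill_switch: bool = False
    rejections: List[dict] = field(default_factory=list)

    def submit(self, action: dict, approved_by_human: bool) -> str:
        """Run an action through every gate in order; the first failure stops it."""
        if self.kill_switch:
            return "halted"
        if not self.policy_ok(action):
            return "blocked_by_policy"
        if not self.twin_ok(action):
            return "blocked_by_simulation"
        if not approved_by_human:
            self.rejections.append(action)  # kept as a training signal
            return "rejected"
        return "approved"


gov = Governor(
    policy_ok=lambda a: a["max_wind_ms"] < 12.0,  # hypothetical rule
    twin_ok=lambda a: True,
)
safe = {"turbine": "T03", "max_wind_ms": 8.0}
windy = {"turbine": "T07", "max_wind_ms": 14.0}
print(gov.submit(safe, approved_by_human=True))    # approved
print(gov.submit(windy, approved_by_human=True))   # blocked_by_policy
print(gov.submit(safe, approved_by_human=False))   # rejected
gov.kill_switch = True
print(gov.submit(safe, approved_by_human=True))    # halted
```

Checking the kill switch first means a halt overrides everything else, including an approval a human has already given.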

What this taught me

  1. Prediction is the commodity; the decision is the product. The valuable, defensible work is the economically optimal closed loop after the model fires.
  2. Use the right tool for each job. A constraint solver schedules; a language model explains. Blur that line and you get confident, wrong math.
  3. Honesty is a feature. Reporting the modest lost-revenue spread instead of hiding it is what makes the headline number believable.

AEOLUS is an energy build, but the pattern is domain-agnostic and lands squarely where I want to work: real data, a hard optimisation core, a deterministic boundary around the LLM, and governance that a regulated operator — or a bank — could actually deploy.

Try the live demo → · Source on GitHub → · Full case study →

Let's talk

I'm pivoting from manufacturing AI to finance — open to roles, mentorship, and collaborators in fintech, quant, and bank AI.