RegRadar: An Agentic Engine That Reads EU Regulation Without Hallucinating a Single Citation
Every act in the EU regulatory firehose carries concrete obligations a bank must implement. RegRadar extracts them, maps them to systems, ranks them by deadline, and drafts the gap-assessment memo — with every claim verified, programmatically, against the live EUR-Lex source. Here is the architecture that makes it trustworthy.
The EU publishes regulation faster than any compliance team can read it. DORA, MiCA, the AI Act, CRR3, NIS2: each one runs to dozens or hundreds of articles, and each article is packed with concrete obligations a bank has to translate into systems, controls, and deadlines. Today that translation is done by humans with highlighters and Word documents. It is slow, it is expensive, and it does not scale to the volume that is coming. That is the gap I built RegRadar to close.
What RegRadar does
RegRadar is an agentic regulatory-impact engine. Point it at a regulation and it runs the full chain end-to-end, watchable live in the browser:
raw document → verified obligations → impact map → ranked actions → drafted memo → human gate
A pipeline of specialized agents does the work. A Source-Monitor watches the EU firehose via the CELLAR / EUR-Lex endpoints. A Parser turns the raw legal text into a structured article tree. An Obligation Extractor pulls the concrete "you must…" duties out of each article. An Impact-Mapping agent matches every obligation to the bank's controls and surfaces the gaps. A Prioritization agent ranks what is left by deadline, effort, and risk. Finally, a Memo agent drafts the gap-assessment write-up, in English and German, and hands it to a human for approval before anything is exported.
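To make the shape of that chain concrete, here is a minimal sketch of a stage pipeline with a human gate at the end. Everything in it is an assumption for illustration: `RunState`, `run_pipeline`, and `export` are hypothetical names, and the toy `parse` and `extract` functions only stand in for the real Parser and the LLM-backed extractor. The mapping, prioritization, and memo stages would slot into the same list.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunState:
    """State handed from stage to stage (all field names are illustrative)."""
    raw_text: str
    articles: dict[str, str] = field(default_factory=dict)
    obligations: list[dict] = field(default_factory=list)
    trace: list[str] = field(default_factory=list)
    approved: bool = False  # flipped only by a human reviewer


Stage = Callable[[RunState], RunState]


def parse(state: RunState) -> RunState:
    # Toy parser: a line starting with "Article " opens a new article.
    current = None
    for line in state.raw_text.splitlines():
        line = line.strip()
        if line.startswith("Article "):
            current = line
            state.articles[current] = ""
        elif current and line:
            state.articles[current] = f"{state.articles[current]} {line}".strip()
    return state


def extract(state: RunState) -> RunState:
    # Stand-in for the LLM extractor: keep sentences that contain "shall".
    for article, text in state.articles.items():
        for sentence in text.split(". "):
            if " shall " in f" {sentence} ":
                state.obligations.append(
                    {"article": article, "quote": sentence.strip().rstrip(".")}
                )
    return state


def run_pipeline(state: RunState, stages: list[Stage]) -> RunState:
    # Run each stage in order and record its name so a UI could replay the run.
    for stage in stages:
        state = stage(state)
        state.trace.append(stage.__name__)
    return state


def export(state: RunState) -> str:
    # The human gate: nothing leaves the system without explicit approval.
    if not state.approved:
        raise PermissionError("memo has not been approved by a human reviewer")
    return f"exported {len(state.obligations)} obligations"
```

The design point is that the gate is a property of the state, not of any one agent: whichever stages run, `export` refuses to proceed until a reviewer has set `approved`.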
The killer feature: citation integrity by construction
Here is the failure mode that kills most "AI reads your documents" products: the model produces a confident, fluent claim and cites an article that does not say what the model says it says. In a regulated industry that is not a bug — it is a liability.
RegRadar's answer is a programmatic citation verifier. No obligation is trusted until its anchor quote is found, verbatim, in the cited article of the source — and the source is hash-pinned, so the bytes the verifier checks against can never silently drift. If the quote is not there, the obligation is rejected before it ever reaches the user, and a human gate opens.
The LLM is the feature extractor. It is not the source of truth. The pinned legal text is.
This is the same determinism boundary I keep coming back to across my agentic projects (it is the backbone of Recoupe too): parsing, hashing, citation matching, and scoring are pure, deterministic Python. The language model only extracts and narrates. It never decides what is true.
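A minimal sketch of what such a verifier could look like, in plain deterministic Python. The names (`PinnedSource`, `Obligation`, `verify`) are hypothetical, SHA-256 is my assumption for the pinning hash, and collapsing whitespace before matching is a further assumption about how "verbatim" might be handled in practice. The sample text in the usage note is invented and is not a quotation from any regulation.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PinnedSource:
    sha256: str               # content hash recorded when the source was ingested
    articles: dict[str, str]  # article id -> article text


@dataclass(frozen=True)
class Obligation:
    article: str       # the article the model cites
    anchor_quote: str  # the quote the model claims appears there
    claim: str         # the model's paraphrase of the duty


def content_hash(raw: bytes) -> str:
    return hashlib.sha256(raw).hexdigest()


def normalize(text: str) -> str:
    # Assumption: only whitespace is normalized; wording must match exactly.
    return " ".join(text.split())


def verify(ob: Obligation, source: PinnedSource, raw: bytes) -> tuple[bool, str]:
    """Accept an obligation only if its quote appears in the cited article of the pinned source."""
    if content_hash(raw) != source.sha256:
        return False, "source bytes do not match the pinned hash"
    article_text = source.articles.get(ob.article)
    if article_text is None:
        return False, "cited article not found"
    quote = normalize(ob.anchor_quote)
    if not quote:
        # An empty string is a substring of everything, so reject it explicitly.
        return False, "empty anchor quote"
    if quote not in normalize(article_text):
        return False, "anchor quote not found in cited article"
    return True, "ok"
```

With an invented source such as `raw = b"Article 1\nEntities shall keep a register of incidents."`, an obligation quoting "shall keep a register" from "Article 1" passes, while one quoting text that is not in the article, or one checked against bytes that differ from the pin, is rejected with a reason a human reviewer can read.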
The numbers
I hand-labeled an oracle for DORA (Regulation (EU) 2022/2554) — all 64 real articles, pulled live from the EUR-Lex legal-content endpoint — so the system can be graded rather than vibe-checked. On a full live extraction (Groq → Gemini failover):
- Precision 0.917
- Recall 1.000
- F1 0.957
- 100% citation integrity on accepted obligations (every accepted anchor quote was found verbatim in its cited article)
The detail I am proudest of: the verifier rejected two ungrounded anchors the model over-extracted from Article 19's list items, and opened a human gate — exactly the intended behaviour. No unverified legal claim reaches the user. A system that knows what it does not know is worth more than one that is confidently wrong.
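For readers who want to check the arithmetic: F1 is the harmonic mean of precision and recall, so the reported F1 follows directly from the other two numbers. The sketch below shows the standard definitions. The function names and the toy obligation IDs are illustrative assumptions, not the actual harness or the actual oracle labels.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def score(predicted: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Precision, recall, and F1 of predicted obligation IDs against a hand-labeled gold set."""
    tp = len(predicted & gold)   # extracted and in the oracle
    fp = len(predicted - gold)   # extracted but not in the oracle
    fn = len(gold - predicted)   # in the oracle but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# The reported precision and recall reproduce the reported F1.
print(round(f1_score(0.917, 1.000), 3))  # 0.957
```

Recall of 1.000 with precision below 1 is the signature of over-extraction: nothing in the oracle was missed, but a few extra candidates were proposed.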
Robustness was the actual product
Most of the engineering was not prompt-writing. It was making the thing not fall over:
- No single point of quota failure. The model router fails over Groq → Gemini → OpenRouter → Ollama, and finally to a deterministic mock floor. When Groq's free tier hits its daily token cap, the system degrades instead of crashing: if the output that comes back fails schema validation, the guardrail flags it and a human gate opens.
- Immutable, idempotent ingestion. Documents land in a Bronze store pinned by content hash. Re-ingesting identical bytes is a no-op. Same hash, same pin, every time.
- Exact-match response cache. Already-extracted articles are served from the cache and never re-spend tokens.
- Eval-as-a-CI-gate. The eval harness exits non-zero on regression, so a model or prompt change that drops F1 fails the build.
- Secrets hygiene and zero hardcoding. Keys live only in a gitignored env file, and every threshold lives in one config module.
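The failover chain in the first bullet can be sketched as an ordered list of providers with a deterministic floor underneath, followed by a separate guardrail that validates whatever came back. This is a sketch under stated assumptions, not the actual router: `ProviderError`, `route`, `guard`, and the JSON shape with an `obligations` list are all hypothetical, and I am assuming here that the mock floor returns a fixed placeholder that deliberately does not satisfy the schema, so that degraded runs always open the human gate.

```python
import json
from typing import Callable, Optional

Provider = Callable[[str], str]  # prompt -> raw model output (assumed to be JSON text)


class ProviderError(Exception):
    """Hypothetical error a provider wrapper raises on quota exhaustion or outage."""


def mock_floor(prompt: str) -> str:
    # Deterministic last resort: the same placeholder for every prompt.
    return json.dumps({"status": "degraded"})


def route(prompt: str, providers: list[tuple[str, Provider]]) -> tuple[str, str]:
    """Return (provider_name, raw_output) from the first provider that answers."""
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError:
            continue  # fall through to the next provider in the chain
    return "mock", mock_floor(prompt)


def guard(raw: str) -> tuple[Optional[dict], bool]:
    """Validate output against the assumed schema; return (payload, needs_human_gate)."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None, True
    if not isinstance(payload, dict) or not isinstance(payload.get("obligations"), list):
        return None, True
    return payload, False
```

Keeping `route` and `guard` separate means the failure path is the same whether the bad output came from an exhausted chain or from a live model that returned malformed JSON: in both cases nothing unvalidated moves downstream, and a person is asked to look.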
The whole thing is a single FastAPI service that serves both the API and a dark "command-center" console over Server-Sent Events — five screens where you can watch the agents think, see verified obligation anchors highlighted in the source text, and approve or reject the generated memo. One deployable unit, running on a $0 free-tier stack.
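For anyone unfamiliar with Server-Sent Events, the wire format is simple enough to show in a few lines: each message is an optional `event:` line, one or more `data:` lines, and a blank line. The sketch below uses only the standard library. The function names and the event names in the usage note are hypothetical.

```python
import json
from typing import Iterable, Iterator


def sse_frame(event: str, data: dict) -> str:
    """Format one Server-Sent Events frame: event name, JSON data line, blank line."""
    # json.dumps emits a single line by default, so one "data:" line is enough.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def agent_stream(steps: Iterable[tuple[str, dict]]) -> Iterator[str]:
    """Turn (event, payload) pairs from a pipeline run into a stream of SSE frames."""
    for event, data in steps:
        yield sse_frame(event, data)
```

In a FastAPI app, a generator like `agent_stream` could be returned from an endpoint wrapped in `StreamingResponse(..., media_type="text/event-stream")`, and the browser's `EventSource` would then receive events such as a hypothetical `obligation_verified` as they are produced.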
What this taught me
- Citation integrity has to be enforced in code, not requested in a prompt. "Please only cite real sources" is not a control. A verifier that checks the quote against pinned bytes is.
- For any regulated workflow, separate the verifiable from the generative. Run the math and the matching deterministically; let the LLM narrate on top.
- Degrade, do not crash. Free-tier quotas, flaky providers, and bad model output are not edge cases — they are Tuesday. Design the failure path first.
RegTech is one of those domains that looks unglamorous and turns out to be exactly where agentic AI earns its keep: codifiable rules, repeatable decisions, a real ground truth to grade against, and a regulator demanding explainability. RegRadar is my bet on what that future looks like.
Let's talk
I'm pivoting from manufacturing AI to finance — open to roles, mentorship, and collaborators in fintech, quant, and bank AI.