
mini-corporatellm

EU-resident gateway to GPT, Claude, and Gemini with company-knowledge RAG, agents, and multi-tenant governance. Live on Cloudflare Workers.

  • llm
  • rag
  • agents
  • cloudflare
  • eu-residency
  • oidc
  • multi-tenant
Repo: github.com/mj-deving/mini-corporatellm
Live: mini-corporatellm.mariusdeving.workers.dev
Published: 2026-05-26

What it is

A small but honest clone of an enterprise-AI platform built for the German Mittelstand. Four modules, each independently probeable: LLM Gateway, Agent Runtime + RAG, EU-Data-Layer, Multi-Tenant + Governance.

The euphoric surprise: it does not look like a demo. It looks like the product CorporateLLM sells. Small. Real. Live.

The problem it solves

German SMEs face shadow AI today. Staff paste customer data, contracts, and source code into public chatbots because the tools are useful. Each paste is an uncontrolled transfer of personal data to a US provider. GDPR, the EU AI Act, and works-council concerns block naive adoption. The bottleneck is not the model. The bottleneck is using it lawfully.

One gateway enforces residency per request (a request bound for a non-EU provider is rejected with HTTP 451), writes a metadata-only audit row for every call, meters usage per tenant, and gates access by role.

How it works (four modules)

LLM Gateway. Public contract is the OpenAI Chat Completions wire format, translated to Anthropic Messages and Gemini internally. Stream-through SSE, per-provider usage accounting, gateway-issued virtual keys with hierarchical budgets enforced before any upstream call.
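The translation step can be sketched as below. This is a minimal illustration, not the repo's code: it handles text-only messages, the interface names and `toAnthropic` are hypothetical, and `DEFAULT_MAX_TOKENS` is an assumed fallback. It relies on two facts about the Anthropic Messages API: the system prompt is a top-level `system` field rather than a message, and `max_tokens` is required.

```typescript
// Minimal subset of the OpenAI Chat Completions request (text content only).
interface OpenAIChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface OpenAIChatRequest {
  model: string;
  messages: OpenAIChatMessage[];
  max_tokens?: number;
  stream?: boolean;
}

// Minimal subset of the Anthropic Messages request.
interface AnthropicMessagesRequest {
  model: string;
  system?: string;
  messages: { role: "user" | "assistant"; content: string }[];
  max_tokens: number;
  stream?: boolean;
}

// Assumption: fallback used when the caller omits max_tokens.
const DEFAULT_MAX_TOKENS = 1024;

// Hypothetical helper: upstreamModel is whatever model id the gateway
// resolved for this request; the mapping itself is out of scope here.
function toAnthropic(
  req: OpenAIChatRequest,
  upstreamModel: string,
): AnthropicMessagesRequest {
  // System messages are lifted out of the turn list and joined.
  const systemParts = req.messages
    .filter((m) => m.role === "system")
    .map((m) => m.content);

  // Everything else is passed through as user/assistant turns.
  const turns = req.messages
    .filter(
      (m): m is OpenAIChatMessage & { role: "user" | "assistant" } =>
        m.role !== "system",
    )
    .map((m) => ({ role: m.role, content: m.content }));

  const out: AnthropicMessagesRequest = {
    model: upstreamModel,
    messages: turns,
    max_tokens: req.max_tokens ?? DEFAULT_MAX_TOKENS,
  };
  if (systemParts.length > 0) out.system = systemParts.join("\n\n");
  if (req.stream !== undefined) out.stream = req.stream;
  return out;
}
```

Keeping the public contract in one wire format means clients never see the provider-specific shape; only this mapping changes when a provider is added.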

Agent Runtime + RAG. Single-agent plan→act→verify loop with a Zod-validated tool registry. The RAG pipeline: semantic chunking, contextual retrieval, bge-m3 embeddings, hybrid dense+sparse search fused with reciprocal rank fusion (RRF), cross-encoder rerank, top-5 with citations, plus corrective RAG (CRAG) as an opt-in quality gate.
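The fusion step in the hybrid stage can be sketched as follows. This is a generic reciprocal rank fusion, not the repo's implementation: `rrfFuse` is a hypothetical name, `k = 60` is the commonly used constant rather than a value taken from the source, and each input list is assumed to hold unique document ids ordered best first.

```typescript
// Reciprocal rank fusion: every list contributes 1 / (k + rank) for each
// document it contains, with rank counted from 1. Documents that rank well
// in several lists accumulate the highest fused score.
function rrfFuse(
  rankings: string[][],
  k = 60,
  topK = 5,
): { id: string; score: number }[] {
  const scores = new Map<string, number>();

  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }

  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Example: "b" is second in the dense list and first in the sparse list,
// so it ends up ahead of "a" (first and third) and "c" (third and second).
const denseHits = ["a", "b", "c"];
const sparseHits = ["b", "c", "a"];
const fusedHits = rrfFuse([denseHits, sparseHits]);
```

RRF only needs ranks, not comparable scores, which is why it suits mixing dense similarity with sparse term scores before the reranker sees the candidates.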

EU-Data-Layer. Per-endpoint residency policy across three tiers (sovereign, eu-residency, off). A request to a provider outside the endpoint's tier is blocked with HTTP 451. D1, R2, and Durable Objects are all EU-pinned. The audit log is metadata-only. GDPR erasure follows the Art. 17(3) erase-vs-retain split.
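A minimal sketch of the per-endpoint check, under stated assumptions. The tier names come from the description above; what each tier admits is not spelled out there, so the semantics in `satisfiesTier`, the `ProviderProfile` fields, and the strict default for unlisted endpoints are all illustrative guesses.

```typescript
type Tier = "sovereign" | "eu-residency" | "off";

// Hypothetical provider metadata; the real policy inputs may differ.
interface ProviderProfile {
  id: string;
  processesInEu: boolean; // requests are processed inside the EU
  euOperated: boolean; // operator is an EU entity
}

// Assumed semantics: "sovereign" needs EU processing and an EU operator,
// "eu-residency" needs EU processing only, "off" admits any provider.
function satisfiesTier(provider: ProviderProfile, tier: Tier): boolean {
  switch (tier) {
    case "sovereign":
      return provider.processesInEu && provider.euOperated;
    case "eu-residency":
      return provider.processesInEu;
    case "off":
      return true;
  }
}

type ResidencyDecision =
  | { allowed: true }
  | { allowed: false; httpStatus: 451; reason: string };

function checkResidency(
  endpoint: string,
  policy: Record<string, Tier>,
  provider: ProviderProfile,
): ResidencyDecision {
  // Assumption: an endpoint with no explicit entry gets the strictest tier.
  const tier = policy[endpoint] ?? "sovereign";
  if (satisfiesTier(provider, tier)) return { allowed: true };
  return {
    allowed: false,
    httpStatus: 451,
    reason: `provider ${provider.id} is outside tier ${tier} for ${endpoint}`,
  };
}
```

The policy is keyed by endpoint, so swapping a provider changes only its profile, never the policy table.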

Multi-Tenant + Governance. DB-per-tenant isolation, role-gated routes (admin, member), token metering and quota enforced in a Durable Object, OIDC SSO live-verified against Entra, admin console for users, usage, and keys.
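The quota logic can be sketched as a reserve-then-settle counter. This is a plain in-memory class for illustration, not the repo's Durable Object: `TenantQuota` and its methods are hypothetical names, and the reserve/settle split is an assumed design. Hosting one such counter per tenant inside a Durable Object is what would make the check-and-increment safe under concurrent requests.

```typescript
interface QuotaState {
  limit: number; // token budget for the current period
  used: number; // tokens consumed or currently reserved
}

class TenantQuota {
  private readonly state = new Map<string, QuotaState>();

  setLimit(tenantId: string, limit: number): void {
    const current = this.state.get(tenantId);
    this.state.set(tenantId, { limit, used: current?.used ?? 0 });
  }

  // Called before the upstream request. Returns false when the estimate
  // would push the tenant over its limit, so no provider call is made.
  tryReserve(tenantId: string, estimatedTokens: number): boolean {
    const s = this.state.get(tenantId);
    if (!s) return false; // unknown tenants get no budget
    if (s.used + estimatedTokens > s.limit) return false;
    s.used += estimatedTokens;
    return true;
  }

  // Called after the upstream response: swap the estimate for real usage.
  settle(tenantId: string, reservedTokens: number, actualTokens: number): void {
    const s = this.state.get(tenantId);
    if (!s) return;
    s.used = Math.max(0, s.used - reservedTokens + actualTokens);
  }

  remaining(tenantId: string): number {
    const s = this.state.get(tenantId);
    return s ? Math.max(0, s.limit - s.used) : 0;
  }
}
```

Reserving before the call and settling afterwards keeps the limit enforceable even when the final token count is only known once the stream ends.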

Design decisions worth defending

  1. Residency per endpoint, not per vendor. Vendors get added and dropped. Endpoints are stable. The policy lives where the policy belongs.
  2. Audit log metadata-only. No prompt content, no completion content. A recovery under legal hold can show that a request happened; the conversation itself never lands in the system of record.
  3. CRAG as opt-in quality gate. Always-on retrieval grading is expensive and often wrong. As a tier above hybrid + rerank, it earns its tokens.
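Decision 2 can be illustrated with a small sketch. The field names and `toAuditRow` are hypothetical, not the repo's schema. The point is the allowlist: the row is built field by field, so prompt and completion text cannot leak in through a spread or a later schema change.

```typescript
// What the gateway knows about a finished call, including content.
interface GatewayCall {
  tenantId: string;
  userId: string;
  endpoint: string;
  provider: string;
  model: string;
  prompt: string;
  completion: string;
  promptTokens: number;
  completionTokens: number;
  httpStatus: number;
  latencyMs: number;
}

// What is persisted: metadata only, with no content fields at all.
interface AuditRow {
  timestamp: string;
  tenantId: string;
  userId: string;
  endpoint: string;
  provider: string;
  model: string;
  promptTokens: number;
  completionTokens: number;
  httpStatus: number;
  latencyMs: number;
}

function toAuditRow(call: GatewayCall, now: Date = new Date()): AuditRow {
  // Explicit allowlist: every persisted field is named here on purpose.
  return {
    timestamp: now.toISOString(),
    tenantId: call.tenantId,
    userId: call.userId,
    endpoint: call.endpoint,
    provider: call.provider,
    model: call.model,
    promptTokens: call.promptTokens,
    completionTokens: call.completionTokens,
    httpStatus: call.httpStatus,
    latencyMs: call.latencyMs,
  };
}
```

Token counts and status codes are enough to answer who called what, when, and at what cost, without storing what was said.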

Stack

TypeScript and Hono on workerd/V8. D1 with --jurisdiction eu. Durable Objects pinned EU at runtime. R2 EU bucket. Queues. Vectorize for embeddings. Cloudflare AI Gateway for cache and analytics. KV holds non-PII config only.

Honest scope

This is L1 residency, not full L2 sovereignty. Cloudflare and the US providers stay subject to the US CLOUD Act regardless of geography. The full gap and its seams are documented in docs/EU-SOVEREIGNTY.md inside the repo.

The Anthropic live-routing path is wired in code but not live-probed (no Anthropic API key in the deploy). Everything else: probeable on the live URL.