Redundant Costs: Users asking the same questions (or slight variations) kept costing me tokens for answers I'd already generated.
Compliance Anxiety: I didn't want PII (names, emails, IDs) hitting OpenAI/DeepSeek servers directly.
I looked at existing gateways, but most ship as heavy Docker containers (requiring a VPS). I wanted something 'Zero DevOps' that could run on the Edge.
The Solution: It's a lightweight Gateway built with Hono, running on Cloudflare Workers.
Smart Caching: Hashes the prompt body (SHA-256) and serves from KV on a hit (<50ms latency; see the first sketch after this list).
PII Shield: Uses Regex/NER to replace sensitive data with placeholders ([EMAIL_1]) before forwarding to the LLM.
Re-hydration: When the LLM responds, the Worker puts the real data back into the response, so the user's context remains intact (see the second sketch below).
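To make the caching step concrete, here's a minimal sketch of the cache path in a Hono Worker. It assumes a KV binding named `CACHE` (typed via `@cloudflare/workers-types`) and a hypothetical `forwardToLLM()` helper standing in for the real upstream proxy; none of these names are from the actual repo, and auth-header forwarding is omitted for brevity.

```ts
import { Hono } from "hono";

type Bindings = { CACHE: KVNamespace }; // KV binding name is an assumption
const app = new Hono<{ Bindings: Bindings }>();

// Hypothetical upstream call; the real gateway proxies to OpenAI/DeepSeek.
// Auth headers are omitted here for brevity.
async function forwardToLLM(body: string): Promise<unknown> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body,
  });
  return res.json();
}

app.post("/v1/chat/completions", async (c) => {
  const body = await c.req.text();

  // SHA-256 the raw prompt body to get a stable cache key.
  const digest = await crypto.subtle.digest(
    "SHA-256",
    new TextEncoder().encode(body)
  );
  const key = [...new Uint8Array(digest)]
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");

  // Cache hit: serve straight from KV, skipping the upstream call.
  const cached = await c.env.CACHE.get(key);
  if (cached) return c.json(JSON.parse(cached));

  // Cache miss: forward to the LLM, then store the response.
  const response = await forwardToLLM(body);
  await c.env.CACHE.put(key, JSON.stringify(response), {
    expirationTtl: 3600, // illustrative 1h TTL; tune per use case
  });
  return c.json(response);
});

export default app;
```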
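And here's a minimal sketch of the redact/re-hydrate round trip, using email-only regex matching for brevity (the real gateway also covers names and IDs, including via NER). The `redact`/`rehydrate` helpers and the placeholder-reuse logic are illustrative, not the project's actual API; only the `[EMAIL_1]` placeholder format comes from the post.

```ts
const EMAIL_RE = /[\w.+-]+@[\w-]+\.[\w.]+/g;

// Replace each distinct email with a numbered placeholder and
// remember the mapping for re-hydration.
function redact(text: string): { masked: string; map: Map<string, string> } {
  const map = new Map<string, string>();
  let n = 0;
  const masked = text.replace(EMAIL_RE, (email) => {
    // Reuse the same placeholder for repeated occurrences.
    for (const [ph, orig] of map) if (orig === email) return ph;
    const ph = `[EMAIL_${++n}]`;
    map.set(ph, email);
    return ph;
  });
  return { masked, map };
}

// Swap the placeholders back into the LLM's response so the user
// sees their original data.
function rehydrate(text: string, map: Map<string, string>): string {
  let out = text;
  for (const [ph, orig] of map) out = out.split(ph).join(orig);
  return out;
}

// Usage: redact before forwarding, rehydrate the completion.
const { masked, map } = redact("Contact alice@example.com please.");
// masked === "Contact [EMAIL_1] please."
const restored = rehydrate("Sent to [EMAIL_1].", map);
// restored === "Sent to alice@example.com."
```

A nice side effect of keeping the placeholder-to-value map in the Worker's request scope is that the sensitive values never leave the edge: only the masked text crosses to the upstream provider.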
It's open-source (MIT). I'm currently looking for feedback on how to implement semantic caching (Vectorize) to catch non-identical prompts.
Happy to answer any questions about the implementation!