Show HN: Serverless OpenAI Gateway: PII and Cache on Cloudflare Workers

(github.com)

"OP here. I built this because I noticed two problems scaling my internal RAG tools:

Duplicate Spend: Users asking the same questions (or slight variations) were burning tokens on answers I had already paid for.

Compliance Anxiety: I didn't want PII (names, emails, IDs) hitting OpenAI/DeepSeek servers directly.

I looked for existing gateways but most were heavy Docker containers (requiring a VPS). I wanted something 'Zero DevOps' that could run on the Edge.

The Solution: It's a lightweight Gateway built with Hono running on Cloudflare Workers.

Smart Caching: Hashes the prompt body (SHA-256) and serves from KV if it's a hit (<50ms latency).
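The exact-match cache described above can be sketched roughly like this. This is a hypothetical illustration, not the project's code: Node's `createHash` stands in for the Web Crypto the Worker would use, and a `Map` stands in for Cloudflare KV.

```typescript
import { createHash } from "node:crypto";

// Stand-in for Cloudflare KV (which is an async key-value store in a real Worker).
const cache = new Map<string, string>();

// Deterministic cache key: SHA-256 hex digest of the raw prompt body.
function cacheKey(body: string): string {
  return createHash("sha256").update(body).digest("hex");
}

// Return the cached completion on an exact-body hit, otherwise undefined.
function getCached(body: string): string | undefined {
  return cache.get(cacheKey(body));
}

// Store a completion under the body's hash for future identical requests.
function putCached(body: string, response: string): void {
  cache.set(cacheKey(body), response);
}
```

Because the key is a hash of the exact body, even a one-character variation misses the cache, which is why the semantic-caching question below matters.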

PII Shield: Uses Regex/NER to replace sensitive data with placeholders ([EMAIL_1]) before forwarding to the LLM.

Re-hydration: When the LLM responds, the Worker puts the real data back into the response, so the user's context remains intact.
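A minimal sketch of the redact/re-hydrate round trip, assuming a regex-only pass (the post also mentions NER, which isn't shown). The regex and function names are illustrative, not the gateway's actual API; only the email case is covered.

```typescript
// Naive email matcher for illustration; a real PII shield would cover names, IDs, etc.
const EMAIL_RE = /[\w.+-]+@[\w-]+\.[\w-]+/g;

// Replace each email with a numbered placeholder and remember the mapping.
function redact(text: string): { redacted: string; map: Map<string, string> } {
  const map = new Map<string, string>();
  let i = 0;
  const redacted = text.replace(EMAIL_RE, (match) => {
    i += 1;
    const placeholder = `[EMAIL_${i}]`;
    map.set(placeholder, match);
    return placeholder;
  });
  return { redacted, map };
}

// After the LLM responds, swap the placeholders back for the real values.
function rehydrate(text: string, map: Map<string, string>): string {
  let out = text;
  for (const [placeholder, value] of map) {
    out = out.split(placeholder).join(value);
  }
  return out;
}
```

So the upstream model only ever sees `[EMAIL_1]`, while the caller gets the original value back in the final response.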

It's open-source (MIT). I'm currently looking for feedback on how to implement semantic caching (Vectorize) to catch non-identical prompts.

Happy to answer any questions about the implementation!"
