Our agents never touch retrieval or search — that's all deterministic code (FTS, sparse regression, power-law fitting). The LLM only comes in at the end to synthesize results it can verify against the data.
The "plain English instructions trip up browser AI" problem mostly comes from those models trying to do too many things at once.
Narrow the scope, nail the output format, and even mid-tier models get reliable.
There isn't an LLM inside my code. The agents have to submit perfectly structured JSON, and then the code verifies it.
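
To make the "agents submit JSON, code verifies it" part concrete, here's a rough sketch of what that kind of check could look like. This isn't my actual code: the field names (`query_id`, `summary`, `cited_ids`) and the `verify_submission` helper are made up for illustration, and the only assumption is that the deterministic side already knows which record IDs retrieval returned.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
REQUIRED_FIELDS = {"query_id": str, "summary": str, "cited_ids": list}


def verify_submission(raw: str, known_ids: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the submission passes."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["top-level value must be an object"]

    errors = []
    # Structure check: every required field is present with the right type.
    for field, expected in REQUIRED_FIELDS.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            errors.append(f"wrong type for field: {field}")
    # Strictness check: reject anything the schema does not name.
    for extra in sorted(set(payload) - set(REQUIRED_FIELDS)):
        errors.append(f"unexpected field: {extra}")

    # Data check: every cited id must be one the deterministic retrieval returned.
    cited = payload.get("cited_ids")
    if isinstance(cited, list):
        for cid in cited:
            if not isinstance(cid, str) or cid not in known_ids:
                errors.append(f"unknown cited id: {cid!r}")
    return errors
```

The point is that nothing in here is a model call. A submission either parses, matches the expected shape, and only cites records that exist, or it gets bounced back with a list of reasons.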