In that case, you don't want to be using Claude Code, which is more of a consumer product; you want to control the inference stack yourself. What you are looking for is structured output (you give the inference engine a JSON schema you define, and the response must conform to it) plus a JSON schema validator that parses the output and checks that it is valid JSON matching that schema. If it is, you're good to go; if not, run the inference again. llama.cpp supports structured outputs, as do some more consumer-oriented tools that wrap it, like LM Studio. If you don't want to buy hardware yourself or pay exorbitant cloud rental prices, p2p GPU rental marketplaces like vast.ai can be a much more economical option.
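A rough sketch of the validate-and-retry loop in Python, in case it helps. Everything named here is made up for illustration: `generate` is a hypothetical callable standing in for whatever calls your inference engine, `SCHEMA` is an example schema, and `conforms` only checks a tiny subset of JSON Schema (`type`, `required`, `properties`). In practice you would probably swap in a full validator library.

```python
import json
from typing import Any, Callable

# Hypothetical example schema; replace with the one you pass to the engine.
SCHEMA = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

# Mapping from JSON Schema type names to Python types.
_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def conforms(value: Any, schema: dict) -> bool:
    """Check a small subset of JSON Schema: type, required, properties."""
    expected = schema.get("type")
    if expected:
        # bool is a subclass of int in Python, so reject it explicitly
        # unless the schema really asks for a boolean.
        if isinstance(value, bool) and expected != "boolean":
            return False
        if not isinstance(value, _TYPES[expected]):
            return False
    if isinstance(value, dict):
        if any(key not in value for key in schema.get("required", [])):
            return False
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value and not conforms(value[key], sub_schema):
                return False
    return True


def generate_validated(
    generate: Callable[[str], str],  # assumption: prompt in, raw text out
    prompt: str,
    schema: dict,
    max_attempts: int = 3,
) -> Any:
    """Run inference until the output parses and matches the schema."""
    for _ in range(max_attempts):
        raw = generate(prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not JSON at all: run the inference again
        if conforms(parsed, schema):
            return parsed
    raise ValueError(f"no schema-conforming output after {max_attempts} attempts")
```

The attempt cap is a design choice on my part: without it, a model that never produces valid output would loop forever. If the engine enforces the schema during decoding, the retry path should rarely trigger, but the check is cheap insurance.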