I haven't done a lot with skills yet, but maybe try and leverage hooks to enforce skill usage, and move most of the skill's logic and complexity into a script so the agent only needs to reason about how to call the script.
I think I'll wait until they are more reliable. For now, I use skills, but they just specify which endpoint to call. It should be also safer, different vps, no access to credentials but the bearer token.