One point in favor of this: it runs in their cloud and tells you up front that it costs, I think, $10 to $25 per run.
Why would they use their most expensive model when Sonnet or Opus can do the job just as well?
It would be pretty simple to see what API they're calling.
That's what I meant to get at by "it runs in their cloud."

They can name that user-facing ultrareview API endpoint whatever they want, and we have no way to see which model endpoint it calls internally once it's running in their cloud, right?

Introduce intentional, increasingly subtle vulns and compare its detection rate against Sonnet, Opus, etc.? That should give statistical evidence of its power.
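
A rough sketch of how that comparison could be scored, assuming you have already run each reviewer over the same set of seeded bugs and recorded a found/missed flag per bug. The result lists and the names `detection_rate` and `fisher_one_sided` are made up for illustration; nothing here calls a real review API, and the numbers are placeholders, not measurements.

```python
from math import comb

def detection_rate(results):
    """Fraction of seeded vulns a reviewer flagged (results is a list of bools)."""
    return sum(results) / len(results)

def fisher_one_sided(found_a, total_a, found_b, total_b):
    """One-sided Fisher exact test.

    Probability that reviewer A finds at least `found_a` of the bugs if both
    reviewers were in fact equally good (row and column totals held fixed).
    """
    found = found_a + found_b
    total = total_a + total_b
    denom = comb(total, found)
    upper = min(total_a, found)
    tail = sum(
        comb(total_a, k) * comb(total_b, found - k)
        for k in range(found_a, upper + 1)
    )
    return tail / denom

# Hypothetical placeholder data: True = seeded vuln was flagged.
ultrareview_hits = [True, True, True, False, True, True, True, True]
baseline_hits = [True, False, False, False, True, False, True, False]

p_value = fisher_one_sided(
    sum(ultrareview_hits), len(ultrareview_hits),
    sum(baseline_hits), len(baseline_hits),
)
print(f"ultrareview rate: {detection_rate(ultrareview_hits):.2f}")
print(f"baseline rate:    {detection_rate(baseline_hits):.2f}")
print(f"one-sided p-value: {p_value:.3f}")
```

A small p-value would only say the detection rates differ on your seeded set; it would not tell you which model sits behind the endpoint. Since both reviewers see the same bugs, a paired test such as McNemar's might be a better fit than Fisher's if you have enough samples.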