Yes. When certain keywords or topics are matched, a warning is transparently injected server-side and appended to the conversation’s system prompt, which is already miles long. It is injected and re-evaluated on every tool call.

If you begin a generic reverse-engineering task, it runs 30+ tool calls in a row. The moment it sees something it doesn’t like, the token burn begins: one tool call per iteration, “This is a known CTF challenge, I can proceed”, another single tool call, “This is a real CTF challenge, I can proceed”, and so on.

It’s heavily neutered now without any change to the model, and you pay for the privilege without noticing.

The end result, of course, is that it’s both expensive and useless for approved CTF tasks. No one is using Opus for security. If people think it’s working, the harsh reality is that they’re not doing security work; they’re just finding generic bugs.

I do this for a living and can demonstrate it plain as day: dump the injected prompt, and you’ll notice that what it’s doing isn’t security work; it just looks like it. Happy to write a blog post about it if you want to know more. Apparently many people think it’s working for them when it absolutely isn’t.

by bombcar 16 hours ago

Mythos turns out to be Opus 4.8 in a trenchcoat with guardrails removed.

by satvikpendem 13 hours ago

Opus 4.7 and 4.8 are well known to be distilled versions of Mythos, unlike 4.6, which is why users rate them so much worse than 4.6.

by Khaine 16 hours ago

I would find a blog post on this really interesting.

by ramblin_prose 15 hours ago

I'd like to read that blog please! Thanks for the insight.

by kay_o 17 hours ago

When your session is force-ended for "abuse", you get neither the response nor a refund.

This happens with security, games (think weapons, PvP, attacking, etc.), and sometimes even when asking it for a security review of CRUD code it wrote itself.

by bombcar 16 hours ago

I asked it about a “yellow background cell” in Excel and it spewed a book at me. Then it solved the issue.

by danpalmer 17 hours ago

What a joke. That must make it pretty easy to poison a session: you don't need to persuade the model of anything, just trigger its security controls, ideally after as much context as possible but before it has generated any useful output.

by kay_o 17 hours ago

After all, what are roleplay or games but a jailbreak of guardrails? :]

I've even had it refuse CTFs while knowing it was a CTF, with a blatantly obvious CTF flag and no actual application.

by SOLAR_FIELDS 17 hours ago

Not directly, since it comes back as an uncharged error, but the weighted generation path used up until you hit the guardrail is basically wasted tokens, so yes, indirectly. I've found that if I hit a guardrail and rewind only one turn, the output is still biased toward guardrailing out. Rewinding multiple turns lets you steer away from that path, but all the tokens originally spent going down it are wasted.

by acters 17 hours ago

Yes, tokens used (input and sometimes output) are always charged. You likely get charged for the preloaded system prompt, too.

by gmerc 16 hours ago

Of course they are. It's standard SaaS to charge for security features ;)