I was using it to craft a CTF challenge for summer students involving a simulated mechanical dial safe, but with the fence replaced by an IR beam break sensor and a microcontroller handling the check + flag message display.

Generating the initial 3D simulated safe with three.js worked well, but later modifications to print a flag tripped the safeguards. I eventually narrowed it down to the part of the prompt about it being a CTF for students: the model's "thinking" seems to drift toward encrypting/obfuscating the safe combo so students can't just read out the answer, which makes sense as a way to force students to turn the simulated dial instead. But whatever detection Anthropic uses, I guess, just naively sees the model thinking about "encryption" and "obfuscation" without taking any of the context into account.

For the dummy firmware, it tripped the safeguards while thinking about how to track dial position and output the message. However, when I left out any talk of safes and just told it to write firmware for a microcontroller hooked up to an I2C display, with a beam break sensor determining which message to show and an unspecified I2C chip supplying an unspecified number (e.g. internal wheel positions), it worked fine.

On an unrelated software task, I asked it to write some code to translate CustomActions in a Windows MSI installer into something human-readable, which has (exclusively?) defensive security applications: recognizing malicious behavior in an MSI installer. Maybe I'm going crazy, but I'm guessing that as part of its research into MSI custom actions Fable found articles about analyzing malicious MSI installers, and that probably tripped the safeguards.
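For anyone wondering what "translate CustomActions into human readable stuff" looks like, here is a rough sketch of the core step: decoding the numeric Type column of the CustomAction table. The bit values are the documented Windows Installer ones as far as I know, but the function name and the wording of the descriptions are mine, and it covers only the common flags, so treat it as an illustration rather than a complete decoder.

```python
# Sketch: decode the Type column of an MSI CustomAction row into English.
# describe_custom_action is a hypothetical helper name, not part of any library.

BASE_TYPES = {
    1: "call a function in a DLL",
    2: "run an executable",
    5: "run JScript",
    6: "run VBScript",
}

SOURCES = {
    0x00: "Binary table stream",
    0x10: "installed file (File table key)",
    0x20: "Directory table key",
    0x30: "property value",
}

# Base type 3 ("text data") changes meaning depending on the source bits.
TEXT_ACTIONS = {
    0x10: "display an error message and abort",
    0x20: "set a directory from formatted text",
    0x30: "set a property from formatted text",
}

FLAGS = {
    0x0040: "ignore return code",
    0x0080: "run asynchronously",
    0x0400: "deferred execution",
    0x0800: "no impersonation",
    0x2000: "hide target from log",
}


def describe_custom_action(type_value: int) -> str:
    base = type_value & 0x07      # low three bits select the kind of action
    source = type_value & 0x30    # next field says where the payload lives
    if base == 3:
        what = TEXT_ACTIONS.get(source, "text data action")
    else:
        kind = BASE_TYPES.get(base, f"unknown base type {base}")
        what = f"{kind}; source: {SOURCES[source]}"
    notes = [label for bit, label in FLAGS.items() if type_value & bit]
    if type_value & 0x0400:
        # These two bits only mean rollback/commit for deferred actions.
        if type_value & 0x0100:
            notes.append("rollback action")
        if type_value & 0x0200:
            notes.append("commit action")
    return what + (f" ({', '.join(notes)})" if notes else "")


print(describe_custom_action(51))
print(describe_custom_action(3073))
```

A deferred, non-impersonating DLL call (type 3073) is exactly the kind of row a defender would want surfaced in plain language, which is the whole point of the tool.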

Overall my impression is that the safeguards are perhaps using an overzealous and naive implementation that just looks for a list of banned words in the prompt or the thinking -- which drives me crazy when the model says my prompt looks fine, and then 10 minutes in some part of the thinking trips the safeguard.
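To be clear about what I mean by "naive": I have no idea how the real safeguards are implemented, this is only a toy version of the kind of context-blind keyword check I'm imagining. The banned list and the function name are made up for illustration.

```python
import re

# Hypothetical banned-word list; nothing here reflects any real system.
BANNED = {"encryption", "obfuscation", "malicious", "exploit"}


def naive_flag(text: str) -> list[str]:
    """Return banned words found in text, ignoring all surrounding context."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(words & BANNED)


# A benign design thought about a student CTF still gets flagged...
print(naive_flag("Consider obfuscation of the safe combo so students must turn the dial"))
# → ['obfuscation']

# ...while the same project described without the trigger words passes.
print(naive_flag("Write firmware for a microcontroller with an I2C display"))
# → []
```

A check like this would explain both symptoms: rephrasing the prompt makes the problem go away, and a run can die minutes in when a trigger word first shows up in the thinking.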

by dmurray 13 hours ago

The announcement I saw was that your enterprise would have to turn off ZDR to get Fable, not that users could accidentally opt out of ZDR by selecting the wrong model.

Unilaterally disabling ZDR seems like a step too far in the enterprise market, even for a company trying to figure out what its users will let it get away with.

by bostik 13 hours ago

I read the same announcement. Or more precisely, I read at least two slightly different revisions of the announcement (it was updated between my two passes).

Our org has ZDR, and has had it since the contract was signed. Yesterday two things held true at the same time:

    1. Fable was available if you had at least .170 CLI client; and
    2. ZDR was no longer on

By the time West Coast woke up, the admin panel apparently had an option to toggle ZDR again. It remained off by default.

by mastermage 12 hours ago

You mean off as in no data retention? Or as in "we turned off your ZDR policy, so we collect all your data now"?

by bostik 12 hours ago

ZDR had been turned off. We sent in a request to have it re-enabled (and to disable Fable access for the time being).

Somewhere along the line we also used the self-service toggle to turn ZDR back on. I am not 100% certain of the exact timeline of the interleaved events, since many of the actions were taken by our Western US folks. Sorry. It's been a bit hectic over the past ~36h...

by mastermage 11 hours ago

JFC, that's a terrible situation. That's literally a lawsuit, or multiple, waiting to happen. Godspeed; you seem to have had a few interesting days so far.

by rurban 13 hours ago

Not just security work. Normal bug finding was impossible, because the model suddenly called triaging and verifying a possible fix a cyber security threat.

by insanitybit 8 hours ago

I was just building a library to use file capabilities (i.e. open_at) and it refused. This thing won't even help you write safe software.
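For context, this is roughly the pattern in question, assuming "file capabilities" here means opening paths relative to an already-open directory handle in the style of POSIX `openat`. Python exposes that through the `dir_fd` argument of `os.open`; the helper name `open_beneath` and its checks are only an illustration, not the library I was writing.

```python
import os
import tempfile


def open_beneath(dir_fd: int, name: str, flags: int = os.O_RDONLY) -> int:
    """Open name relative to an open directory descriptor (openat-style).

    Only a lexical check is done here: absolute paths and ".." components
    are rejected. Symlinks in intermediate components are NOT blocked, so a
    real implementation needs stronger resolution rules than this sketch.
    """
    if os.path.isabs(name) or ".." in name.split("/"):
        raise PermissionError(f"path escapes the directory handle: {name!r}")
    # O_NOFOLLOW refuses to follow a symlink in the final path component.
    return os.open(name, flags | os.O_NOFOLLOW, dir_fd=dir_fd)


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "a.txt"), "w") as f:
        f.write("hello")
    handle = os.open(root, os.O_RDONLY)  # the directory handle is the "capability"
    try:
        fd = open_beneath(handle, "a.txt")
        print(os.read(fd, 5))
        os.close(fd)
    finally:
        os.close(handle)
```

The idea is that code holding only `handle` can reach files under that directory and nothing else, which is a safety feature, not an attack.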

by rurban 3 hours ago

Wow, same for me. Insane context bugs in flake 5

by lII1lIlI11ll 9 hours ago

I think the main reason they mandated data retention for Fable is to fight distillation, not to prevent black hats from using the model.

by gmerc 13 hours ago

They want to keep the logs so they can see what other companies are doing with AI at the frontier of their fields.