I recently directed ChatGPT, through the web interface, to create a Firefox extension to obfuscate certain HTTP requests and was rebuffed because:
"... (the) system is designed to draw a line between privacy protection and active evasion of safeguards."
Why would this same system empower fuzzing a binary (or other resource), and why would it allow me to work toward generating an exploit?
Do users just keep rephrasing the directive until the model acquiesces? Or does the API not have the same training wheels as the web interface?
The answer is complex; it's worth watching the video. But mainly, they don't know where to place the line. Defenders need tools as good as the attackers'. Attackers will jailbreak models anyway, while defenders might not; is the safeguard a net positive in that case? Carlini actively asks the audience and community for "help" in determining how to proceed, basically.
1. Unit testing is (somewhat) dead, long live simulation. Testing the parts only gets you so far. Simulation tests are far more durable as independent artifacts (consider: if you went from JS to Rust, how much of your testing would carry over?).
2. Testing has to be "stand alone". I want to be able to run it from the command line, and I want the output to be wrappable so I can shove it onto a web page or dump it into an API (for AI).
3. Messages (for failures) matter. They are not just a simple "what's broken"; they must contain enough information to provide context.
4. Your "failed" tests should include logs. Do you have enough breadcrumbs for production? If not, this is a problem that will bite you later.
5. Any test case should be an accumulation of state and behavior; this really matters in simulation.
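A minimal sketch of what points 2–4 might look like in practice: a standalone runner that captures logs per case and emits structured JSON, so the same output can go to a terminal, a web page, or an API. The function and case names (`check_cart_total`, `run_case`) are invented for illustration, not from any particular framework.

```python
import json
import logging
import sys
from io import StringIO

def check_cart_total(items, expected):
    """A toy system-under-test that leaves breadcrumbs in the log."""
    logging.info("pricing %d items", len(items))
    total = sum(price for _, price in items)
    # Failure message carries context: the actual total AND the inputs.
    assert total == expected, (
        f"cart total {total} != expected {expected}; items={items}"
    )

def run_case(name, fn, *args):
    """Run one case, capturing its logs and enough info to diagnose a failure."""
    buf = StringIO()
    handler = logging.StreamHandler(buf)
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.INFO)
    try:
        fn(*args)
        status, message = "pass", ""
    except AssertionError as e:
        status, message = "fail", str(e)
    finally:
        root.removeHandler(handler)
    return {"case": name, "status": status, "message": message,
            "logs": buf.getvalue().splitlines()}

if __name__ == "__main__":
    results = [
        run_case("empty_cart", check_cart_total, [], 0),
        run_case("two_items", check_cart_total, [("a", 3), ("b", 4)], 8),
    ]
    # Structured output: runnable from the CLI, consumable by anything else.
    json.dump(results, sys.stdout, indent=2)
```

The key design choice is that the runner never prints prose directly; everything a human (or a model) needs lives in the JSON record.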
If you have done all the above right and your tool can return all the data, dumping the output into the cheapest model you can find and having it "write a prompt with recommendations on a fix" (not actual code, just what should be done beyond "fix this") has been illuminating.
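The last step could be sketched like this: assemble the request to the cheap model from the runner's structured output. The model call itself is elided (any API would do); this only shows building the prompt, and the sample results dict is made up for illustration.

```python
import json

def build_fix_prompt(results):
    """Turn failing cases (messages + logs) into a request for a fix prompt."""
    failures = [r for r in results if r["status"] == "fail"]
    report = json.dumps(failures, indent=2)
    return (
        "Below are failing test cases with messages and captured logs.\n"
        "Write a prompt with recommendations on a fix: describe what should\n"
        "be done and where to look, not actual code.\n\n" + report
    )

# Example input in the runner's JSON shape (invented values).
results = [
    {"case": "two_items", "status": "fail",
     "message": "cart total 7 != expected 8", "logs": ["pricing 2 items"]},
    {"case": "empty_cart", "status": "pass", "message": "", "logs": []},
]
print(build_fix_prompt(results))
```

Passing cases are filtered out so the cheap model only sees the breadcrumbs that matter.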
Ultimately I realized that how I thought about testing was wrong. Its output should be either dead simple, or have enough information that someone with zero knowledge could ramp up into a fix on their first day in the code base. My testing was never this good because the "cost of doing it that way" was always too high... this is no longer the case.