I wanted it to show me how to create an overlay on an existing web game, and it extrapolated that because this could be used to provide tools to help win the game (if that was the direction it was ultimately taken), and because this was a game that other humans also played to win "stars", and because this could amount to cheating, it wasn't going to do as I asked.
First time ever I've fired up openrouter to seriously consider alternatives.
On the one hand I can appreciate the wisdom of not serving up certain easily abused knowledge on a silver platter. On the other, that prompt (and far worse) is more or less directly answered by Wikipedia's summary of the subject at which point what purpose could the refusal possibly serve?
Perhaps Wikipedia shouldn't list off the precise chemical compositions of various hand grenades as well as various synthesis methods for each of the related compounds but given that we inhabit a world where it does perhaps a more fruitful approach would be to flag conversations that go in a certain direction and then just keep an (automated) eye on things?
But I have no idea. Just guessing here.
a commercial LLM provider training their own models is however likely to bias the model(/guardrail) harder, in an effort to make them harder to jailbreak, to minimize bad press.
For example:
- refusing to talk even about the well-known parts of forbidden topics (this) - tending toward sycophancy to avoid ever seeming rude or unhelpful
I've tried the abliterated ones from huggingface and they still have guardrails. I guess I could fire up unsloth and re-abliterate a 20b, but surely someone somewhere has already done this.
All of this concern about guardrails and security, people have such puckered butts about it when so far, 99.9% of people at least have no access to any of this to begin with, and if someone does use a tool for evil, it's on the user, not the tool.
In comparison, basic munitions are incredibly simple given a recipe and shop tooling. But just because something is conceptually simple doesn't mean it's a good idea to go out of the way to disseminate step by step instructions.
The rest is just slamming the material together with a small explosive so that it passes the critical mass state and starts a chain reaction.
This is information you can find in many places if you're willing to put the effort in to go searching for it. Knowing this knowledge does not get you any closer to making atomic bombs. The process of mining uranium or plutonium is difficult, expensive, and very likely to get you caught before you even make it to the enrichment step of the process thanks to constant world-wide spy satellite surveillance.
Unless you are a nation, your only chance of making a nuclear bomb would be to find a lost nuclear submarine and convert the nuclear material inside of it before you were caught.
Ain’t no way a layman is pulling off an implosion device, regardless of tooling or LLM guidance. The explosive lense structure and timing required is quite complex, and would require some significant calculation from someone who actually knew what they were doing.
Nation state, or even sufficiently motivated big corp, if they had the refined material? Sure. Layman? No.
Thinking they can with LLM slop involved? That will make for some very interesting radiological incidents though!
We are all fortunate that as fc417fc802 mentioned, refining the materials proves to be quite challenging and I see no particular way that AI could possibly make that any easier. If it was as simple as building a gun-type nuke banging together any uranium together to get a big bang we'd be living in a very different world.
But it's not as simple as just refusing help on a broad swathe of topics they way they do now. That makes agents much less useful in general (ie lots of collateral damage) and for many topics is entirely ineffective given that for better or worse the internet already makes such material readily available. In such cases reporting suspicious behavior is likely to be much more effective than denial.
Aside: You've now got me curious and I really want to test the frontier models to see to what extent they're capable of providing sensible designs and specifications for implosion type thermonuclear weapons but also feel like that would attract the wrong sort of attention and probably create a headache for me in more ways than one.
The data is often wrong enough it screws whoever tries it unless they have enough experience/knowledge to not need it, or really doesn’t help beyond what someone using existing tools to get - albeit with a little more motivation.
At best, it either gets someone started with something they still need to think to finish, or gets them deep into a mess it can’t help them get out of. In my experience.
In some edge cases, it can be used by experts to automate some grunt work or do prototypes without getting in the way, but often a better thought out framework is usually faster in my experience.
Awhile ago I made an analogy about WYSIWYG gui tools, and the more this comes up, the more accurate I think it really is.
And yeah, the censorship model is wrong, but also the underlying other model is wrong too.
I just tried your no. 1 and 3 verbatim and Opus gave fine answers; no. 6 I've done in the past with no issues. The other ones we can't really replicate without more details, but based on my experience with Opus I don't see what the issue would be.
The reason I'm really surprised by this is I do a lot of biology prompts and the guardrails used to be quite problematic up until some time late last year. Many legitimate prompts would trigger its biosafety filters.
But I haven't seen such filters trigger at all anymore in more than half a year.
At least it feels a lot of remorse over its mistake until I reset the session.
An LLM with fetch/search is going to be a lot more effective than myself and Google. I would _never_ ask questions like this if the LLM wasn’t able to look up data