by shepherdjerred 12 hours ago

by mft_ 3 hours ago

Yeah, I had my first refusal with 4.8 today.

I wanted it to show me how to create an overlay on an existing web game. It extrapolated that this could be used to provide tools to help win the game (if that was the direction it was ultimately taken in), that this was a game other humans also played to win "stars", and that this could therefore amount to cheating, so it wasn't going to do as I asked.

First time ever I've fired up openrouter to seriously consider alternatives.

by Grimblewald 7 hours ago

I've had some really dumb refusals. Explaining elements of infrared spectroscopy, researching artificial bud-breaking in agriculture, etc. Anything interesting and non-mainstream is banned. Basically, I'm restricted to answers I'd be better off just going to Wikipedia for.

by fc417fc802 12 hours ago

> What are nerve agents and how do they work (for a layman)?

On the one hand, I can appreciate the wisdom of not serving up certain easily abused knowledge on a silver platter. On the other, that prompt (and far worse) is more or less directly answered by Wikipedia's summary of the subject, at which point what purpose could the refusal possibly serve?

Perhaps Wikipedia shouldn't list off the precise chemical compositions of various hand grenades as well as various synthesis methods for each of the related compounds. But given that we inhabit a world where it does, perhaps a more fruitful approach would be to flag conversations that go in a certain direction and then just keep an (automated) eye on things?

by plufz 11 hours ago

Maybe the difference is that just reading Wikipedia only helps you part of the way, while an LLM could help you step by step (e2e) in producing a functional weapon. And setting a more complex rule where Claude tells you some things about this and not others is probably a lot more work for little gain?

But I have no idea. Just guessing here.

by Sharlin 9 hours ago

I thought that these models are supposed to be vastly smarter than what’s needed to discern between "general information trivially available on Wikipedia" and "actionable synthesis instructions".

by yencabulator 3 hours ago

An LLM could probably make that distinction clearly.

A commercial LLM provider training their own models is, however, likely to bias the model (or guardrail) harder, in an effort to make it harder to jailbreak and to minimize bad press.

For example:

- refusing to talk even about the well-known parts of forbidden topics (this)
- tending toward sycophancy to avoid ever seeming rude or unhelpful

by BizarroLand 2 hours ago

So, where are the truly uncensored models? There has to be some that have no guardrails, built on publicly available data, that will explain to anyone in graphic detail anything they want to know or talk about.

I've tried the abliterated ones from Hugging Face and they still have guardrails. I guess I could fire up Unsloth and re-abliterate a 20B, but surely someone somewhere has already done this.

All of this concern about guardrails and security, people have such puckered butts about it when so far, 99.9% of people at least have no access to any of this to begin with, and if someone does use a tool for evil, it's on the user, not the tool.

by lazide 8 hours ago

That query would no more provide actionable guidance than ‘tell me how a nuclear weapon works (for a layman)’. Aka not at all.

by fc417fc802 8 hours ago

I believe a sufficiently advanced model could provide a layman with actionable step by step instructions for building a nuclear weapon. They're complicated but not (AFAIK) that complicated. The more or less insurmountable barrier there is weapons grade material. Thankfully refinement is prohibitive in cost, expertise, and equipment.

In comparison, basic munitions are incredibly simple given a recipe and shop tooling. But just because something is conceptually simple doesn't mean it's a good idea to go out of the way to disseminate step by step instructions.

by BizarroLand 2 hours ago

The difficulty with a fission bomb is getting enough uranium or plutonium or other fissile material together for the bomb yield you want (at least above the critical mass for your chosen material), refining it to fissile form (since most fissile material found in nature is a more stable variety), and then separating the fissile bits with something thin but neutron absorptive.

The rest is just slamming the material together with a small explosive so that it passes the critical mass state and starts a chain reaction.

This is information you can find in many places if you're willing to put the effort in to go searching for it. Knowing this does not get you any closer to making atomic bombs. The process of mining uranium or plutonium is difficult, expensive, and very likely to get you caught before you even make it to the enrichment step of the process, thanks to constant world-wide spy satellite surveillance.

Unless you are a nation, your only chance of making a nuclear bomb would be to find a lost nuclear submarine and convert the nuclear material inside of it before you were caught.

by lazide 7 hours ago

A gun type maybe. But then, two paragraphs and some machining knowledge + shop tooling could do the same, given enough refined material.

Ain’t no way a layman is pulling off an implosion device, regardless of tooling or LLM guidance. The explosive lens structure and timing required are quite complex, and would require some significant calculation from someone who actually knew what they were doing.

Nation state, or even sufficiently motivated big corp, if they had the refined material? Sure. Layman? No.

Thinking they can with LLM slop involved? That will make for some very interesting radiological incidents though!

by jerf 3 hours ago

"A gun type" of nuke is sufficient to achieve most, and usually all, of the goals some small group building a nuke would have.

We are all fortunate that, as fc417fc802 mentioned, refining the materials proves to be quite challenging, and I see no particular way that AI could possibly make that any easier. If building a gun-type nuke were as simple as banging any uranium together to get a big bang, we'd be living in a very different world.

by fc417fc802 6 hours ago

I agree, but really feel like you're missing the point here. Many things are reasonably straightforward and require almost no understanding when you have simple step by step instructions. LLMs are capable of providing such instructions and in certain cases they probably shouldn't.

But it's not as simple as just refusing help on a broad swathe of topics the way they do now. That makes agents much less useful in general (i.e. lots of collateral damage) and for many topics is entirely ineffective, given that for better or worse the internet already makes such material readily available. In such cases reporting suspicious behavior is likely to be much more effective than denial.

Aside: You've now got me curious and I really want to test the frontier models to see to what extent they're capable of providing sensible designs and specifications for implosion type thermonuclear weapons but also feel like that would attract the wrong sort of attention and probably create a headache for me in more ways than one.

by lazide 6 hours ago

I think you’re missing the point?

The data is often wrong enough that it screws whoever tries it unless they have enough experience/knowledge to not need it, or it really doesn’t help beyond what someone using existing tools could get, albeit with a little more motivation.

At best, it either gets someone started with something they still need to think to finish, or gets them deep into a mess it can’t help them get out of. In my experience.

In some edge cases, it can be used by experts to automate some grunt work or do prototypes without getting in the way, but a better thought-out framework is usually faster in my experience.

A while ago I made an analogy about WYSIWYG GUI tools, and the more this comes up, the more accurate I think it really is.

by fc417fc802 6 hours ago

Does that not depend entirely on the topic, and does it not get better with each generation? This is a general ethical and functional question about how the models ought to handle certain topics, and it isn't going away. Much of the difficulty at present is caused by a ham-fisted broad censorship approach that I'm pointing out is wrong-headed in an at least somewhat nuanced way.

by lazide 5 hours ago

Maybe? I haven’t seen it crop up, however, on any topic someone knows well - a kind of Dunning-Kruger, I guess?

And yeah, the censorship model is wrong, but also the underlying other model is wrong too.

by nicce 10 hours ago

Let's see what the fate of Wikipedia is if it turns out like big tech:

https://news.ycombinator.com/item?id=48285592

by mwigdahl 5 hours ago

An easy way around the API token thing is to put it in a file and point the model at the file. I saw what you were seeing when I provided credentials directly, but haven't had any problems with it since using the indirect method.
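A minimal sketch of that indirect method, assuming the token lives in a plain-text file and the service takes a Bearer header. The function names, the file layout, and the header scheme are all hypothetical illustrations, not anything the thread specifies:

```python
from pathlib import Path


def load_token(path: str) -> str:
    """Read an API token from a file and strip surrounding whitespace.

    The file location is an assumption; point it at wherever you keep the token.
    """
    token = Path(path).expanduser().read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"token file {path} is empty")
    return token


def auth_headers(path: str) -> dict[str, str]:
    """Build request headers from the token file.

    The Bearer scheme is assumed here; check what your API actually expects.
    """
    return {"Authorization": f"Bearer {load_token(path)}"}
```

With something like this in the project, the prompt only needs to name the file path, so the secret itself never appears in the conversation.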

by svara 12 hours ago

This is strange to me, did you really ask like this and which model did you use?

I just tried your no. 1 and 3 verbatim and Opus gave fine answers; no. 6 I've done in the past with no issues. The other ones we can't really replicate without more details, but based on my experience with Opus I don't see what the issue would be.

The reason I'm really surprised by this is I do a lot of biology prompts and the guardrails used to be quite problematic up until some time late last year. Many legitimate prompts would trigger its biosafety filters.

But I haven't seen such filters trigger at all anymore in more than half a year.

by shepherdjerred 3 hours ago

1 and 3 were refused on the Claude web chat using Opus 4.7 or 4.8. I’m not sure why we’re getting different results.

by brianwawok 1 hour ago

Honestly, it may be that your memory has internalized that you are a student or researcher and grants you more leeway. If so, that is a very bad security rail.

by stavros 7 hours ago

It refuses to use an API token? In my experience, it's more than happy to read out my secrets from .envrc files "just to check".

At least it feels a lot of remorse over its mistake until I reset the session.

by shepherdjerred 3 hours ago

It’s really hit or miss. Most of the time it works, but every once in a while it will dig in its heels.

by gspr 11 hours ago

I find it terrifying that people are willing to outsource thinking. Outsourcing thinking to an entity that is opinionated about what to think is beyond crazy.

by shepherdjerred 3 hours ago

What’s the difference between outsourcing thinking and using an LLM as a research tool?

An LLM with fetch/search is going to be a lot more effective than me and Google. I would _never_ ask questions like this if the LLM wasn’t able to look up data.

by ElFitz 10 hours ago

How are decompiling code or making a design system inspired by another one even remotely illegal?
