undefined

points

[-]

It cannot really oversee this. If you can decompose a problem into individual steps that are not, in themselves, against the agent's alignment, it's certainly possible to have the aggregate do so.

by Symmetry12 hours ago|

prev|

[-]

How confident are we, with OpenAI's recent very large contribution to Trump's PAC, that OpenAI wasn't working to get Anthropic designated a supply chain risk behind the scenes? I don't want to be too paranoid here but given Sam's reputation and cui bono I don't think we can really rule this out either.

by Barbing15 hours ago|

prev|

[-]

>(I'm not sure I believe "guardrails" can prevent mass surveillance of civilians?)

Right, wouldn't they need a moderation layer that could, for example, fire if it analyzed & labeled too many banal English conversations?

They really gave training credit for guardtrails? I mean, it could perhaps reject prompts about designing social credit systems sometimes, but I can't imagine realistic mitigations to mass domestic surveillance generally.