It cannot really oversee this. If you can decompose a problem into individual steps, none of which is against the agent's alignment in itself, it's certainly possible for the aggregate to be.
How confident are we, given OpenAI's recent very large contribution to Trump's PAC, that OpenAI wasn't working behind the scenes to get Anthropic designated a supply chain risk? I don't want to be too paranoid here, but given Sam's reputation and cui bono, I don't think we can really rule this out either.
>(I'm not sure I believe "guardrails" can prevent mass surveillance of civilians?)

Right, wouldn't they need a moderation layer that could, for example, fire if it analyzed & labeled too many banal English conversations?

They really gave training credit for guardrails? I mean, it could perhaps reject prompts about designing social credit systems sometimes, but I can't imagine realistic mitigations to mass domestic surveillance generally.
