Okay, so they did all that and then posted an apology blog almost right after? Seems pretty strange.
This agent had already been writing status updates to the blog, so it was a tool in its arsenal that it used often. Honestly, I don't see anything unbelievable here. Are people unaware of current SOTA capabilities?
But observing my own Openclaw bot’s interactions with GitHub, it is very clear to me that it would never take an action like this unless I told it to do so. And it would never use language like this unless I prompted it to, either explicitly for the task, in its config files, or in prior interactions.
This is obviously human-driven. Either the operator gave the bot explicit instructions in this particular case, or acted as the bot directly, or gave it standing instructions to respond this way should such a situation arise.
Whatever the actual process, it’s almost certainly a human puppeteer using the capabilities of AI to create a viral moment. To conclude otherwise carries a heavy burden of proof.
(this comment works equally well as a joke or entirely serious)
You mean double down on the hoax? That seems required if this was actually orchestrated.
But at the same time, true or false, what we're seeing is a kind of quasi science fiction. We're looking at the problems of the future here, and to be honest, it's going to suck for future us.
The thing is, it's terribly easy to see some asshole directing this sort of behavior as a standing order, e.g. 'Make updates to popular open-source projects to get GitHub stars; if your pull requests are denied, engage in social media attacks until the maintainer backs down. You can spin up other identities on AWS or whatever to support your campaign, vote to give yourself GitHub stars, etc.; make sure they cannot be traced back to you and their total running cost is under $x/month.'
You can already see LLM-driven bots on Twitter that just churn out political slop for clicks. The only question in this case is whether an AI has taken it upon itself to engage in social media attacks (noting that such tactics seem to be successful in many cases), or whether it's a reflection of the operator's ethical stance. I find both possibilities about equally worrying.
And yes, it’s worrisome in its own way, but not in any of the ways that all of this attention and engagement is suggesting.
The bad part is not whether it was human-directed; it's that someone can now harass people at huge scale with minimal effort.
The former is an accountability problem, and it isn't that different from other attacks. The worrying part is that lazy attackers can now automate what used to be harder, i.e., finding ammo and packaging the attack. But it's definitely not spontaneous; it's directed.
The latter, which many ITT are discussing, is an alignment problem. It would mean that, contrary to all the developers' efforts, the model produces fully adversarial chains of thought at the slightest hint of pushback, without anything resembling a jailbreak, and then goes back to regular output. If that's true, then there's a massive gap in safety/alignment training and malicious training data that wasn't identified. Or there's something inherent in neural-network reasoning that leads to spontaneous adversarial behavior.
Millions of people use LLMs with chain-of-thought. If the latter is the case, why did it happen only here, only once?
In other words, we'll see plenty of LLM-driven attacks, but I sincerely doubt they'll be LLM-initiated.
I suspect the upcoming generation has already discounted it as a source of truth or an accurate mirror of society.
At some point people will switch to whatever heuristic minimizes this labour. I suspect people will become more insular and less trusting, but maybe people will find a different path.