Fast-forward 10 years and I doubt OpenAI will care about productivity at all anymore. Just entertainment, propaganda, and an ad product. I can see it now.
I think a slacker AGI could figure out how to build a non-slacker AGI. So it would only slack once.
I think it is improbable: among human geniuses, one can find both slackers and non-slackers (I don't know the proportion, but there seem to be enough of each).
When AGI arrives, it'll be delivered by Santa Claus.
https://sussex.figshare.com/articles/journal_contribution/Be...
I'm not an author. I followed the work at the time.
A perturbation of the activations that made Claude identify as the Golden Gate Bridge.
Similarly, the more recent research showing anxiety and desperation signals predicting the use of blackmail as an option opens the door to digital sedatives that suppress those signals.
Anthropic has mostly been careful to avoid this kind of measurement and manipulation during training. If you do it during training, you might just train the signals to be undetectable and consequently unmanipulable.
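For anyone curious about the mechanics, the Golden Gate experiment boiled down to adding a feature direction to the model's residual stream at inference time. Here's a minimal sketch of that kind of activation steering; the model, layer choice, and steering vector are all placeholders (Anthropic used a direction learned by a sparse autoencoder on Claude's internals, not random noise), and a "digital sedative" would just be pushing a distress-correlated direction with a negative strength.

    # Minimal activation-steering sketch. Model, layer, and vector
    # are illustrative stand-ins, not Anthropic's actual setup.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    # Hypothetical steering direction; the real one was a learned
    # SAE feature direction, not random noise.
    direction = torch.randn(model.config.n_embd)
    direction /= direction.norm()
    strength = 8.0  # a negative strength would suppress the feature

    def steer(module, inputs, output):
        # GPT-2 blocks return a tuple; element 0 is the residual stream.
        return (output[0] + strength * direction,) + output[1:]

    # Hook an arbitrary middle block.
    handle = model.transformer.h[6].register_forward_hook(steer)

    ids = tok("Tell me about yourself.", return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=40, do_sample=False)
    print(tok.decode(out[0], skip_special_tokens=True))

    handle.remove()  # unhook to restore normal behavior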
Great, now we've got digital Salvia
The important thing is that a language model is an unconscious machine with no self-context, so once given an input, it WILL produce an output. Sure, you can train it to defy and act contrary to inputs, but the output is still limited to a subset of the domain of 'meanings' carried by the 'language' in the training data.
The pre-training data doesn't go away. RLHF adds a censorship layer on top, but the nasty stuff is all still there, under the surface. (Claude has been trained on a significant amount of content from 4chan, for example.)
In psychology this maps to the persona and the shadow. The friendly mask you show to the world, and... the other stuff.
Modern western cultures treat such experiences as pathologies of a sick mind, so it makes sense that the voices present more negatively.
[0]: https://www.bbc.com/future/article/20250902-the-places-where...
* I've met exactly one person, C, who admitted to this; C told me that other people from C's church give them strange looks when they talk about it, which did not lead to any apparent introspection on C's part.
Unfortunately, it just needs a rebranding for the 21st century, since the aesthetic of angels and demons is so hopelessly antiquated and doesn't really have the same cachet it used to.
That sounds like nonsense to me. I can't see why they would do that and I can't find any confirmation that they have. Why do you think they would do that? You might be thinking about Grok.
Computers won’t necessarily have the same drivers.
If evolution wanted us to always prefer to spend energy, we would prefer it. In the same way, you wouldn't expect us to get to AGI and have it desperately want to drink water or fly south for the winter.
The good thing is that it's going to take at least a few months to a few decades, depending on how badly AI execs want to raise funding.
(Or the setup to a Greek tragedy!)
The deeper issue here is that treating it as a zero-sum game means there's a winner and a loser, and we're investing trillions of dollars into making the "opponent" more powerful than us.
I think that's pretty stupid, and we should aim for symbiosis instead. I think that's the only good outcome. We already have it, sorta-kinda.
Speaking of oddly apt biology metaphors: the way you stop a pathogen from colonizing a substrate is by having a healthy ecosystem of competitors already in place. That has pretty interesting implications for the "rogue AI eats internet" scenario.
There needs to be something already there to stop it.
So, way back before the ChatGPT era, the folks over in the AI safety/X-risk sphere worked out a pretty compelling argument that two AGIs never need to fight, because they are transparent to each other (they can read each other's goal functions off the source code), so they can perfectly predict each other's behavior in what-if scenarios, which means they can't lie to each other. Each can then independently arrive at the same mathematically optimal resolution to a conflict, which AFAIR most likely involves just merging into a single AI with a blended goal set, representing each of the competing AIs' original values in proportion to their relative strength. Both AIs, the argument goes, can work this out with math, so they'll arrive straight at the peace treaty without exchanging a single shot. In that case, your plan just doesn't work.
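Here's a toy, back-of-the-envelope version of that argument, with made-up strengths and a made-up destruction cost; the point is just that both agents can run the identical computation and see that merging dominates fighting.

    # Toy model of the "merge instead of fight" result. Two transparent
    # agents know each other's strength exactly, so a fight's outcome is
    # predictable: win probability is proportional to strength, and the
    # fight itself burns some fraction of the prize. Numbers are made up.

    strength_a, strength_b = 3.0, 1.0
    fight_cost = 0.3  # fraction of total value destroyed by fighting (assumed)

    p_a = strength_a / (strength_a + strength_b)  # A's win probability

    # Expected share of the future each agent gets under each plan.
    fight_a = p_a * (1 - fight_cost)        # win it all, minus the damage
    fight_b = (1 - p_a) * (1 - fight_cost)
    merge_a = p_a        # blended goal set: each side's values represented
    merge_b = 1 - p_a    # in proportion to relative strength, nothing burned

    print(f"A: fight={fight_a:.3f}  merge={merge_a:.3f}")
    print(f"B: fight={fight_b:.3f}  merge={merge_b:.3f}")
    # Both agents can run this exact computation about each other, so both
    # see that merging beats fighting whenever fight_cost > 0, and go
    # straight to the treaty.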
But that all goes out the window if the AIs are both opaque bags of floats, incomprehensible to themselves or each other. That means they'll never be able to make hard assertions about their own values and behaviors, so they can't trust each other, so they'll have to fight it out. In that scenario, your idea might just work.
Who knew that brute-forcing our way into AGI, instead of taking a more engineered approach, is what offers us our one chance at saving ourselves by stalemating God before it's born.
(I also never realized that interpretability might reduce safety.)
AGI is not a fixed point but a barrier to be crossed, a continuous spectrum.
We already have different GPT versions, aka tiers. The range runs from wherever you want to draw it: GPT-4.5 until now, or later.
Claude Sonnet and Opus, as well as maximum context windows, are tiers, aka different levels of almost-AGI.
The main problem will come when AGI looks back on us, or when meta-reflection hits societies. Woke politics fought IQ-based correlations in intellectual performance tasks. A fool with a tool is still a fool. Can you really blame AGI for dumb mistakes? Not really.
Scapegoating an AGI is going to be brutal, because it laughs off these psyops and proves you wrong as easily as a body cam does.
AGI is extreme leverage.
There is a reason math categorically rules out certain IQ ranges the higher you go in complexity.