> We are far further from any understanding of how these models work internally than in the early days of fission
OMG. I really don't want to be offensive or anything, but everyone has always known exactly "HOW" these models work. The principle is simple enough to explain to a 10-year-old if you take something like Karpathy's article on MicroGPT: https://karpathy.github.io/2026/02/12/microgpt/
None of the SOTA LLMs are any different; they are just much, much larger and have a lot of optimizations.
The fact that LLM companies are trying to sell this as some kind of magic is just proof of how much lying is going on here.
All it does is predict the next "word" at any given time.
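To make the "predict the next word" loop concrete, here is a toy sketch. It is not how MicroGPT or any real LLM is implemented: it swaps the neural network for a plain bigram count table, and the names `train_bigram` and `generate` are made up for illustration. The outer loop (predict a token, append it, repeat) has the same shape, though.

```python
import random
from collections import Counter, defaultdict


def train_bigram(text):
    """Count how often each token follows each other token (hypothetical helper)."""
    tokens = text.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, start, max_tokens=10, seed=0):
    """Autoregressive loop: sample the next token, append it, repeat."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(max_tokens):
        followers = counts.get(out[-1])
        if not followers:
            # The last token was never seen with a successor, so stop.
            break
        candidates, weights = zip(*followers.items())
        # Sample in proportion to how often each follower was observed.
        out.append(rng.choices(candidates, weights=weights, k=1)[0])
    return out


counts = train_bigram("the cat sat on the mat")
print(generate(counts, "cat", max_tokens=3))
```

A real model conditions on the whole context instead of only the previous token, and it computes the probabilities with a trained network instead of a lookup table, but generation is still this one-token-at-a-time loop.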
> and, if this was actually creating a truly intelligent, autonomous entity, alignment seems unsolvable as well, at least the way it is proposed.
This is obviously true. It's very hard to predict what you are going to decompress from a lossily "compressed" dataset using floating-point math. This is why you can't solve it all with pre-training or censorship on top; instead, you need good sandboxes and harnesses.
Anthropic are putting more effort than most into this and I find their work fascinating in that area, though as with OpenAI, I will maintain that if they truly believed this problem must be solved to stave off major catastrophe, they’d solely focus on interpretability of other labs' models, not work on and market their own.