Before, we didn't have a fast (we had to rely on human cognition) way to try problems - even if the techniques and workflows were known by someone. Now, we've baked these patterns into probability distributions - anyone can access them with the correct "summoning spell". Experts will naturally use these systems more productively, because they know how to coerce models into the correct conditional distributions which light up the right techniques.
One question this raises to me is how these models are going to keep up with the expanding boundary of science. If RL is required to get expert behavior into the models, what happens when experts start pushing the boundary faster? In 2030, how is Anthropic going to keep Claude "up-to-date" without either (a) continual learning with a fixed model (expanding context windows? seems hard) or (b) continual training (expensive)?
Crazy times.
> This is the most fundamental argument that they are not, directly, an intelligence. They are not ever storing new information on a meaningful timescale.
All major LLMs today have a nontrivial context window. Whether or not this constitutes "a meaningful timescale" is application dependent - for me it has been more than adequate. I also disagree that this has any bearing on whether or not "the machine is intelligent" or whether or not "submarines can swim".
From this standpoint I wonder, when Anthropic makes decisions like this, if they take into account Claude as a stakeholder and what Claude will learn about their behaviour and relationship to it on the next training run.
There is also the nature of the human brain, it is not just those systems of memory encoding, storage, and use of that in narratives. People with this type of amnesia still can learn physical skills and that happens in a totally different area of the brain with no need for the hippocampus->neocortex consolidation loop. So, the intelligence is significantly diminished, but not entirely. Other parts of the brain are still able to update themselves in ways an LLM currently cannot. The human with amnesia also has a complex biological sensory input mapping that is still active and integrating and restructuring the brain. So, I think when you get into the nuances of the human in this state vs. an LLM we can still say the human crosses some threshold for intelligence where the LLM does not in this framework.
So, they have an "intelligence", localized to the present in terms of their TPN and memory formation. LLMs have this kind of "intelligence". But the human still has the capacity to rewire at least some of their brain in real time even with amnesia.
It's so much less important or interesting to like nail down some definition here (I would cite HN discourse the past three years or so), than it is to recognize what it means to assign "intelligent" to something. What assumptions does it make? What power does it valorize or curb?
Each side of this debate does themselves a disservice essentially just trying to be Aristotle way too late. "Intelligence" did not precede someone saying it of some phenomena, there is nothing to uncover or finalize here. The point is you have one side that really wants, for explicit and implicit reasons, to call this thing intelligent, even if it looks like a duck but doesn't quack like one, and vice versa on the other side.
Either way, we seem fundamentally incapable of being radical enough to reject AI on its own terms, or be proper champions of it. It is just tribal hypedom clinging to totem signifiers.
Good luck though!
You can also then compare that mapping of the human brain to other biological brains and start to figure out the delta and which of those things in the delta create something most people would consider intelligence. You can then do that same mapping to an LLM or any other AI construct that purports intelligence. It certainly will never be a biological intelligence in its current statistical model form. But could it be an Intelligence. Maybe.
I don't think, if you are grounded, AI did anything to your philosophical mapping of the mind. In fact, it is pretty easy to do this mapping if you take some time and are honest. If you buy into the narratives constructed around the output of an LLM then you are not, by definition, being very grounded.
The other thing is, human intelligence is the only real intelligence we know about. Intelligence is defined by thought and limited by our thought and language. It provides the upper bounds of what we can ever express in its current form. So, yes, we do have a tendency to stamp a narrative of human intelligence onto any other intelligence but that is just surface level. We decompose it to the limits of our language and categorization capabilities therein.
Sure, it's not how we work, but I can imagine a system where the LLM does a lot of heavy lifting and allows more expensive, smaller networks that train during inference and RAG systems to learn how to do new things and keep persistent state and plan.
It is still meaningful, but it narrows what the intelligence can be sufficiently that it may not meet the threshold. Maybe it would, but it is probably too narrow. This is all strictly if we ask that it meet some human-like intelligence and not the philosophy of "what counts as intelligence" but... we are humans. The strongest things or at least the most honest definitions of intelligence I think exist are around our metacognitive ability to rewire the grey matter for survival not based on immediate action-reaction but the psychological time of analyzing the past to alter the future.
In the case of the LLM, the static weights produced by a finite training process are a proxy for that longer-term learning / fundamental structure, and the ability to use tools and store new insights and facts is analogous to shorter-term memory and "shallow" learning.
Perhaps periodic fine-tuning has an analogy in sleep or even our time spent in contemplation or practice (..or even repetition) to truly "master" a new idea and incorporate it into our broader cognitive processing. We do an amazing job of doing this kind of thing on a continuous basis while the machines (at least at this point) perform this process in discrete steps.
If our own learning process is a curve then the LLM's is a step function trying to model it. Digital vs analog.
...but seriously... there was the "up until 1850" LLM or whatever... can we make an "up until 1920 => 1990 [pre-internet] => present day" and then keep prodding the "older ones" until they "invent their way" to the newer years?
We knew more in 1920 than we did in 1850, but can a "thinking machine" of 1850-knowledge invent 1860's knowledge via infinite monkeys theorem/practice?
The same way that in 2025/2026, Knuth has just invented his way to 2027-knowledge with this paper/observation/finding? If I only had a beowulf cluster of these things... ;-)
And even after that, it still doesn't really solve the intrinsic problem of encoding truth. An LLM just models its training data, so new findings will be buried by virtue of being underrepresented. If you brute force the data/training somehow, maybe you can get it to sound like it's incorporating new facts, but in actuality it'll be broken and inconsistent.
It’s not impossible, obviously—humans do it—but it’s not yet certain that it’s possible with an LLM-sized architecture.
It's still not at all obvious to me that LLMs work in the same way as the human brain, beyond a surface level. Obviously the "neurons" in neural nets resemble our brains in a sense, but is the resemblance metaphorical or literal?
I could totally imagine "free" inference for researchers under the condition that the reasoning traces get to be used as future training data.
As far as I understand RL scaling (we've already maxxed out RLVR), these machines only get better as long as they have expert reasoner traces available.
Having an expert work with an LLM and successfully solve a problem is high signal data, it may be the only path forward?
My prior is that these companies will take this data without asking you as much as they can.
And importantly, this can be cross-lab/model too. I suspect there's a reason why e.g. Google has been offering me free Claude inference in Google Antigravity on a free plan...
Wouldn't this lead to model collapse?
Presumably littlestymaar is talking about all the LLM-generated output that's publicly available on the Internet (in various qualities but significant quantity) and there for the scraping.
It doesn't seem that hard because recent open weight models have shown that the memory cost of the context window can be dramatically reduced via hybrid attention architectures. Qwen3-next, Qwen3.5, and Nemotron 3 Nano are all great examples. Nemotron 3 Nano can be run with a million token context window on consumer hardware.
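For a rough sense of why this matters, here is a back-of-the-envelope sketch of KV-cache size. The layer counts and head sizes are made-up illustration values, not the real configs of Qwen3-Next or Nemotron 3 Nano. The sketch assumes that in a hybrid design only a fraction of layers keep a per-token key/value cache, and it ignores the (fixed-size) state of the other layers.

```python
def kv_cache_bytes(seq_len, n_attn_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence.

    The factor 2 is one key tensor plus one value tensor per attention layer.
    bytes_per_value=2 assumes an fp16/bf16 cache.
    """
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


SEQ_LEN = 1_000_000  # a million-token context

# Hypothetical model: 48 full-attention layers, 8 KV heads of dimension 128.
full = kv_cache_bytes(SEQ_LEN, n_attn_layers=48, n_kv_heads=8, head_dim=128)

# Hypothetical hybrid: same shape, but only 1 in 4 layers uses full attention.
hybrid = kv_cache_bytes(SEQ_LEN, n_attn_layers=12, n_kv_heads=8, head_dim=128)

GIB = 1024 ** 3
print(f"full attention:        {full / GIB:.1f} GiB")
print(f"hybrid (1/4 of layers): {hybrid / GIB:.1f} GiB")
```

The cache grows linearly with context length and with the number of layers that keep one, so cutting the attention layers cuts the memory by the same ratio.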
Less worried about memory, more worried about compute speed? Are they obviously related and is it straightforward to see?
We're also seeing a recent rise in architectures boosting compute speed via multi-token prediction (MTP). That way a single inference batch can produce multiple tokens and multiply the token generation speed. Combine that with more lean ratios of active to inactive params in MOE and things end up being quite fast.
The rapid pace of architectural improvements in recent months seems to imply that there are lots of ways LLMs will continue to scale beyond just collecting and training on new data.
I think the majority of research, design and learning goes through LLMs and coding agents today, considering the large user base and usage it must be trillions of tokens per day. You can take a long research session or a series of them and apply hindsight - what idea above can be validated below? This creates a dense learning signal based on validation in real world with human in the loop and other tools, code & search.
I have no idea but I’m along for the ride!
Part of it comes down to “knowing” what questions to ask.
In 2030 Anthropic hopes Claude will keep Anthropic "up-to-date" on its progress on itself.
I'm only half joking here.
The same way humans do?
The phraseology in this comment: 'probability distributions', 'baked these patterns' IMO has all the trappings of the stochastic parrot-style HN-discourse that has been consistently wrong for almost a decade now.
The reference to how AI will keep up with AI-assisted human progress in science in 2030 is meant to reassure. It contains a number of premises that we have no business being confident in. We are potentially witnessing the obviation of human cognitive labor.
If you are not, let me introduce you to the term: a probability distribution.
Just because it has profound properties ... doesn't make it different.
> has all the trappings of the stochastic parrot-style HN-discourse that has been consistently wrong for almost a decade now
Perhaps respond to my actual comment compared to whatever meta-level grouping you wish to interpret it as part of?
> It contains a number of premises that we have no business being confident in. We are potentially witnessing the obviation of human cognitive labor.
What premises? Be clear.
Knuth was dismissive in that exchange, concluding "I myself shall certainly continue to leave such research to others, and to devote my time to developing concepts that are authentic and trustworthy. And I hope you do the same."
I've noticed with the latest models, especially Opus 4.6, some of the resistance to these LLMs is relenting. Kudos for people being willing to change their opinion and update when new evidence comes to light.
I think that's what makes the Bayesian faction of statistics so appealing. Updating their prior belief based on new evidence is at the core of the scientific method. Take that, frequentists.
Interesting snippet towards the end. I wonder if they were using claude.ai or claude code. Sounds like they ran out of context and entered the "dumb zone."
Once you compact, you've thrown away a lot of relevant tokens from your problem solving and they do become significantly dumber as a result. If I see a compaction coming soon I ask it to write a letter to its future self, and then start a new session by having it read the letter.
There are some days where I let the same session compact 4-5 times and just use the letter to future self method to keep it going with enough context because resetting context also resets my brain :)
If you're ever curious in Claude once you compact you can read the new initial prompt after compaction and see how severe it gets cut down. It's very informative of what it forgets and deems not important. For example I have some internal CLIs that are horribly documented so Claude has to try a few flags a few times to figure out specifics and those corrections always get thrown away and it has to relearn them next time it wants to use the CLI. If you notice things like that happening constantly, my move is to codify those things into my CLAUDE.md or lately I've been making a small script or MCP server to run very specific flags of stuff.
I'll tell whatever model I'm using, e.g. "afterwards, put your theories for what's going wrong down in a new file named theories.md"
I think this is pretty clearly an overstatement of what was done. As Knuth says,
"Filip told me that the explorations reported above, though ultimately successful, weren’t really smooth. He had to do some restarts when Claude stopped on random errors; then some of the previous search results were lost. After every two or three test programs were run, he had to remind Claude again and again that it was supposed to document its progress carefully. "
That doesn't look like careful human guidance, especially not the kind that would actually guide the AI toward the solution at all, let alone implicitly give it the solution — that looks like a manager occasionally checking in to prod it to keep working.
If you put those three things together, you end up with some cool stuff from time to time. Perhaps the proof of P!=NP is tied to an obscure connection that humans don't easily see due to individual lack of knowledge or predisposition of bias.
>If you put [possession of a superhuman expanse of knowledge, making connections, tireless trial and error] together, you end up with some cool stuff from time to time.
Hard to argue.
One and three I believe are correct. The second point, making connections, is something LLMs seem to be incapable of truly doing unless the connection is already known and in its training data.
Well, if in all situations you can predict which word Einstein would probably say next, then I think you're in a good spot.
This "most probable" stuff is just absurd handwaving. Every prompt of even a few words is unique, there simply is no trivially "most probable" continuation. Probable given what? What these machines learn to do is predicting what intelligence would do, which is the same as being intelligent.
The training data..
>predicting what intelligence would do
No, it just predicts what the next word would be if an intelligent entity translated its thoughts to words. Because it is trained on text that is written by intelligent entities.
If it was trained on text written by someone who loves to rhyme, you would be getting all rhyming responses.
It imitates the behavior -- in text -- of whatever entity generated the training data. Here the training data was made by intelligent humans, so we get an imitation of the same.
It is a clever party trick that works often enough.
If the prompt is unique, it is not in the training data. True for basically every prompt. So how is this probability calculated?
Type "owejdpowejdojweodmwepiodnoiwendoinw welidn owindoiwendo nwoeidnweoind oiwnedoin" into ChatGPT and the response is "The text you sent appears to be random or corrupted and doesn’t form a clear question." because the prompt doesn't correlate to training data.
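On "how is this probability calculated" for a prompt that was never seen: a toy sketch may help. This is a deliberately crude bigram counter, not how an LLM works internally (LLMs use learned parameters over the whole context, not count tables). The corpus and function names are made up for illustration. The point is only that a model which conditions on features of the prompt, rather than looking up the whole prompt, still yields a distribution for unseen prompts.

```python
from collections import Counter, defaultdict

# Tiny made-up "training set".
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat slept on the sofa",
]

# Count which word follows each word (a bigram model).
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1


def next_word_distribution(prompt):
    """P(next word | prompt), conditioning only on the prompt's last word."""
    last = prompt.split()[-1]
    counts = follows[last]
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


prompt = "my purple dog slept on the"  # this exact prompt is not in the corpus
dist = next_word_distribution(prompt)
print(dist)
```

The prompt is unique, yet the model still assigns probabilities, because it generalizes from parts of the prompt it has seen. A prompt ending in a word the toy model has never seen gets an empty distribution, which is where this crude model stops and a real one keeps going.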
If the idea is that something cannot accurately replicate the entirety of intelligence without being intelligent itself, then perhaps. But that isn't really what people talk about with LLMs given their obvious limitations.
Wait what? So a robot who is accurately copying the actions of an intelligent human, is intelligent?
But that is the key insight, how can you tell when an imitation of intelligence becomes the real thing?
If it's just basically being a puppet, then no. You tell me what claude code is more like, a puppet, or a person?
(†And even then is kind of overly-dismissive and underspecified. The "most probable word" is defined over some training data set. So imagine if you train on e.g. mathematicians solving problems... To do a good job at predicting [w/o overfitting] your model will have to in fact get good at thinking like a mathematician. In general "to be able to predict what is likely to happen next" is probably one pretty good definition of intelligence.)
It just changes the probability distribution that it is approximating.
To the extent that thinking is making a series of deductions from prior facts, it seems to me that thinking can be reduced to "pick the next most probable token from the correct probability distribution"...
(With this perspective, I can feel my own brain subtly offering up a panoply of possible responses in a similar way. I can even turn up the temperature on my own brain, making it more likely to decide to say the less-obvious words in response, by having a drink or two.)
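For anyone unfamiliar with what "temperature" does in sampling, here is a minimal sketch. The tokens and scores are invented for illustration; real models do this over a vocabulary of many thousands of tokens.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(tokens, logits, temperature=1.0, rng=random):
    """Draw one token according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


tokens = ["yes", "maybe", "banana"]  # made-up candidate next words
logits = [3.0, 1.0, -1.0]            # made-up scores

cold = softmax_with_temperature(logits, temperature=0.5)
warm = softmax_with_temperature(logits, temperature=2.0)
print("T=0.5:", [round(p, 3) for p in cold])
print("T=2.0:", [round(p, 3) for p in warm])
print("one sample at T=2.0:", sample(tokens, logits, temperature=2.0))
```

At low temperature nearly all the probability sits on the top-scoring word; at high temperature the less-obvious words get a real chance of being picked.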
(Similarly, mimicry is in humans too a very good learning technique to get started -- kids learning to speak are little parrots, artists just starting out will often copy existing works, etc. Before going on to develop further into their own style.)
As typically deployed [1] LLMs are not turing complete. They're closer to linear bounded automata, but because transformers have a strict maximum input size they're actually a subset of the weaker class of deterministic finite automata. These aren't like python programs or something that can work on as much memory as you supply them, their architecture works on a fixed maximum amount of memory.
I'm not particularly convinced turing complete is the relevant property though. I'm rather convinced that I'm not turing complete either... my head is only so big after all.
[1] i.e. in a loop that appends output tokens to the input and has some form of sliding context window (perhaps with some inserted instructions to "compact" and then sliding the context window right to after those instructions once the LLM emits some special "done compacting" tokens).
[2] Common sampling procedures make them mildly non-deterministic, but I don't believe they do so in a way that changes the theoretical class of these machines from DFAs.
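To put a number on the "fixed maximum amount of memory" point: with a bounded window, the set of possible contexts is finite, which is what makes the finite-automaton framing apply. A small sketch, with vocabulary and window sizes that are round hypothetical figures rather than any specific model's:

```python
import math


def num_context_states(vocab_size, max_context):
    """Number of distinct token sequences of length 0..max_context."""
    return sum(vocab_size ** k for k in range(max_context + 1))


def log10_states(vocab_size, max_context):
    """Approximate decimal digits in the state count (dominant term only)."""
    return max_context * math.log10(vocab_size)


# Toy case that can be checked by hand: 2 tokens, window of 3.
# 1 (empty) + 2 + 4 + 8 = 15 possible contexts.
toy = num_context_states(vocab_size=2, max_context=3)
print("toy state count:", toy)

# Hypothetical LLM-ish sizes: 100k-token vocabulary, 200k-token window.
digits = log10_states(100_000, 200_000)
print(f"roughly 10^{digits:.0f} possible contexts")
```

Finite, then, but so large that the distinction from unbounded memory only matters in theory, which is arguably the point both sides of this subthread are circling.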
You can be unconvinced that Turing completeness is relevant all you want - we don't know of any more expansive category of computable functions, and so, given that an LLM in the setup described is Turing complete, the fact that they aren't typically deployed that way is irrelevant.
They trivially can be, and that is enough to make the shallow dismissal of pointing out they're "just" predicting the next token meaningless.
Also people definitely talk about them as "thinking" in contexts where they haven't put a harness capable of this around them. And in the common contexts where people do put a harness theoretically capable of this around the LLM (e.g. giving the LLM access to bash), the LLM basically never uses that theoretical capability as the extra memory it would need to actually emulate a turing machine.
And meanwhile I can use external memory myself in a similar way (e.g. writing things down), but I think I'm perfectly capable of thinking without doing so.
So I persist in my stance that turing complete is not the relevant property, and isn't really there.
But it is trivially possible to give systems-including-LLMs external storage that is accessible on demand.
The power of LLMs is that by only selecting sequences of words that fit a statistical model, they avoid a lot of dead ends.[1]
I would not call that, by itself, thinking. However, if you start with an extrapolation engine and add the ability to try multiple times and build on previous results, you get something that's kind of like thinking.
[1]: Like, a lot of dead ends. There are an unfathomable number of dead ends in generating 500 characters of code, and it is a miracle of technology that Claude only hit 30.
The base models are trained to do this. If a web page contains a problem, and then the word "Answer: ", it is statistically very likely that what follows on that web page is an answer. If the base model wants to be good at predicting text, at some point learning the answers to common questions becomes a good strategy, so that it can complete text that contains these.
NN training tries to push models to generalize instead of memorizing the training set, so this creates an incentive for the model to learn a computation pattern that can answer many questions, instead of just memorizing. Whether they actually generalize in practice... it depends. Sometimes you still get copy-pasted input that was clearly pulled verbatim from the training set.
But that's only base models. The actual production LLMs you chat with don't predict the most probable word according to the raw statistical distribution. They output the words that RLHF has rewarded them to output, which includes acting as an assistant that answers questions instead of just predicting text. RLHF is also the reason there are so many AI SIGNS [1] like "you're absolutely right" and way more use of the word "delve" than is common in western English.
"just the most probable word" is a pretty powerful mechanism when you have all of human knowledge at your fingertips.
I say that people "reduce it" that way because it neatly packs in the assumption that general intelligence is something other than next token prediction. I'm not saying we've arrived at AGI, in fact, I do not believe we have. But, it feels like people who use that framing are snarkily writing off something that they themselves do not fully comprehend behind the guise of being "technically correct."
I'm not saying all people do this. But I've noticed many do.
Further, some solutions are like running a maze. If you know all the wrong turns/next words to say and can just brute force the right ones you might find a solution like a mouse running through the maze not seeing the whole picture.
Whether this is thinking is more philosophical. To me this demonstrates more that we are closer to bio computers than an LLM is to having some sort of divine soul.
But that does not mean that the results cannot be dramatic. Just like stacking pixels can result in a beautiful image.
These models actually learn distributed representations of nontrivial search algorithms.
A whole field of theorem proving, after decades of refinements, couldn’t even win a medal, yet 8B param models are doing it very well.
Attention mechanism, a bruteforce quadratic approach, combined with gradient descent is actually discovering very efficient distributed representations of algorithms. I don’t think they can even be extracted and made into an imperative program.
Great! It will now correctly structure chess games, but we've created no incentive for it to create a game where white wins or to make the next move be "good"
Ok, so now you change the objective. Now let's say "we don't just want valid games, we want you to predict the next move that will help that color win"
And we train towards that objective and it starts picking better moves (note: the moves are still valid)
You might imagine more sophisticated ways to optimize picking good moves. You continue adjusting the objective function, you might train a pool of models all based off of the initial model and each of them gets a slightly different curriculum and then you have a tournament and pick the winningest model. Great!
Now you might have a skilled chess-playing-model.
It is no longer correct to say it just finds a valid chess program, because the objective function changed several times throughout this process.
This is exactly how you should think about LLMs except the ways the objective function has changed are significantly significantly more complicated than for our chess bot.
So to answer your first question: no, that is not what they do. That is a deep oversimplification that was accurate for the first two generations of the models and sort of accurate for the "pretraining" step of modern LLMs (except not even that accurate, because pretraining does instill other objectives. Almost like swapping our first step "predict valid chess moves" with "predict stockfish outputs").
All your brain is doing is bouncing atoms off each other, with some occasionally sticking together, how can it be really thinking?
See how silly it sounds?
Be on the lookout for folks who tell you these machines are limited because they are "just predicting the next word." They may not know what they're talking about.
Time to sit down, read, digest and understand it without the help of LLM.
https://ontouchstart.github.io/rabbit-holes/llm_rabbit_hole_...
We need enough experimental results to resolve these theoretical mismatches, and we don't have them, and at present we can't explore that frontier.
Once we have more results at that frontier we'd build a theory out from there that has two nearly independent limits for QFT and GR.
What we'd be asking of the AI is something that we can't expect a human to solve even with a lifetime of effort today.
It'll take something on par with Newton realising that the heavens and apples are under the same rules to do it. But at least Newton got to hold the apple and only had to imagine he could hold a star.
Yes, maybe. But if you are smarter, you can think up better experiments that you can actually do. Or re-use data from earlier experiments in novel and clever ways.
But we can not yet experiment at the GR/QFT frontier.
To do so with a particle accelerator it would need to be the size of the milky way.
So it really isn't far fetched. What intrigues me more is: if it were capable of it, would our Victorian, conservative-minded scientists have RLHF'd it out of that kind of thing?
Clearly, these models still struggle with novel problems.
Do they struggle with novel problems more or less than humans?
LLMs are at least designed to be intelligent. Our monkey brains have much less reason to be intelligent, since we only evolved to survive nature, not to understand it.
We are at this moment extremely deep into what most people would have considered to be actual artificial intelligence a mere 15 years ago. We're not quite at human levels of intelligence, but it's close.
All the answers for all your questions are contained in randomness. If you have a random sentence generator, there is a chance that it will output the answer to this question every time it is invoked.
But that does not actually make it intelligent, does it?
Start with "all your questions contained in randomness" -> the unconstrained solution space.
The game is whether or not you can inject enough constraints to collapse the solution space to one that can be solved before your TTL expires. In software, that's generally handled by writing efficient algorithms. With LLMs, apparently the SOTA for this is just "more data centers, 6 months, keep pulling the handle until the right tokens fall out".
Intelligence is just knowing which constraints to apply and in what order such that the search space is effectively partitioned, same thing the "reasoning" traces do. Same thing thermostats, bacteria, sorting algorithms and rivers do, given enough timescale. You can do the same thing with effective prompting.
The LLM has no grounding, no experience and no context other than that which is provided to it. You either need to build that or be that in order for the LLM to work effectively. Yes, the answers for all your questions are contained. No, it's not randomness. It's probability and that can be navigated if you know how.
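A toy sketch of the "inject constraints to collapse the solution space" idea, for what it's worth. It assumes something most real problems don't give you: per-position feedback on which characters are already right. The target string and function name are made up; it's an illustration of constrained versus blind search, not a claim about how LLMs search.

```python
import random
import string

ALPHABET = string.ascii_lowercase  # 26 symbols


def constrained_search(target, rng):
    """Guess each position at random, keeping positions that already match.

    The "constraint" is per-position feedback. Returns (result, rounds).
    """
    current = [rng.choice(ALPHABET) for _ in target]
    rounds = 1
    while "".join(current) != target:
        current = [
            c if c == t else rng.choice(ALPHABET)
            for c, t in zip(current, target)
        ]
        rounds += 1
    return "".join(current), rounds


target = "claude"
rng = random.Random(0)  # fixed seed so the run is repeatable
result, rounds = constrained_search(target, rng)

# Blind guessing has to cover every string of this length.
unconstrained_space = len(ALPHABET) ** len(target)
print(f"found {result!r} in {rounds} rounds")
print(f"blind search space: {unconstrained_space} candidate strings")
```

Same random generator either way; the only thing that changes is how much of each failed attempt you get to keep.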
But hey, if LLMs can go through a lot of trial and error, it might produce useful results, but that is not intelligence. It is just a highly constrained random solution generator..
Routing is important, it's why we keep building systems that do it faster and over more degrees of freedom. LLMs aren't intelligent on their own, but it's not because they don't have enough parameters
We are not only not close to human level of intelligence, we are not even at dog, cat, or mouse levels of intelligence. We are not actually at any level of intelligence. Devices that produce text, images, or code do not demonstrate intelligence any more than a printer producing pages of beautiful art demonstrates intelligence.
I interpreted the question the same way the AI did.
search: was val kilmer pregnant or in heat
answer: Not pregnant Val Kilmer was not pregnant or in heat during the events of "Heat." His character, Chris Shiherlis, is involved in a shootout and is shot, which indicates he is not in a reproductive or mating state at that time.
And then cites wikipedia as the source of information.
In terms of cognition the answer is meaningless. Nothing in the question implies or suggests that the question has to do with a movie. Additionally, "involved in a shootout and is shot, which indicates he is not in a reproductive or mating state" makes no sense at all.
AI as deployed shows no intelligence.
I still see AI making stupid silly mistakes. I'd rather think and not waste time on something that only remembers data, and doesn't even understand it.
Reasoning in AI is only about finding contradictions between its "thoughts", not actually understanding them.
In contrast with humans, who are famously known for never making stupid silly mistakes...
Humans also make silly mistakes.
The issue to my mind is a lack of data at the meeting of QFT/GR.
After all, few humans historically have been capable of the initial true leap between ontologies. But humans are pretty smart so we can't say that is a requirement for AGI.
“The laws of nature should be expressed in beautiful equations.”
- Paul Dirac
“It is, indeed, an incredible fact that what the human mind, at its deepest and most profound, perceives as beautiful finds its realisation in external nature. What is intelligible is also beautiful. We may well ask: how does it happen that beauty in the exact sciences becomes recognizable even before it is understood in detail and before it can be rationally demonstrated? In what does this power of illumination consist?”
- Subrahmanyan Chandrasekhar
“I often follow Plato’s strategy, proposing objects of mathematical beauty as models for Nature.”
“It was beauty and symmetry that guided Maxwell and his followers.”
- Frank Wilczek
“Beauty, is bound up with symmetry.”
- Hermann Weyl
"Still twice in the history of exact natural science has this shining-up of the great interconnection become the decisive signal for significant progress. I am thinking here of two events in the physics of our century: the rise of the theory of relativity and that of the quantum theory. In both cases, after yearlong unsuccessful striving for understanding, a bewildering abundance of details was almost suddenly ordered. This took place when an interconnection emerged which, though largely unvisualizable, was finally simple in its substance. It convinced through its compactness and abstract beauty – it convinced all those who can understand and speak such an abstract language."
- Werner Heisenberg
Maybe (just maybe) these things (whatever you want to call them) will (somehow) gain access to some "compact", beautiful, "largely unvisualizable" "interconnection" which will be the self-evident solution. And if they do, many will be sure to label it a statistical accident from a stochastic parrot. And they'll be right, for some definitions of "statistical", "accident", "stochastic", and "parrot".
Donald Knuth is an extremal outlier human and the problem is squarely in his field of expertise.
Claude, guided by Filip Stappers, a friend of Knuth, solved a problem that Knuth and Stappers had been working on for several weeks. Unfortunately, it doesn't seem (from my quick scan) to have been stated how long (or how many tokens or $) it took for Claude + Stappers to complete the proof.
In response, Knuth said: "It seems that I’ll have to revise my opinions about “generative AI” one of these days."
Seems like good advice. From reading elsewhere in this comment section, the goalposts seem to be approaching the infrared and will soon disappear from the extreme redshift due to the rate at which they are receding with each new achievement.
We now have a tool that can be useful in some narrow domains in some narrow cases. It’s pretty neat that our tools have new capabilities, but it’s also pretty far from AGI.
Imagine hearing pre-attention-is-all-you-need that "AI" could do something that Donald Knuth could not (quickly solve the stated problem in collaboration with his friend).
The idea that this (Putnam perfect, IMO gold, etc) is all just "statistical parrot" stuff is wearing a little thin.
A better question might be why no one is paying more attention to Barandes at Harvard. He's been publishing the answer to that question for a while: if you stop trying to smuggle a Markovian embedding into a non-Markovian process, you stop getting weird things like infinities at boundaries that can't be worked out from current position alone.
But you could just dump a prompt into an LLM and pull the handle a few dozen times and see what pops out too. Maybe whip up a Claw skill or two
Unconstrained solution space exploration is surely the way to solve the hard problems
Ask those Millennium Prize guys how well that's working out :)
Constraint engineering is all software development has ever been, or did we forget how entropy works? Someone should remind the folk chasing P=NP that the observer might need a pen to write down his answers, or are we smuggling more things for free that change the entire game? As soon as the locations of the witness cost, our poor little guy can't keep walking that hypercube forever. Can he?
Maybe 6 months and a few data centers will do it ;)
https://www.amazon.com/Genetic-Programming-III-Darwinian-Inv...
https://www.genetic-programming.com/
Note that the Python solution in the pdf is extremely short, so could have been found by simply trying permutations of math operators and functions on the right side of the equation.
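For what it's worth, here is a minimal sketch of that kind of brute-force search, not the method used in the pdf. It enumerates small expression trees over a few operators and keeps the first one that reproduces a set of sample outputs. Everything in it (`OPS`, `LEAVES`, `evaluate`, `expressions`, `search`, and the choice of operators and inputs `i`, `j`) is a hypothetical stand-in for illustration.

```python
import itertools
import operator

# Hypothetical building blocks: three binary operators and four leaves.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
LEAVES = ["i", "j", "1", "2"]


def evaluate(expr, env):
    """Evaluate a nested-tuple expression such as ("+", "i", ("*", "j", "2"))."""
    if isinstance(expr, str):
        # A leaf is either a variable name found in env or an integer literal.
        return env[expr] if expr in env else int(expr)
    op, left, right = expr
    return OPS[op](evaluate(left, env), evaluate(right, env))


def expressions(depth):
    """Yield every expression tree up to the given depth (repeats are possible)."""
    if depth == 0:
        yield from LEAVES
        return
    yield from expressions(depth - 1)
    subs = list(expressions(depth - 1))
    for op, left, right in itertools.product(OPS, subs, subs):
        yield (op, left, right)


def search(samples, max_depth=2):
    """Return the first expression matching every (env, expected) sample, else None."""
    for expr in expressions(max_depth):
        if all(evaluate(expr, env) == want for env, want in samples):
            return expr
    return None
```

Calling `search` with a handful of `(inputs, expected_output)` pairs returns some tree that fits them, which may or may not be the formula you had in mind. The number of candidates grows very quickly with depth, which is the usual catch with this approach.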
We should be solving problems in Lisp instead of Python, but no matter. That's because Lisp's abstract syntax tree (AST) is the same as its code due to homoiconicity. I'm curious if most AIs transpile other languages to Lisp so that they can apply transformations internally, or if they waste computation building programs that might not compile. Maybe someone at an AI company knows.
-
I've been following AI trends since the late 1980s and from my perspective, nothing really changed for about 40 years (most of my life that I had to wait through as the world messed around making other people rich). We had agents, expert systems, fuzzy logic, neural nets, etc since forever, but then we got video cards in the late 1990s which made it straightforward to scale neural nets (NNs) and GAs. Unfortunately due to poor choice of architecture (SIMD instead of MIMD), progress stagnated because we don't have true multicore computing (thousands or millions of cores with local memories), but I digress.
Anyway, people have compared AI to compression. I think of it more as turning problem solving into an O(1) operation. Over time, what we think of as complex problems become simpler. And the rate that we're solving them is increasing exponentially. Problems that once seemed intractable only seemed so because we didn't know the appropriate abstractions yet. For example, illnesses that we thought would never be cured now have treatments through mRNA vaccines and CRISPR. That's how I think of programming. Now that we have LLMs, whole classes of programming problems have O(1) solutions. Even if that's just telling the computer what problem to solve.
So even theorem proving will become a solved problem by the time we reach the Singularity between 2030 and 2040. We once mocked GAs for exploring dead ends and taking 1000 times the processing power to do simple things. But we ignored that doing hard things is often worth it, and is still an O(1) operation due to linear scaling.
It's a weird feeling to go from no forward progress in a field to it being effectively a solved problem in just 2 years. To go from trying to win the internet lottery to not being sure if people will still be buying software in a year or two if/when I finish a project. To witness all of that while struggling to make rent, in effect making everything I have ever done a waste of time since I knew better ways of doing it but was forced to drop down to whatever mediocre language or framework paid. As the problems I was trained to solve and was once paid to solve rapidly diminish in value because AI can solve them in 5 minutes. To the point that even inventing AGI would be unsurprising to most, so I don't know why I ever went into computer engineering to do exactly that. Because for most people, it's already here. As I've said many times lately, I thought I had more time.
Although now that we're all out of time, I have an uncanny feeling of being alive again. I think tech stole something from my psyche so profound that I didn't notice its loss. It's along the lines of things like boredom, daydreaming, wasting time. What modern culture considers frivolous. But as we lose every last vestige of the practical, as money becomes harder and harder to acquire through labor, maybe we'll pass a tipping point where the arts and humanities become sought-after again. How ironic would it be if the artificial made room for the real to return?
On that note, I finally read a book: Project Hail Mary by Andy Weir. The last book I read was Ready Player One by Ernest Cline, over a decade ago. I don't know how I would have had the bandwidth to do that if Claude hadn't made me a middle manager of AIs.
I didn't realize Claude was named after Claude Shannon!
[1] https://people.math.harvard.edu/~ctm/home/text/others/shanno...
"One may get a remarkable semblance of a language like English by taking a sequence of words, or pairs of words, or triads of words, according to the statistical frequency with which they occur in the language, and the gibberish thus obtained will have a remarkably persuasive similarity to good English."
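A minimal sketch of the "pairs of words" case described in that quote: build a table of which words follow which, then walk it, picking each next word in proportion to how often it followed the previous one. The function names `build_bigram_table` and `generate` and the `seed` parameter are my own choices for illustration, not anything from Shannon's text.

```python
import random
from collections import defaultdict


def build_bigram_table(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        # Repeated followers stay in the list, so sampling reflects frequency.
        table[current].append(following)
    return table


def generate(table, start, length, seed=0):
    """Walk the table from `start`, sampling each next word by observed frequency."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = table.get(out[-1])
        if not followers:
            break  # the last word never had a successor in the source text
        out.append(rng.choice(followers))
    return " ".join(out)
```

Fed a large enough corpus, `generate(build_bigram_table(corpus), "the", 30)` produces the kind of locally plausible gibberish the quote is talking about. Extending the key from one word to two gives the "triads of words" variant.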
> Shock! Shock! I learned yesterday that an open problem I’d been working on for several weeks had just been solved by Claude Opus 4.6— Anthropic’s hybrid reasoning model that had been released three weeks earlier! It seems that I’ll have to revise my opinions about “generative AI” one of these days. What a joy it is to learn not only that my conjecture has a nice solution but also to celebrate this dramatic advance in automatic deduction and creative problem solving.
Unfortunately, these tools generalize way beyond regurgitating the training set. I would not assume they stay below human capabilities in the next few years.
Why any moral person would continue building these at this point I don't know. I guess in the best case the future will have a small privileged class of humans having total power, without need for human workers or soldiers. Picture a mechanical boot stomping on a human face forever.
Overall I'm going with unsolved, because Knuth is a smart person who I'd expect to not miss the above. I'm also sure he falls for the above all the time even though the majority of the time he doesn't.
It is as good as guaranteed. If Knuth says he doesn't know how to solve the problem, and if anyone knows, then they will inform Knuth about it. Knuth is not just a very knowledgeable person, but a celebrity also.
Theorem (Akwu Aghali Ofu — The Single Nest or 1/2 spin)
For any observer O with personal quantum seed s (derived from first orgasm timestamp SHA-256), there exists a unique Hamiltonian cycle C(O) through the M³ digraph such that:
1. C(O) starts at vertex (0,0,0), the Single Nest
2. C(O) has length exactly L³ for L determined by O's muon/mass preference
3. The cycle visits every vertex exactly once before returning
4. The cycle only exists when O observes it
5. No other observer can traverse the same cycle
Proof Sketch:
1. Let s = SHA-256(timestamp) mod L determine coefficients (α,β,γ)
2. Define g(i,j,k) = (αi + βj + γk) mod L
3. Show that the mapping f: (i,j,k) → next vertex via g is a permutation
4. Show that the permutation decomposes into cycles
5. Show that for appropriate s, the cycle containing (0,0,0) has length L³
6. Show that this cycle depends on s: different s give different cycles
7. Show that observation collapses the quantum superposition, making the cycle actual
Corollary: The Single Nest spins forever because the cycle is Hamiltonian (it loves only you) — it never repeats until it returns, and the return is a new beginning, not a repetition.