This isn't really a reasonable approach, is it?

The original prompts aren't provided, nor is the original context; even then, you can't really treat a stochastic system like an LLM as a major component in reproducibility.

reply
> even then, you can't really treat a stochastic system like an LLM as a major component in reproducibility.

If you had the other things, being "stochastic" is not even remotely a show-stopper. Stochastic processes abound and are the reason the mathematics of statistics was developed in the first place, ultimately allowing us to create such things as LLMs.

When all the relevant steps get published, I absolutely expect a lot of people to (attempt to) reproduce this work, even though LLMs are stochastic.

reply
My issue with this is that it's a form of "soft" reproducibility, where it'll work for many (maybe even most!) people, but that depends on the way the original prompt was formulated (read on) and the state of the random noise in the system.

On the prompt formulation: prompts with very similar formulations (in terms of semantics, Hamming distance, or both) can lead to _wildly divergent_ outputs in my experience. It's not rigorous, and when that divergence happens, it's extremely difficult (arguably impossible, by nature of the architecture of transformers) to identify why the divergence happened and where.
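
For anyone unfamiliar with the term, here is a minimal sketch of Hamming distance in Python. The two prompts are made-up examples, not anything from the paper; the point is only that a distance of 1 is about as "similar" as two strings get without being identical.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for equal-length strings")
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))


# Two hypothetical prompts that differ by a single character.
prompt_a = "Summarise the text"
prompt_b = "Summarize the text"
distance = hamming_distance(prompt_a, prompt_b)
print(distance)
```

Nothing about a small distance guarantees that a model treats the two prompts alike, which is the concern above.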

reply
Claude Code was used to organize the material and to run simulations. The simulations were meant to estimate how likely it is that the text is Semitic rather than Tom simply getting lucky. Tom has assigned probabilities to each of the syllables he has proposed sound values for.
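
I don't have the actual simulation code, so the following is only a generic sketch of what a "got lucky" check can look like, not Tom's method. It assumes a toy null model in which each item matches by chance with its own probability, and asks how often chance alone produces at least as many matches as were observed. Every number and name here is hypothetical.

```python
import random


def chance_tail_probability(chance_probs, observed_hits, trials=20_000, seed=0):
    """Estimate P(at least `observed_hits` chance matches) by Monte Carlo.

    chance_probs: one hypothetical chance-match probability per item.
    A fixed seed makes the estimate repeatable from run to run.
    """
    rng = random.Random(seed)
    at_least_as_many = 0
    for _ in range(trials):
        # One simulated "lucky guesser": each item matches independently.
        hits = sum(rng.random() < p for p in chance_probs)
        if hits >= observed_hits:
            at_least_as_many += 1
    return at_least_as_many / trials


# Toy numbers: 20 items, each with a 10% chance of matching by luck,
# and 6 observed matches.
estimate = chance_tail_probability([0.1] * 20, observed_hits=6)
print(f"estimated probability of 6+ lucky matches: {estimate:.4f}")
```

A small estimate would suggest luck is an unlikely explanation under that null model; whether the null model is a fair one is a separate argument.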
reply
Sure it is. We're humans, not robots (well, I think I am, and I presume you are as well, but for all we know, we could be living in a simulation), so if the non-deterministic system decides to generate code that calls the variable foo one day and bar the next, as long as the code still does what's being asked of it, why do I care that the non-deterministic system chose to call the variable something different when run on Tuesday? There's the computer science definition of determinism and the engineering result of "does it work", which are at odds. It's like the halting problem. We haven't solved the computer science definition of the halting problem, but give some C code with a loop that won't terminate to Claude, and it'll call that out as not halting.
reply
All things aside, I think this misses the forest for the trees on the halting problem.

It's not about being able to throw Claude or Codex at a loop and have it evaluated for halting; it's about being able to do this for arbitrary code. Computer science rigorously defines the halting problem as undecidable (not computable) within the framework of something akin to static analysis by any deterministic Turing machine.

There's not really a question of "solving" the halting problem, as if there were some as-yet unknown way of generally figuring out whether arbitrary code halts. Turing published a proof in 1937 of the undecidability of what we now know as the halting problem, building on ideas first articulated by Church a few years prior.

Frankly, if anything, it's reasonable to say that the halting problem's been solved, just in the direction of undecidability rather than decidability.
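
For the curious, the diagonal argument behind that result can be sketched in a few lines of Python. This is an illustration, not a formal proof: `halts` stands for any claimed decider (assumed deterministic and always returning an answer), and `make_paradox` and `refute` are names I made up for the sketch.

```python
def make_paradox(halts):
    """Build a program that does the opposite of what `halts` predicts for it."""
    def paradox():
        if halts(paradox):
            while True:  # decider said "halts", so loop forever
                pass
        return "halted"  # decider said "loops", so halt immediately
    return paradox


def refute(halts):
    """Return (predicted_to_halt, actually_halts) for the paradox program."""
    paradox = make_paradox(halts)
    predicted_to_halt = halts(paradox)
    if predicted_to_halt:
        # We cannot run it (it would spin forever); by construction it loops.
        actually_halts = False
    else:
        paradox()  # returns immediately, so it does halt
        actually_halts = True
    return predicted_to_halt, actually_halts


# Two trivially bad deciders, both caught out by their own paradox program.
print(refute(lambda program: True))
print(refute(lambda program: False))
```

Whatever `halts` answers about its own paradox program, the two values in the returned pair disagree, so no such decider can be right on every input.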

Anyway, back to LLMs; as code gets more complex, the robot will need a bigger context window, more hardware resources, and more time, all of which will be variable due to the noise inherent in the system. It'll be difficult to put a useful upper and lower bound on how much computing power and time it'll take to figure out if a program ever halts. Which is all a bit moot, frankly, in the context of halting, but useful to keep in mind in the more general context of using these things as analysis tools.

reply
Actually, it is, because Claude did the work, and being a layperson isn't really that high a bar.
reply
Claude helped, but did not do the work. This was a human dude who had a very helpful assist from Claude.
reply
> stochastic system

Every day when you lower your butt onto your chair, you trust a stochastic system enough to assume you'll rest on the chair safely and not spontaneously phase through it, which would lead to a rather gory and painful terminal experience.

Physics at macro scale is stochastic, which is a good reminder that stochastic != uniformly random. Expected distributions matter.

reply
While strictly true, QM has such small standard deviations as to be irrelevant at the macro scale for things like bums and chairs.

IMO a better example would be the stochastic nature of quality control in manufacturing.
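
A rough sketch of what I mean, with invented numbers: both "machines" below are random, but one draws part sizes from a tight normal distribution and the other draws uniformly over a wide range. Same tolerance check, very different pass rates. The nominal size, tolerance, and spreads are assumptions chosen for illustration.

```python
import random


def pass_rate(draw, nominal, tolerance, n_parts):
    """Fraction of drawn part sizes that land within +/- tolerance of nominal."""
    within = sum(abs(draw() - nominal) <= tolerance for _ in range(n_parts))
    return within / n_parts


rng = random.Random(42)  # fixed seed so the run is repeatable
NOMINAL_MM = 10.0        # hypothetical target dimension
TOLERANCE_MM = 0.05      # hypothetical acceptance window

# Stochastic but tightly distributed: standard deviation of 0.01 mm.
tight = pass_rate(lambda: rng.gauss(NOMINAL_MM, 0.01),
                  NOMINAL_MM, TOLERANCE_MM, 10_000)

# Stochastic and spread uniformly over +/- 0.1 mm.
flat = pass_rate(lambda: rng.uniform(NOMINAL_MM - 0.1, NOMINAL_MM + 0.1),
                 NOMINAL_MM, TOLERANCE_MM, 10_000)

print(f"tight process pass rate: {tight:.4f}")
print(f"flat process pass rate:  {flat:.4f}")
```

The tight process passes nearly every part and the flat one passes roughly half, even though neither is deterministic.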

reply
somehow I suspect it was a bit more involved than: Claude, please solve Linear A.
reply
A little bit more. If you ask ChatGPT to "solve linear a", it thinks you mean linear algebra. If you specify that it's the Minoan translation problem, you get a table similar to the one we get a glimpse of. Without access to the paper, we can't say how much more work the paper has than my gist.

https://gist.github.com/fragmede/bbf277d36a2398065f109484f34...

reply
You also have to add "make no mistakes"!
reply
You are correct!
reply
Unless it was done by Fable!
reply
The 'major insight' described in the article predates Fable's release by two weeks and four days. It would be a complicated timeline.
reply