
I spent the entire time reading it pondering the same thing.

1. The article argues that calling out to a tool like Python is "expensive" because of the overhead of forking a process, loading the Python environment, etc. But why not just eliminate that overhead and embed WebAssembly, so this "tool call" costs near zero? This feels very similar to the discussion in the '90s around the overhead of threads vs. processes, or kernel space vs. user space. Could you even go further and have a running BEAM VM so the LLM can write Elixir, which is ideal for LLMs that stream out code? Elixir programs will be a lot shorter than WebAssembly.

2. The core argument stated is "A system that cannot compute cannot truly internalize what computation is." The idea being that it could write a program, execute it, and, by seeing all of the steps, maybe even stop partway through and change its mind, or write better programs next time, i.e., debug on the fly?

3. Not mentioned, but there is a third X factor: LLMs could use this newfound computation engine to do better overall at "thinking", computing in very unexpected ways and on unexpected problems. Maybe it would do dramatically better on some benchmark because of this?

Unfortunately these are not explored, and it is just an execution engine. The conclusion even states that "arbitrary programs can be compiled directly into the transformer weights, bypassing the need to represent them as token sequences at all." That goes back to point 1: if we are compiling to weights, why not just optimize the tool calling?
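To make the overhead in point 1 concrete, here is a minimal sketch that evaluates the same expression two ways: by spawning a fresh Python interpreter, and by evaluating it inside the already-running process. The in-process `eval` is only a stand-in for an embedded runtime such as WebAssembly (it is not a sandbox), and the function names are made up for illustration. Actual timings depend on the machine.

```python
import subprocess
import sys
import time


def run_in_subprocess(expr: str) -> str:
    """Evaluate expr by starting a fresh Python interpreter (the 'expensive' path)."""
    completed = subprocess.run(
        [sys.executable, "-c", f"print({expr})"],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


def run_in_process(expr: str) -> str:
    """Evaluate expr in the current interpreter (stand-in for an embedded VM).

    Stripping builtins does NOT make eval safe; this is an illustration only.
    """
    return str(eval(expr, {"__builtins__": {}}, {}))


def average_seconds(fn, expr: str, repeats: int = 5) -> float:
    """Average wall-clock time of fn(expr) over a few repeats."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(expr)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    expr = "2**10 + 7"
    print("subprocess:", run_in_subprocess(expr), average_seconds(run_in_subprocess, expr))
    print("in-process:", run_in_process(expr), average_seconds(run_in_process, expr))
```

Both paths return the same result. The difference is process startup and interpreter initialization, which is the cost an embedded runtime would avoid.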

by hedgehog 2 hours ago


I'm not sure about the rest, but a significant problem with high-frequency tool calling (especially in training) is that it breaks batching.
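A toy cost model of why that hurts, assuming sequences in a batch advance in lockstep and a step cannot finish until the slowest tool call in that step returns. The function names and the cost model are illustrative assumptions, not how any particular training or serving stack schedules work.

```python
def lockstep_batch_time(step_cost: float, tool_waits: list[list[float]]) -> float:
    """Total time when the whole batch waits for the slowest sequence each step.

    tool_waits[t][i] is the extra time sequence i spends on a tool call at
    step t (0.0 if it makes no call).
    """
    return sum(step_cost + max(waits) for waits in tool_waits)


def compute_only_time(step_cost: float, tool_waits: list[list[float]]) -> float:
    """Total time if no step ever had to wait on a tool call."""
    return step_cost * len(tool_waits)


if __name__ == "__main__":
    # Four decoding steps, batch of three; two steps contain a single tool call.
    waits = [
        [0.0, 0.0, 0.0],
        [5.0, 0.0, 0.0],
        [0.0, 0.0, 0.0],
        [0.0, 5.0, 0.0],
    ]
    total = lockstep_batch_time(1.0, waits)
    ideal = compute_only_time(1.0, waits)
    print(f"lockstep: {total}, compute only: {ideal}, utilization: {ideal / total:.2f}")
```

In this model a single tool call stalls every sequence in the batch for that step, so the more often any sequence calls a tool, the less time the accelerator spends on useful work.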

by vuciuc 5 hours ago


> "A system that cannot compute cannot truly internalize what computation is."

The way this is formulated, it almost sounds like they think that giving LLMs this ability will bring them closer to having experiences of computation or something? Weird?

by D-Machine 1 hour ago


One of the worst sentences in the article, clear example of pseudo-profound bullshit, almost certainly LLM-generated.

by jadbox 4 hours ago


Maybe this could be used as an optimizing profiler in order to inform the compiler on novel methods for improving hot sections of code?

by Rastonbury 11 hours ago


Why must models be analogous to humans using tools? Or, to take the analogy further, wouldn't it be better if humans had calculators built into their brains, provided they are deterministic and reduce latency?

by MattPalmer1086 8 hours ago


Because it is directly analogous. Neural nets (whether biological or artificial) are not the best way to execute lots of deterministic computations quickly and reliably. That's why we invented computers.

I'm not convinced at all that this is the best way to reduce latency; there are many other ways of doing that.

Having a calculator in our brains would be handy of course, but a gigahertz multi-core computer is still going to be better at anything that needs a lot of computation and/or a lot of data.

by graemefawcett 7 hours ago


Exactly. They've implemented a VM inside a transformer, turned an O(1) memory access call into O(n), optimized it down to O(log n) and wrote a post about how smart they are.

It's a nice bit of engineering, if you don't subscribe to YAGNI. If you do, you must ask the obvious question of what capability this delivers that wasn't available before. The only answer I've got is that someone must have been a bit chilly and couldn't figure out the thermostat.
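For anyone unfamiliar with the complexity classes in that complaint, here is a generic sketch of three ways to read "the latest value written to an address". It only illustrates O(1) vs. O(n) vs. O(log n) lookups over a write log and is not a reconstruction of the article's mechanism. All names are hypothetical.

```python
from bisect import bisect_right


def read_direct(ram: dict, addr: int):
    """O(1): ordinary addressed memory."""
    return ram[addr]


def read_scan(log: list, addr: int):
    """O(n): walk the append-only write log backwards to the latest write."""
    for written_addr, value in reversed(log):
        if written_addr == addr:
            return value
    raise KeyError(addr)


def build_index(log: list) -> list:
    """Sort writes by (address, write time) so lookups can binary search."""
    return sorted((a, t, v) for t, (a, v) in enumerate(log))


def read_indexed(index: list, addr: int):
    """O(log n): binary search for the last write to addr."""
    pos = bisect_right(index, (addr, float("inf"))) - 1
    if pos >= 0 and index[pos][0] == addr:
        return index[pos][2]
    raise KeyError(addr)


if __name__ == "__main__":
    log = [(0, 10), (1, 20), (0, 30)]  # (address, value) writes in time order
    ram = {0: 30, 1: 20}               # the same state as plain memory
    index = build_index(log)
    print(read_direct(ram, 0), read_scan(log, 0), read_indexed(index, 0))
```

All three reads return the same value; only the cost per read changes as the log grows.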