Isn't that over-simplifying it a bit too much?

You can go another step: an FFN can be simulated on a Turing machine, so it just exemplifies the incredible expressive power of the Turing machine model of computation. (In fact, you don't even need a Turing machine, since there is no looping in one forward pass.)

In theory you can run a huge FFN on the tiniest Turing machine; in practice it's much better to run a Transformer on the latest NVIDIA hardware. Or, as they say, "quantity (performance) has a quality all its own."
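To make the "no looping in one forward pass" point concrete, here is a minimal sketch of a two-layer FFN in plain Python. The weights `W1`, `b1`, `W2`, `b2` and the helper names are hypothetical toy values chosen only for illustration. The only loops are the ones inside the matrix-vector product, and their bounds are fixed by the weight shapes, not by the input data, so the whole pass could be unrolled into straight-line arithmetic.

```python
def relu(v):
    # Elementwise max(0, x).
    return [max(0.0, x) for x in v]

def matvec(W, v):
    # Matrix-vector product; loop bounds depend only on the shape of W.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(a, b):
    # Elementwise vector addition.
    return [x + y for x, y in zip(a, b)]

# Hypothetical toy weights (assumption: 2 inputs, 2 hidden units, 1 output).
W1 = [[1.0, -1.0], [0.5, 2.0]]
b1 = [0.0, -1.0]
W2 = [[1.0, 1.0]]
b2 = [0.5]

def ffn_forward(x):
    # A fixed sequence of steps: no data-dependent branching or iteration.
    h = relu(add(matvec(W1, x), b1))
    return add(matvec(W2, h), b2)

print(ffn_forward([2.0, 1.0]))  # [3.5]
```

Since the depth is fixed, this is closer to evaluating a circuit than to running a general program.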

by musebox35 7 hours ago

I was about to post your last point / quote. Going multi-GPU is relatively not so tough, but once you go multi-node you have a distributed storage/IO/compute system, which is highly non-trivial. Add to that the long training times, and now you have robustness/fault-tolerance concerns with hardware failures and restarts. Today’s training systems are engineering marvels.
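As a rough illustration of the restart concern, here is a minimal single-process sketch of checkpoint-and-resume. It is nowhere near what a real multi-node system does; the function names, the JSON state format, the 10-step checkpoint interval, and the `fail_at` knob for simulating a crash are all assumptions made up for the example.

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    # Write to a temp file, then atomically rename, so a crash mid-write
    # never leaves a half-written checkpoint behind.
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dir_name)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path):
    # Resume from the last checkpoint if one exists, else start fresh.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}

def train(path, num_steps, fail_at=None):
    state = load_checkpoint(path)
    while state["step"] < num_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated hardware failure")
        state["total"] += state["step"]  # stand-in for one training step
        state["step"] += 1
        if state["step"] % 10 == 0:      # hypothetical checkpoint interval
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

A run that dies at step 25 restarts from the step-20 checkpoint and redoes the five lost steps. The checkpoint interval trades wasted work against I/O cost, and that trade-off gets much harder once the state is sharded across nodes.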

by zbendefy 7 hours ago

Good point!

There is also the case that Markov chains are theoretically able to do these things if tuned well. Or even a SAT problem.

by CGMthrowaway 4 hours ago

"LLM is just fancy autocomplete"

by slickytail 9 hours ago

[dead]