undefined

points

[-]

I was about to post your last point / quote. Going multigpu is relatively not so though but once you go multi-node you have distributed storage/io/compute system which is highly non trivial. Add that the long training times now you have robustness/fault-tolerantness concerns with hardware failures and restarts. Today’s training systems are engineering marvels.

by zbendefy7 hours ago|

prev|

[-]

Good point!

There is also the case for Markov chains being theoretically able to do these if tuned well. Or even SAT problem.

by CGMthrowaway4 hours ago|

parent|

[-]

"LLM is just fancy autocomplete"