upvote
Anything below one billion parameters you can run on the CPU at acceptable speed

For larger sizes you still can, it just becomes slower and slower. For a simple classification task (small input, tiny output, and you can constrain output to a couple tokens) you could even run something like a 4B or 8B model on the CPU

reply
I guess that technically depends on the software used to run the model, but in general it's always been possible to run on a CPU (and may even be possible to run on TPU or something else). It's just been slower. Likewise GPU RAM vs system RAM and the bandwidths involved can make hard bottlenecks.

GPU and VRAM (or fast unified RAM) is generally the option that is both available and performant, but especially really small models also run quite well on CPU and system RAM.

reply
iGPUs are often slower or only as fast as CPUs when it comes to LLM text generation.

The advantage is mainly in memory bandwidth. External GPUs' internal memory is slightly faster than DDR attached to your CPU.

Other types of "AI" models do make use of the extra compute in GPUs but not LLMs.

reply