by SkitterKherpi 1 day ago

by jazzyjackson 23 hours ago

They’ll sell you a bundle, either a pair or a quartet, so you can have 256 or 512GB over a 400GB/s network link.

I can’t figure out when it makes sense to pay 10k up front for a quantized Llama 3.1, but it’s an interesting option.

by c7b 22 hours ago

You could fit a Q4 GLM5.2 in 512GB and still have some space for context (372-475GB for the model): https://unsloth.ai/docs/models/glm-5.2

But yeah, there's a bit of a dearth of models that could fully utilize memory in the 128-256GB bracket at the moment. Still, things move so fast in this space that I wouldn't base my decision on a generation of models that's just a few months old.
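
For anyone who wants to redo the arithmetic, here is a rough sketch. It uses only the figures quoted above (512GB total, 372-475GB for the model) plus the usual parameters × bits / 8 estimate for weight size. The function names are made up for illustration, and the estimate ignores quantization metadata, KV cache, and runtime overhead.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (hypothetical helper).

    Assumes every parameter is stored at `bits_per_weight` bits and ignores
    quantization scales, KV cache, and runtime overhead.
    """
    return params_billion * bits_per_weight / 8


def headroom_gb(total_memory_gb: float, model_gb: float) -> float:
    """Memory left for context and everything else after loading the weights."""
    return total_memory_gb - model_gb


if __name__ == "__main__":
    # Figures quoted in the comment: 512GB total, 372-475GB for the model.
    for model_gb in (372, 475):
        left = headroom_gb(512, model_gb)
        print(f"model {model_gb}GB -> {left}GB left for context")

    # Halving the bits per weight halves the estimated weight footprint.
    print(weight_footprint_gb(70, 8), weight_footprint_gb(70, 4))
```

Depending on which quant you pick, that leaves somewhere between 37GB and 140GB for context.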

by rnxrx 22 hours ago

It depends on what's meant by "fully utilized", but fp8 quants of Nemotron 3 Super, the latest Minimax, Cohere A+ and the Mistral small and (especially) medium variants all sit in that 128-256GB category, especially with full context or even moderate concurrency. In fact, in a 192GB environment I work with (Hopper GPUs, fwiw) I was pushed into using 4-bit quants with a couple of those to get the model working with a reasonable context window (...but 256 would have rocked out).

by girvo 22 hours ago

Not Llama 3.1, but Step 3.7 Flash is one of the few new high-quality models in this size bracket. DeepSeek v4 Flash too.

by SkitterKherpi 23 hours ago

10k is rather a lot, yes. For LLMs, 10k buys a lot of tokens with less hassle and without the machine (and it's not like electricity is free), but for some other things, like video models, 10k would get burned very fast. I am looking for something more in the 5k range, though.

by awesomeusername 1 day ago

It's out, and I'm daily driving one. It's great.

by SkitterKherpi 23 hours ago

I assume you have the DGX Spark? At this point I am not 100% sure of the difference, other than Linux versus Windows. The RTX Spark should come around Q4, unless I am mistaken.

by vikingcat 23 hours ago

Are you running a local LLM on it? Did you buy a whole laptop?