Second answer: ask an AI, but prices have risen dramatically since their training cutoff, so be sure to get them to check current prices.
Third answer: I'm not an expert by a long shot, but I like building my own PCs. If I were to upgrade, I would buy one of these:
Framework desktop with 128gb for $3k or mainboard-only for $2700 (could just swap it into my gaming PC.) Or any other Strix Halo (ryzen AI 385 and above) mini PC with 64/96/128gb; more is better of course. Most integrated GPUs are constrained by memory bandwidth. Strix Halo has a wider memory bus and so it's a good way to get lots of high-bandwidth shared system/video RAM for relatively cheap. 380=40%; 385=80%; 395=100% GPU power.
I was also considering doing a much hackier build with 2x Tesla P100s (16gb HBM2 each for about $90 each) in a precision 5820 (cheap with lots of space and power for GPUs.) Total about $500 for 32gb HBM2+32gb system RAM but it's all 10-year-old used parts, need to DIY fan setup for the GPUs, and software support is very spotty. Definitely a tinker project; here there be dragons.
I run qwen 122b with Claude code and nanoclaw, it's pretty decent but this stuff is nowhere prime time ready, but super fun to tinker with. I have to keep updating drivers and see speed increases and stability being worked on. I can even run much larger models with llama.cpp (--fit on) like qwen 397b and I suppose any larger model like GLM, it's slow but smart.
For a hobby/enthusiast product, and even for some useful local tasks, MoE models run fine on gaming PCs or even older midrange PCs. For dedicated AI hardware I was thinking of Strix Halo - with 128gb is currently $2-3k. None of this will replace a Claude subscription.
1) What are you going to use that for? 0.6 model gives you what you could get from Siri when it first launched at most unless you do some tunning.
2) Pretty clear that they are talking about GLM-5.1 4-bit quant.