So like... $2000+ just for the used GPUs? Plus I assume it's considerably more effort to get it working.
Nah, not really. It is a little annoying in terms of space and power, though. Not every case and motherboard can support cards that big.
edit - after actually reading the tweets (had to use xcancel) and visiting the source git repo, switching to MTP for speculative decode makes things a hell of a lot faster, and the abliterated model plus dflash makes it even faster! I'm now seeing 70-90 tok/sec for most stuff. I like!
https://flowtivity.ai/blog/120-tok-s-1m-context-private-ai-d...