Qwen3.6-27B supports a 1 million token context window.

Of course, you need the right hardware to run a context window like that: it takes about 100 GB of memory on my DGX Spark with a full f16 KV cache on the q4_k_xl model.
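
For a rough sense of where that memory goes, here is a back-of-the-envelope sketch of KV cache size for standard attention (one K and one V tensor per layer, 2 bytes per element at f16). The layer count, KV head count, and head dimension below are made-up placeholders, not the real Qwen3.6-27B config, and models that mix in non-standard attention layers would need less than this formula suggests.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Approximate KV cache size in bytes for standard attention.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 corresponds to an f16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


# Hypothetical architecture values, chosen only to illustrate the arithmetic.
size = kv_cache_bytes(n_layers=16, n_kv_heads=4, head_dim=256, context_len=1_000_000)
print(f"KV cache: {size / 1e9:.1f} GB (model weights come on top of this)")
```

The cache grows linearly with context length, so halving the context or quantizing the cache to 8 bits roughly halves this number.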

reply
Got a similar result (my RTX 4070 only has 12 GB). I'm curious whether 24 or 32 GB improves this enough to make it useful.
reply
Try running it on the CPU with system RAM.

It's slower, but the models will run.

reply
Good idea for evaluating the models, thanks.
reply
Use more direct prompts instead of open-ended ones.
reply