From Qwen-3-max thinking, I remember inference becoming very slow as you pushed towards 1M context; already at 300k tokens you could notice the degradation. But of course, I was using Qwen Chat, so it could be a resource-allocation thing.
I found it noticeably worse, in a very clear way.