Hacker News
points | by furyofantares 3 hours ago | comments
by Alifatisk 1 hour ago | next [-]
With Qwen-3-Max thinking, I remember inference becoming very slow as you pushed toward the 1M context limit; already at 300k tokens the degradation was noticeable. But of course, I was using Qwen Chat, so it could be a resource-allocation thing.
by nwienert 2 hours ago | prev | [-]
I found it worse, in a very clear way.