upvote
> I wish I can run a competitive oss model on 32GB machine in 3 years.

You could try DS4 on that machine anyway and see how gracefully it degrades (assuming that it runs and doesn't just OOM immediately). Experimenting with 36GB/48GB/64GB would also be nice; they might be able to gain some compute throughput back by batching multiple sessions together (though obviously at the expense of speed for any single session).

reply
> I wish I can run a competitive oss model on 32GB machine in 3 years.

It's so hard to predict what size the open-weight models will be, even in 6 months time. Will a 96GB machine turn out to be a complete waste of money? Who knows.

reply
> `from opentele while import trace`

FYI, this to me points to an inference bug, bad sampling, or a non-native quant. OpenRouter is known to route requests to absolutely terrible, borked implementations. A model like DeepSeek V4 Flash shouldn't be making syntax errors like this.

reply