Yeah, but a 4-bit quant very often loops needlessly. That is not so bad, because you do not pay for tokens, but you paid for the hardware and you want to use it for something useful. Q6 is better, but then you get something like 40 t/s prefill, which is really tiring. But at least it says sorry when you ask it what is wrong! I heard there is some extension for PI that prevents that. I need to look into it.
Otherwise I am quite happy.