Tokens per second. In practical use, the gap between an 8B model and something like a 16B is smaller than you might think, and the 8B is a lot faster and more interactive. Still, there are certain tasks where it is useful to farm the work out to the larger model.
Agree. For local coding help, latency often matters more than raw benchmark quality. A slightly weaker model that answers immediately changes how often you reach for it.
Exactly this.

Creating conversation titles and parsing HTML/JSON don't benefit from 27B models.

The B70 can run both models comfortably side-by-side so it makes better use of time and resources.
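
For what it's worth, the split described above can be as simple as a lookup on task type. A rough sketch, assuming two locally served models; the labels `small-8b` and `large-27b` and the task names are made up for illustration and not tied to any particular runtime:

```python
# Hypothetical model labels; substitute whatever you actually serve locally.
SMALL_MODEL = "small-8b"
LARGE_MODEL = "large-27b"

# Task kinds assumed to be light enough for the small model
# (conversation titles, HTML/JSON parsing, as mentioned above).
LIGHT_TASKS = {"title", "parse_html", "parse_json"}


def pick_model(task_kind: str) -> str:
    """Return the model label to use for a given kind of task."""
    # Anything not explicitly listed as light goes to the larger model.
    return SMALL_MODEL if task_kind in LIGHT_TASKS else LARGE_MODEL
```

Defaulting unknown tasks to the larger model is just one choice here; if latency matters more than quality for your workload, flipping the default is equally reasonable.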
