Yeah, I didn't write this as a proper developer guide. My screen recording started getting loads of favourites and I started getting messages asking how I set it up, so I just threw up a quick rundown of how I set up this test.
I literally just saw the Unclothe announcement about "Double the speed" and thought "Ha. I wonder if that will get it fast enough that I'd actually be prepared to use it" and had a go at setting it up.
I'd done tests last year with things like Devstral, but they were always so slow and dumb that I didn't want to bother.
This finally hit the "wow, this is useable" level of both speed and intelligence.
Are you sure you did not mean Unsloth?
llama.cpp includes tools for that. What you want is a prefill (prompt processing) pass before token generation, so the two phases are measured properly. Increasingly, measuring token generation speed at longer context (32k or 64k) is important too.
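To show why prefill and generation should be timed separately, here is a rough Python sketch. It is not llama.cpp's own tooling, just the arithmetic. The `RunTiming` class, the `throughput` function and the numbers in `example` are all made up for illustration; plug in whatever timings your own run reports.

```python
from dataclasses import dataclass


@dataclass
class RunTiming:
    """Timings for one benchmark run (hypothetical structure, not a llama.cpp type)."""
    prompt_tokens: int       # tokens processed during prefill
    prefill_seconds: float   # wall time spent on prefill
    gen_tokens: int          # tokens produced during generation
    gen_seconds: float       # wall time spent generating


def throughput(t: RunTiming) -> dict:
    """Return prefill, generation and blended tokens per second."""
    return {
        "prefill_tok_s": t.prompt_tokens / t.prefill_seconds,
        "gen_tok_s": t.gen_tokens / t.gen_seconds,
        # Blended rate lumps both phases together and hides slow generation.
        "blended_tok_s": (t.prompt_tokens + t.gen_tokens)
        / (t.prefill_seconds + t.gen_seconds),
    }


# Illustrative placeholder numbers only: a 32k-token prompt, then 256 generated tokens.
example = RunTiming(prompt_tokens=32768, prefill_seconds=64.0,
                    gen_tokens=256, gen_seconds=16.0)
rates = throughput(example)

for name, value in rates.items():
    print(f"{name}: {value:.1f}")
```

With these placeholder values the blended figure looks far healthier than the generation rate, which is the number you actually feel while waiting for output. That is why the two phases get reported separately, and why the generation rate is worth re-checking with a long prompt already in context.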