My M5 Pro gets ~11 tokens per second via OMLX with an 8-bit quant.
A Mac isn’t going to be much faster than a 5080 with any model, except the ones you can’t currently run at all because you don’t have enough GPU and CPU memory combined.
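A rough way to check whether a model even fits is to estimate the weight memory from parameter count and quant bit width. This is a minimal sketch. It assumes weights dominate memory use and ignores KV cache and runtime overhead. The helper names and the 90% headroom factor are hypothetical, and the hardware sizes in the usage note (16 GB of VRAM plus 32 or 64 GB of system RAM) are illustrative assumptions.

```python
def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB.

    Assumption: weights dominate; KV cache and runtime overhead are ignored.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion: float, bits_per_weight: float,
         gpu_gib: float, cpu_gib: float, headroom: float = 0.9) -> bool:
    """Hypothetical check: do the weights fit in combined GPU + CPU memory,
    keeping some headroom (default 90% usable) for everything else?"""
    usable = headroom * (gpu_gib + cpu_gib)
    return weight_memory_gib(params_billion, bits_per_weight) <= usable


if __name__ == "__main__":
    # A 70B model at 8 bits needs roughly 65 GiB for weights alone.
    print(f"{weight_memory_gib(70, 8):.1f} GiB")
    print(fits(70, 8, gpu_gib=16, cpu_gib=32))  # illustrative 16 + 32 GiB box
    print(fits(70, 8, gpu_gib=16, cpu_gib=64))  # illustrative 16 + 64 GiB box
```

Under these assumptions, a 70B model at 8 bits (about 65 GiB) does not fit in 16 + 32 GiB but does fit in 16 + 64 GiB. Spilling into system RAM makes it runnable, not fast.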

You’re much better off adding a second GPU if you’ve already got a PC you’re using.
