Curious if you tested llama.cpp and still found oMLX faster? I haven't tried the latter myself, might give it a go.