I've been researching this for quite a while, and so far the fastest runtime I've found is oMLX. There's a caveat, though: TTFT (time to first token) with MLX on an M4 Pro is enormous. On an M5 Pro it is much faster.
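For anyone who wants to compare TTFT across runtimes themselves, here is a minimal sketch of how it could be timed. It assumes only that the runtime hands you some lazy iterator of streamed chunks; `measure_ttft` and its `clock` parameter are names made up for illustration, not part of oMLX, MLX, or llama.cpp.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_ttft(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], float, int]:
    """Return (ttft_seconds, total_seconds, chunk_count) for a token stream.

    `stream` is assumed to be lazy (e.g. a generator over a streaming
    response), so the request only starts producing when we iterate.
    """
    start = clock()
    ttft = None
    count = 0
    for _chunk in stream:
        if ttft is None:
            # Time from starting to consume the stream until the first chunk.
            ttft = clock() - start
        count += 1
    total = clock() - start
    return ttft, total, count
```

Pass it whatever streaming iterator your client exposes and compare the first value between runtimes with the same prompt and model. With a long prompt, TTFT is mostly prompt processing, so the prompt length has to be kept fixed for the comparison to mean anything.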

by regexorcist33 seconds ago|

Curious if you tested llama.cpp and still found oMLX faster? I haven't tried the latter myself, might give it a go.

by mft_29 minutes ago|

I tried Unsloth Studio recently and was disappointed. In particular, the download functionality is half-baked and couldn't resume interrupted downloads. Since it seemed to be just a thin wrapper over llama.cpp, I found that huggingface hub, llama.cpp, and a couple of simple scripts actually offered better functionality once everything was set up.
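For reference, the kind of script meant here might look roughly like the sketch below. It is not the exact setup described above: the repo ID and filename in the usage comment are placeholders, and `fetch_gguf` / `llama_server_cmd` are hypothetical helper names. It assumes the `huggingface_hub` Python package and a `llama-server` binary on your PATH. As far as I know, recent `huggingface_hub` versions pick up an interrupted download where it left off when you re-run the same call, but check the behavior of your installed version.

```python
from pathlib import Path
from typing import List, Union


def fetch_gguf(repo_id: str, filename: str, local_dir: str = "models") -> Path:
    """Download a single file from the Hub and return its local path."""
    # Imported lazily so the command builder below works without the package.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=repo_id, filename=filename, local_dir=local_dir)
    return Path(path)


def llama_server_cmd(
    model_path: Union[str, Path], port: int = 8080, ctx_size: int = 8192
) -> List[str]:
    """Build the argument list for launching llama.cpp's llama-server."""
    return [
        "llama-server",
        "-m", str(model_path),   # path to the GGUF model file
        "--port", str(port),     # HTTP port to listen on
        "-c", str(ctx_size),     # context size in tokens
    ]


# Usage (placeholder repo and filename, substitute your own):
#   import subprocess
#   model = fetch_gguf("some-org/some-model-GGUF", "some-model-Q4_K_M.gguf")
#   subprocess.run(llama_server_cmd(model), check=True)
```

Keeping the download and the launch as separate steps means a failed download can simply be re-run without touching the server side.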