I'm getting 6.55t/s using the Qwen3.5-397B-A17B-4bit model with the command: ./infer --prompt "Explain quantum computing" --tokens 100

MacBook Pro M5 Pro (64GB RAM)

reply
Appreciate the data point. M5 Max would also be interesting to see once available in desktop form.
reply
Can you post the final result (or as far as you got before you killed it)? I'd like to see an example of the output and how cohesive it is.
reply
Since the output is quite long, here is a link: https://pastebin.com/k76wiVGP
reply
Why does this Ġ character appear to prefix most of the output? ("Ġlike")
reply
It's most likely a tokenizer artifact (https://github.com/huggingface/transformers/issues/4786). The output isn't being decoded properly in this case; the Ġ should just be a space.
reply
The original tokens have Ġ instead of a space. I hit this too when writing an inference engine for Qwen: you have to "normalize" those special characters back to the bytes they stand for.
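To illustrate: Qwen uses a GPT-2-style byte-level BPE vocabulary, where bytes that aren't printable ASCII (including the space, 0x20) are remapped to other visible code points so every token is a displayable string; the space maps to Ġ (U+0120). Here's a minimal sketch of the reverse mapping (the table follows GPT-2's `bytes_to_unicode`; the `detokenize` helper is hypothetical, not part of any library):

```python
def bytes_to_unicode():
    # Same table as GPT-2's tokenizer: printable/latin bytes map to
    # themselves, the remaining bytes are shifted up past U+0100.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("\xa1"), ord("\xac") + 1))
          + list(range(ord("\xae"), ord("\xff") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)  # e.g. byte 0x20 (space) -> U+0120 (Ġ)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))

# Invert the table: visible character -> original byte.
byte_decoder = {c: b for b, c in bytes_to_unicode().items()}

def detokenize(tokens):
    # Join the raw token strings, map each character back to its
    # original byte, then decode the byte sequence as UTF-8.
    text = "".join(tokens)
    return bytes(byte_decoder[c] for c in text).decode("utf-8", errors="replace")

print(detokenize(["Explain", "Ġquantum", "Ġcomputing"]))
# -> Explain quantum computing
```

Doing a plain string replace of Ġ with a space happens to work for ASCII output, but the full byte-decoder is what handles multi-byte UTF-8 characters correctly.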
reply