It's a two-year-old base model that's only 3B parameters, trained on only 100B tokens. It's still a research project at this point.
The new model they just released has impressive benchmark results: https://huggingface.co/microsoft/bitnet-b1.58-2B-4T

Except on GSM8K and math...

Thanks for the link. The GSM8K result actually leads the pack in that table, but math is indeed underwhelming. Qwen 2.5 is in the lead overall, yet BitNet isn't far behind, uses roughly 1/6 the memory during inference, and was trained on less than 1/4 as many tokens. Pretty cool.
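The memory gap falls out of the weight format: ternary BitNet weights pack into about 1.58 bits each versus 16 bits for an fp16/bf16 baseline. A back-of-the-envelope sketch (the parameter counts below are my assumptions for illustration, not figures from the model card):

```python
# Rough weight-memory comparison between a ternary (~1.58 bits/weight)
# model and an fp16 baseline. Parameter counts are illustrative guesses.
bitnet_params = 2.0e9  # assumed for bitnet-b1.58-2B-4T
qwen_params = 1.5e9    # assumed for a Qwen 2.5 1.5B baseline

bitnet_gb = bitnet_params * 1.58 / 8 / 1e9  # ~0.4 GB of weights
qwen_gb = qwen_params * 16 / 8 / 1e9        # ~3.0 GB of weights

print(f"BitNet ~{bitnet_gb:.1f} GB vs fp16 ~{qwen_gb:.1f} GB "
      f"(ratio {bitnet_gb / qwen_gb:.2f})")
```

Weights alone give a ratio in the same ballpark as the ~1/6 figure; the exact reported number also depends on embeddings, activations, and the KV cache.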