It's a two-year-old base model that's only 3B parameters, trained on only 100B tokens. It's still a research project at this point.
The new model they just released has impressive benchmark results: https://huggingface.co/microsoft/bitnet-b1.58-2B-4T

Except on GSM8K and math...

Thanks for the link. The GSM8K result actually leads the pack in that table, but math is indeed underwhelming. Qwen 2.5 is in the lead overall, yet BitNet isn't far behind, uses roughly 1/6 the memory during inference, and was trained on less than 1/4 as many tokens. Pretty cool.
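The memory gap falls out of the weight format: ternary BitNet weights pack into about 1.58 bits each versus 16 bits for an fp16/bf16 baseline. A back-of-the-envelope sketch (the parameter counts below are my assumptions for illustration, not figures from the model card):

```python
# Rough weight-memory comparison between a ternary (~1.58 bits/weight)
# model and an fp16 baseline. Parameter counts are illustrative guesses.
bitnet_params = 2.0e9  # assumed for bitnet-b1.58-2B-4T
qwen_params = 1.5e9    # assumed for a Qwen 2.5 1.5B baseline

bitnet_gb = bitnet_params * 1.58 / 8 / 1e9  # ~0.4 GB of weights
qwen_gb = qwen_params * 16 / 8 / 1e9        # ~3.0 GB of weights

print(f"BitNet ~{bitnet_gb:.1f} GB vs fp16 ~{qwen_gb:.1f} GB "
      f"(ratio {bitnet_gb / qwen_gb:.2f})")
```

Weights alone give a ratio in the same ballpark as the ~1/6 figure; the exact reported number also depends on embeddings, activations, and the KV cache.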