It should improve performance on most hardware, because most LLMs are memory-bandwidth bound during decode.
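
A rough back-of-the-envelope sketch of why decode tends to be bandwidth bound, assuming a dense model at batch size 1 where every weight is read once per generated token. The function name and all the hardware and model numbers below are hypothetical, chosen only to show the arithmetic; KV-cache and activation traffic are ignored.

```python
def decode_time_bounds(n_params, bytes_per_param, mem_bw_bytes_s, peak_flops):
    """Lower bounds (memory_time_s, compute_time_s) for one decoded token.

    Assumes batch size 1 and a dense model, so each weight is read once
    and used in one multiply-add (2 FLOPs) per token.
    """
    weight_bytes = n_params * bytes_per_param
    memory_time = weight_bytes / mem_bw_bytes_s   # time to stream the weights
    compute_time = (2 * n_params) / peak_flops    # time to do the arithmetic
    return memory_time, compute_time


# Hypothetical example: 7e9 parameters at 2 bytes each,
# 1e12 bytes/s of memory bandwidth, 100e12 FLOP/s of peak compute.
mem_t, comp_t = decode_time_bounds(7e9, 2, 1e12, 100e12)
print(f"memory-bound floor:  {mem_t * 1e3:.2f} ms/token")
print(f"compute-bound floor: {comp_t * 1e3:.2f} ms/token")
```

Under these assumed numbers the time to stream the weights is about 100 times the time to do the math, so the memory system sets the per-token latency, not the arithmetic units. The gap narrows as batch size grows, since the same weights are then reused across more tokens.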