Is it even, though? Quantization and speculative decoding are improving the local AI story by leaps and bounds every month.
Tricks like quantization and speculative decoding can also be used by the leading AI labs, so they don't close the gap: at the end of the day the labs will simply have more compute than you. So far that has translated into better performance.