Nothing obviously prevents using this approach for, e.g., 3B-active or 10B-active models, which do run on consumer hardware. I'd love to see how the 3B performs with this on the MacBook Neo, for example. More relevantly, data-center-scale tokens are only cheaper for the specific kind of tokens data centers sell. If you're willing to wait long enough for your inferences (and your overall volume is low enough that you can afford the wait), you can use approaches like OP's (offloading read-only data to storage) to handle inference on slow, low-powered "edge" devices.
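To illustrate the "offload read-only data to storage" idea: a minimal sketch using NumPy's memory-mapping, where a large weight matrix lives on disk and the OS pages in only the parts actually touched. This is just an illustration of the general mechanism, not OP's actual implementation; the file name and sizes are made up.

```python
import numpy as np

# Hypothetical weight matrix, written to disk once (stand-in for model weights).
rows, cols = 1024, 1024
weights = np.random.rand(rows, cols).astype(np.float32)
np.save("weights.npy", weights)

# mmap_mode="r" maps the file read-only instead of loading it into RAM;
# pages are faulted in from storage on demand during the matmul.
mapped = np.load("weights.npy", mmap_mode="r")
x = np.ones(cols, dtype=np.float32)
y = mapped @ x  # slow (storage-bound) but RAM-frugal on a small edge device
```

The trade-off is exactly the one described above: throughput drops to storage speed, but peak memory stays far below the model size, which is what makes slow, low-volume edge inference feasible.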
It is consumer hardware in the sense that MacBook Pros come with this much RAM as a base configuration, and that you can buy one as a consumer without having to sign a special B2B contract, prove your company is big and reputable enough, or order a minimum of 10 or 100 units.