The lack of proper support for SSD offload (via mmap or otherwise) is really the worst part about this. There's no underlying reason why a model with 3B active parameters shouldn't be able to run, however slowly, on a cheap 8GB MacBook Neo, with the active weights streamed in from SSD and cached in RAM. (This seems to be in the works for GGML/GGUF as part of upgrading to newer upstream versions; no idea whether MLX inference can support this as easily.)
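Here's a rough Python sketch of the "stream from SSD and cache" idea. It assumes a mixture-of-experts layout where each expert's weights sit in one fixed-size block of a single file. It mmaps that file and keeps only the most recently used experts resident in RAM (LRU). `ExpertCache`, `block_size` and `max_resident` are hypothetical names for illustration, not anything from GGML/GGUF or MLX. A real runtime would work on tensors and might simply lean on the OS page cache instead of copying bytes.

```python
import mmap
import os
import tempfile
from collections import OrderedDict


class ExpertCache:
    """Hypothetical sketch: serve fixed-size expert weight blocks from an
    mmap'd file, keeping at most `max_resident` blocks in RAM (LRU order)."""

    def __init__(self, path: str, block_size: int, max_resident: int):
        self._file = open(path, "rb")
        # Map the whole file read-only; the OS pages data in from disk on access.
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self.block_size = block_size
        self.max_resident = max_resident
        self._cache: "OrderedDict[int, bytes]" = OrderedDict()
        self.misses = 0

    def get(self, expert_id: int) -> bytes:
        if expert_id in self._cache:
            self._cache.move_to_end(expert_id)  # mark as most recently used
            return self._cache[expert_id]
        self.misses += 1
        start = expert_id * self.block_size
        # Slicing an mmap copies the bytes into RAM, faulting pages in from disk.
        block = self._mm[start:start + self.block_size]
        self._cache[expert_id] = block
        if len(self._cache) > self.max_resident:
            self._cache.popitem(last=False)  # evict least recently used expert
        return block

    def close(self) -> None:
        self._mm.close()
        self._file.close()


def demo():
    # Assumed toy layout: 8 "experts" of 4 bytes each, filled with their id.
    block_size = 4
    with tempfile.NamedTemporaryFile(delete=False) as f:
        for expert_id in range(8):
            f.write(bytes([expert_id]) * block_size)
        path = f.name
    cache = ExpertCache(path, block_size, max_resident=2)
    try:
        # Router picks experts per token; only 2 fit in the RAM budget.
        for expert_id in [0, 1, 0, 2, 3, 0]:
            cache.get(expert_id)
        return cache.misses, cache.get(3)
    finally:
        cache.close()
        os.remove(path)
```

The point is that RAM only has to hold the working set of recently used experts, not the whole model. How slow it gets then depends on how often routing misses the cache and how fast the SSD is.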