I haven’t tried it yet, but Rapid MLX has a neat feature for automatic model switching. It runs a local model using Apple’s MLX framework, then “falls forward” to the cloud dynamically based on usage patterns:
> Smart Cloud Routing
>
> Large-context requests auto-route to a cloud LLM (GPT-5, Claude, etc.) when local prefill would be slow. Routing based on new tokens after cache hit. --cloud-model openai/gpt-5 --cloud-threshold 20000
https://github.com/raullenchai/Rapid-MLX