Also, can the engine support transparent mmap use for fetching weights from disk on-demand, at least when using pure CPU? (GPU inference might be harder, since it's not clear how page faults would interact with running a shader.)
If the latter test is successful, next would be testing Macs with more limited RAM, first running simple requests (would be quite slow) then larger batches (might be more worthwhile if one can partially amortize the cost of fetching weights from storage, and be bottlenecked by other factors).