The correct solution isn't yet another cloud service, but rather local models.
Within the enclave itself, DRAM and PCIe connections between the CPU and GPU are encrypted, but the CPU registers and the GPU onboard memory are plaintext. So the computation is happening on plaintext data, it’s just extremely difficult to access it from even the machine running the enclave.