At that power consumption, you also end up being more expensive than API calls and many times slower. It starts to feel very stupid to run local inference.
If the client is very keen on privacy, then they can pay for the NVIDIA hardware.
I ended up returning my B70s and bought an RTX PRO 6000.
Hardware-wise, a B70 should be significantly faster than any available CPU at ML inference. If it wasn't in your tests, that is most likely a software problem, so you should identify the software stack, so others know what doesn't work.
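A quick sanity check for "should the GPU be faster": single-batch LLM decoding is memory-bandwidth bound, so expected tokens/s is roughly usable bandwidth divided by the bytes read per token (about the quantized model size). A minimal sketch, with illustrative bandwidth and model-size numbers that are assumptions, not measured B70 or CPU specs:

```python
def decode_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float,
                          efficiency: float = 0.7) -> float:
    """Rough upper bound on decode speed for bandwidth-bound inference.

    `efficiency` is an assumed fudge factor for kernel/runtime overhead.
    """
    return bandwidth_gb_s * efficiency / model_size_gb

# Illustrative, assumed numbers: a mid-range GPU (~450 GB/s) vs a
# dual-channel DDR5 desktop CPU (~80 GB/s), both running an 8 GB model.
gpu = decode_tokens_per_sec(450, 8)
cpu = decode_tokens_per_sec(80, 8)
print(f"GPU ~ {gpu:.0f} tok/s, CPU ~ {cpu:.0f} tok/s")
```

If measured GPU throughput lands near the CPU estimate instead of the GPU one, the bottleneck is almost certainly in the software stack (driver, runtime, or unsupported kernels), not the silicon.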