Thanks for the feedback! Our primary focus is charging by energy; for token pricing we really just try to stay close to the market. That said, I'll take a look at our token pricing to see if we need an update there: https://portal.neuralwatt.com/energy-pricing Generally our users get a much lower cost on energy pricing than on token pricing, though. On a typical request with a high prefix cache hit rate, the cost of the cached input is very small and the output energy cost is higher.

We definitely don't have any intention to obfuscate; in fact, we try to provide more data than any other provider out there about both individual requests and fleet behavior. Since we tend to focus directly on our energy pricing and on optimizing it, the issue is likely where the ROI lies on energy optimization versus token optimization (they're totally correlated, but we have other levers to reduce energy while keeping token counts the same).
