1. You never know how much a single Gemini API request will cost, or did cost
2. It takes anywhere from 12 to 24 hours for them to tell you how much they will charge you for past requests, and only in aggregate
3. No simple way to set a spending limit anywhere in Google Cloud
4. Either they are charging for the Batch API before even returning a result, or their "minimal" thinking mode is burning through 15k tokens for a simple image-description task with <200 output tokens. I have no way of knowing which of the two it is. The token counts in the UI don't add up to the costs, so I can only assume it's the first.
5. Incomplete batch requests can't be retrieved if they expire, even though you are charged for them.
6. A truly labyrinthine UI experience that would make modern gacha game developers blush
All I have learned here is to never, ever use a Google product.
Distributed “shared nothing” API handling should make usage available to accounting, and the API handling orchestrator should have a hook that allows accounting to revoke or flag a key.
This gets the accounting transactions and key availability management out of the request handling.
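To make that concrete, here is a minimal sketch of the split, not anyone's actual implementation. Every name in it (`UsageEvent`, `Orchestrator`, `Accounting`, `handle_request`) is hypothetical, and an in-process queue stands in for whatever message bus a real deployment would use.

```python
from dataclasses import dataclass
from queue import SimpleQueue
from typing import Callable


@dataclass(frozen=True)
class UsageEvent:
    """One usage record emitted by a request handler (hypothetical shape)."""
    api_key: str
    cost_micros: int  # millionths of a currency unit, to avoid float drift


class Orchestrator:
    """Owns key availability; exposes the hook accounting uses to revoke a key."""

    def __init__(self) -> None:
        self._revoked: set[str] = set()

    def revoke(self, api_key: str) -> None:
        self._revoked.add(api_key)

    def is_active(self, api_key: str) -> bool:
        return api_key not in self._revoked


class Accounting:
    """Consumes usage events off the request path and enforces a per-key cap."""

    def __init__(self, cap_micros: int, on_over_cap: Callable[[str], None]) -> None:
        self.cap_micros = cap_micros
        self.on_over_cap = on_over_cap  # assumed to be the orchestrator's hook
        self.spent: dict[str, int] = {}

    def drain(self, events: SimpleQueue) -> None:
        """Process all queued events, revoking any key that reaches the cap."""
        while not events.empty():
            event = events.get()
            total = self.spent.get(event.api_key, 0) + event.cost_micros
            self.spent[event.api_key] = total
            if total >= self.cap_micros:
                self.on_over_cap(event.api_key)


def handle_request(api_key: str, cost_micros: int,
                   orchestrator: Orchestrator, events: SimpleQueue) -> bool:
    """Handler only checks key status and emits usage; it does no accounting."""
    if not orchestrator.is_active(api_key):
        return False  # key was revoked or flagged
    # ... the real work of serving the request would happen here ...
    events.put(UsageEvent(api_key, cost_micros))
    return True


orchestrator = Orchestrator()
events = SimpleQueue()
accounting = Accounting(cap_micros=100, on_over_cap=orchestrator.revoke)

handle_request("key-1", 60, orchestrator, events)
handle_request("key-1", 60, orchestrator, events)
accounting.drain(events)  # key-1 has now spent 120 against a cap of 100
print(handle_request("key-1", 60, orchestrator, events))
```

The trade-off in this sketch is that requests served between drains can overshoot the cap, so the limit is only as tight as the delay between a usage event and its processing.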
https://docs.cloud.google.com/billing/docs/how-to/budgets
Budgets are still not a spending cap, of course.