"Flash-Lite" is a different product from "Flash", which is more expensive. They couldn't be more confusing with their naming, though, especially since there's a 3.1 Pro but no non-Lite 3.1 Flash.

by WarmWash 10 hours ago

I haven't used 3.5 at all yet, but previous Gemini (and Gemma) models are by far the most token-light per task of any model.

Cost per task is a more productive measure, but obviously a more difficult one to benchmark.
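A rough way to make cost per task concrete is to multiply each task's token counts by the per-million prices and compare models on the total. Here is a minimal Python sketch. The prices and token counts are invented placeholders, not real Gemini or GPT list prices.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    # USD per 1M tokens; placeholder values, not real list prices
    input_per_m: float
    output_per_m: float


def cost_per_task(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one task given the tokens it actually consumed."""
    return (input_tokens * pricing.input_per_m
            + output_tokens * pricing.output_per_m) / 1_000_000


# Hypothetical: model A costs twice as much per token but is much terser.
model_a = Pricing(input_per_m=1.00, output_per_m=6.00)
model_b = Pricing(input_per_m=0.50, output_per_m=3.00)

a = cost_per_task(model_a, input_tokens=20_000, output_tokens=2_000)
b = cost_per_task(model_b, input_tokens=20_000, output_tokens=10_000)
print(f"A: ${a:.4f}  B: ${b:.4f}")
```

With these made-up numbers, model A is pricier per token but comes out cheaper per task because it emits a fifth of the output tokens. A real comparison needs measured token counts per task, and that is the hard-to-benchmark part.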

by iwhalen 11 hours ago

I wonder why they didn't discuss price in the post?

Compare to the GPT-5.5 announcement: https://openai.com/index/introducing-gpt-5-5/

by himata4113 11 hours ago

I don't think input/output pricing matters; 90% of the cost is cache. $0.15 is pretty good, but still very expensive.

by wolttam 11 hours ago

It depends on the use case. Yes, 90% of the cost is cache in agentic coding scenarios (actually 95% in my experience). But not when the model reasons for 200k+ tokens before answering a complex problem.
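To see why the cache share swings so much between those two cases, you can split a bill into uncached input, cached input, and output. This is only a sketch. The prices are placeholders, cached input is assumed to cost 10% of the input price, and both usage profiles are invented.

```python
def cost_shares(uncached_in: int, cached_in: int, out: int,
                in_price: float, cached_price: float, out_price: float) -> dict:
    """Each component's share of the total bill (prices in USD per 1M tokens).

    Only ratios matter here, so the per-million scaling is skipped."""
    parts = {
        "uncached_input": uncached_in * in_price,
        "cached_input": cached_in * cached_price,
        "output": out * out_price,
    }
    total = sum(parts.values())
    return {name: cost / total for name, cost in parts.items()}


# Placeholder prices; cached input assumed at 10% of the input price.
IN, CACHED, OUT = 1.00, 0.10, 6.00

# Agentic coding (invented): a big context re-read every turn, short replies.
agentic = cost_shares(500_000, 80_000_000, 100_000, IN, CACHED, OUT)

# One hard question (invented): small prompt, ~200k reasoning tokens.
reasoning = cost_shares(5_000, 0, 200_000, IN, CACHED, OUT)

for label, shares in (("agentic", agentic), ("reasoning", reasoning)):
    print(label, {k: round(v, 2) for k, v in shares.items()})
```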

by himata4113 11 hours ago

Gemini models solve a problem with 80% fewer tokens, so that's something to think about.

by johaugum 10 hours ago

Source?

by himata4113 8 hours ago

https://help.kagi.com/kagi/ai/llm-benchmark.html

by simonw 10 hours ago

Gemini caching is confusing though:

  $0.15 / million tokens
  $1.00 / 1,000,000 tokens per hour (storage price)

I much prefer the OpenAI/DeepSeek way of pricing caching where you don't have to think about storage price at all - you pay for cached tokens if you reuse the same prefix within a (loosely defined) time period.
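Whether explicit caching is worth the storage fee comes down to how often the cached prefix is re-read per hour. The sketch below uses the two prices quoted above. The $1.50 uncached input price is my assumption (10x the cached price). It also assumes storage is billed pro rata by the hour and ignores any charge for the initial cache write.

```python
# Prices in USD per 1M tokens.
CACHED_READ = 0.15        # quoted above: cached-token price
STORAGE_PER_HOUR = 1.00   # quoted above: storage, per 1M tokens per hour
UNCACHED_INPUT = 1.50     # assumption: 10x the cached price


def explicit_cache_cost(prefix_tokens: int, reads: int, hours: float) -> float:
    """Cost of re-reading a cached prefix `reads` times while storing it for `hours`.

    Ignores any charge for the initial cache write."""
    m = prefix_tokens / 1_000_000
    return m * (reads * CACHED_READ + hours * STORAGE_PER_HOUR)


def uncached_cost(prefix_tokens: int, reads: int) -> float:
    """Cost of sending the same prefix `reads` times with no caching at all."""
    return prefix_tokens / 1_000_000 * reads * UNCACHED_INPUT


def break_even_reads_per_hour() -> float:
    """Reads per hour above which explicit caching beats no caching."""
    return STORAGE_PER_HOUR / (UNCACHED_INPUT - CACHED_READ)


prefix = 500_000
for reads in (0, 1, 10):
    print(reads,
          round(explicit_cache_cost(prefix, reads, hours=1), 3),
          round(uncached_cost(prefix, reads), 3))
print("break-even reads/hour:", round(break_even_reads_per_hour(), 2))
```

Under those assumptions, the storage fee is covered after less than one re-read per hour. Explicit caching mostly loses money when the prefix sits idle.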

by simonw 9 hours ago

As far as I can tell Gemini caching DOES work like OpenAI - see implicit caching here: https://ai.google.dev/gemini-api/docs/caching

I confirmed this by running a bunch of prompts through Gemini 3.5 Flash without doing anything special to configure caching and noting that it comes back with a "cachedContentTokenCount" on many of the responses.

The "storage price" quoted is for an optional Gemini feature that most people don't care about: https://ai.google.dev/gemini-api/docs/caching#explicit-cachi...
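To check implicit cache hits the same way, you can sum the usage fields across many responses. This minimal sketch works over already-parsed JSON from generateContent. It assumes promptTokenCount includes the cached tokens, and it treats a missing cachedContentTokenCount as zero. The sample dicts are hand-written, not real API output.

```python
from typing import Any, Iterable, Mapping


def cache_hit_ratio(responses: Iterable[Mapping[str, Any]]) -> float:
    """Fraction of prompt tokens served from cache across a batch of responses.

    Assumes each response may carry a `usageMetadata` object; a missing
    `cachedContentTokenCount` is counted as zero cached tokens."""
    prompt = cached = 0
    for r in responses:
        usage = r.get("usageMetadata", {})
        prompt += usage.get("promptTokenCount", 0)
        cached += usage.get("cachedContentTokenCount", 0)
    return cached / prompt if prompt else 0.0


# Hypothetical, hand-written usage blocks (not real API output).
sample = [
    {"usageMetadata": {"promptTokenCount": 12000}},
    {"usageMetadata": {"promptTokenCount": 12500, "cachedContentTokenCount": 11000}},
    {"usageMetadata": {"promptTokenCount": 13000, "cachedContentTokenCount": 0}},
]
print(f"cache hit ratio: {cache_hit_ratio(sample):.1%}")
```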

by __jl__ 11 hours ago

In our experience, caching is not very reliable with Google. We always get random cache misses that don't happen with other providers. We find OpenAI, Anthropic and Fireworks (which we use a lot) all have higher cache hit rates. So it's not only about the cost of cached tokens but also about the cache hit rate you actually get.
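Hit rate matters because what you actually pay per prompt token is a blend of the cached and uncached prices. Here is a sketch with hypothetical prices and hit rates for two unnamed providers:

```python
def effective_input_price(hit_rate: float, input_price: float, cached_price: float) -> float:
    """Blended USD per 1M prompt tokens when `hit_rate` of them are served from cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cached_price + (1.0 - hit_rate) * input_price


# Hypothetical providers: X is cheaper on paper but misses the cache more often.
x = effective_input_price(hit_rate=0.70, input_price=1.00, cached_price=0.10)
y = effective_input_price(hit_rate=0.95, input_price=1.25, cached_price=0.125)
print(f"X: ${x:.4f}/M  Y: ${y:.4f}/M")
```

Here provider Y lists a 25% higher input price, but its better hit rate makes it roughly half as expensive per prompt token.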

by svachalek 10 hours ago

In my experience Google is the most flaky in general, which is surprising considering the rock solid history of their search and other products. Just more likely not to respond at all, to give a response out of left field, to handle the same error in 12 different ways randomly (a rainbow of HTTP status codes and error messages), etc etc.

by gwern 7 hours ago

I agree. The https://aistudio.google.com/ site is shockingly bad. I'm not sure I've ever used such a flaky Google service before. It's so much worse than Gmail or Google, not to mention the ChatGPT, Claude, DeepSeek, Kimi or Midjourney web interfaces. There's the bizarre, janky integration with your Google Drive, and Gemini or NBPs randomly erroring out, often indefinitely. I've had sessions refresh themselves and just... disappear. Or you get frustrated with a buggy dead session, hit 'new session', and have to wait minutes for 'saving...' to finish.

by veselin 9 hours ago

Exactly our experience too. In practice we catch these status codes and send the request to OpenAI instead. Retrying the same query on Gemini has a high chance of returning more or less the same status code.
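Here is the catch-and-fail-over pattern described above, sketched in Python. Provider calls are plain callables. ProviderError is a hypothetical exception type standing in for whatever your SDK raises. The set of statuses that trigger failover is also an assumption, not something Gemini documents.

```python
from typing import Callable, Optional, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error carrying the HTTP status a provider returned."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"{status}: {message}")
        self.status = status


# Assumption: which statuses mean "fail over" is a policy choice, not an API rule.
FAILOVER_STATUSES = {429, 500, 502, 503, 504}

Provider = Tuple[str, Callable[[str], str]]


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try each provider once, in order, and return (provider_name, reply).

    No same-provider retries: retrying tends to hit the same status again,
    so we go straight to the next provider."""
    last_error: Optional[ProviderError] = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as err:
            if err.status not in FAILOVER_STATUSES:
                raise  # e.g. a 400 means a bad request; another provider won't help
            last_error = err
    raise RuntimeError("all providers failed") from last_error


# Hypothetical stand-ins for real SDK calls.
def flaky_primary(prompt: str) -> str:
    raise ProviderError(503, "overloaded")


def backup(prompt: str) -> str:
    return f"answer to {prompt!r}"


print(complete_with_fallback("hi", [("primary", flaky_primary), ("backup", backup)]))
```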

by minimaxir 11 hours ago

Charging 10% of the input price for cached tokens is standard, especially compared to the competition.

by himata4113 11 hours ago

Yeah, which means the only values worth paying attention to in the end are the input cost and the cache discount (10x). If Google started offering a 20x discount, that would make it about twice as cheap, even with input and output prices unchanged.
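Here is the 10x vs 20x arithmetic, sketched with invented agentic-coding usage (80M cached prompt tokens, 0.5M uncached, 0.1M output) and placeholder prices. Input and output prices stay fixed; only the cache discount changes.

```python
def bill(uncached_in: int, cached_in: int, out: int,
         in_price: float, out_price: float, cache_discount: float) -> float:
    """Total USD; prices per 1M tokens, cached input billed at in_price / cache_discount."""
    return (uncached_in * in_price
            + cached_in * in_price / cache_discount
            + out * out_price) / 1_000_000


usage = dict(uncached_in=500_000, cached_in=80_000_000, out=100_000)  # invented
prices = dict(in_price=1.00, out_price=6.00)  # placeholders

at_10x = bill(**usage, **prices, cache_discount=10)
at_20x = bill(**usage, **prices, cache_discount=20)
print(f"10x: ${at_10x:.2f}  20x: ${at_20x:.2f}  ratio: {at_20x / at_10x:.2f}")
```

Because cache dominates the bill, doubling the discount takes the total from about $9.10 to about $5.10 with these numbers, roughly 56% of the original.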

by John7878781 11 hours ago

[deleted]

by stri8ed 11 hours ago

Output cost is 3x that of Gemini 3 Flash.