
I expect that in the future we'll find out someone in the industry was juicing the numbers with fake thinking tokens or something. The whole pricing model of charging you for the tokens the model generates, when you don't know going in how many it will generate, has always been pretty crazy.

by ThunderSizzle 5 hours ago


It reminds me of early smartphones, when the cell providers pulled away from unlimited data... and then brought it back a few years later.

I think competition will get fierce. We can see that many people are attracted to the price stability of GHCP, since it became clear what a request could do; the problem is that they didn't match results with cost. It's not clear what a 5-hour usage window in Claude Code can do.

There's no reason the harness couldn't provide a quote for the next request, aside from the fact that it takes effort and, by being upfront with the user, it would create expectations.
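For illustration, a minimal sketch of what such a quote might look like, assuming hypothetical per-million-token prices and a rough 4-characters-per-token heuristic (neither comes from any real provider or harness). Input cost is roughly knowable up front, but output can only be bounded by the request's max-output cap, so an honest quote is a range rather than a single number.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    # Hypothetical per-million-token prices in USD; not any provider's real rates.
    input_per_mtok: float
    output_per_mtok: float


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token for English text.
    return max(1, len(text) // 4)


def quote(prompt: str, max_output_tokens: int, pricing: Pricing) -> tuple[float, float]:
    """Return a (low, high) cost range in USD for the next request.

    The low end assumes the model generates almost nothing; the high end
    assumes it uses the whole max_output_tokens budget, since the model,
    not the user, decides how much to generate.
    """
    input_cost = estimate_tokens(prompt) * pricing.input_per_mtok / 1_000_000
    worst_case_output = max_output_tokens * pricing.output_per_mtok / 1_000_000
    return input_cost, input_cost + worst_case_output


if __name__ == "__main__":
    p = Pricing(input_per_mtok=3.0, output_per_mtok=15.0)
    low, high = quote("x" * 40_000, max_output_tokens=8_000, pricing=p)
    print(f"Estimated cost: ${low:.4f} to ${high:.4f}")
```

The gap between the two ends is entirely the output side, which is the part of the pricing model the parent comment finds so unpredictable.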

by Sohcahtoa82 19 hours ago


Yeah, this was my frustration with Suno and Sora. You can burn a lot of credits (not to mention time) generating things that aren't what you wanted.

I don't mind a PAYG model for a simple chat interface. But when it comes to actually producing things, you burn through TONS of tokens creating the wrong output.

by benoau 12 hours ago


Like internet usage back when you were billed by the hour, but your connection was so slow it took a minute or more to load a page, lol.

by tencentshill 17 hours ago


It incentivizes you to do most of that prompting on your own hardware and time, and only feed the final prompt, with just the necessary context, to the big AI in the sky. It might even force you to think about the problems yourself for a bit!

by DaiPlusPlus 18 hours ago


> I wonder how long it'll be before all AI costs are flat unlimited monthly fees or even free across the board, without compromise.

That's already the case if you can self-host an LLM; you don't even need a mythical H200: gamer-grade GeForce cards can get you a long way there (if this page is to be believed: https://www.runpod.io/gpu-compare/rtx-5090-vs-h200 )

...after RAM prices return to normal, of course. And then you'd wait another 2 or 3 generations of GPU development for a 96GB HBM card to hit the streets. That also assumes SotA or cloud-only LLMs don't experience lifestyle inflation, but I assume they will: OpenAI, Anthropic, etc. have business models that depend on people paying to access their models, so it's in their interest to make those models as difficult as possible to run locally.

Give it 5 years from now and reassess.

by Ballas 5 hours ago


That page compares models that easily fit in the memory of either GPU. The biggest difference shows up when one card can fit a model and the other cannot.
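A back-of-the-envelope check of the "fits or doesn't" point, assuming weights-only memory (parameter count times bytes per parameter), a 10% headroom factor that is purely illustrative, and the cards' advertised capacities (32 GB on the RTX 5090, 141 GB on the H200). Real usage is higher once KV cache, activations, and runtime overhead are added, so treat this as a lower bound, not a sizing tool.

```python
# Bytes needed per parameter at common precisions (weights only).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

# Advertised memory capacity in GB.
CARDS_GB = {"RTX 5090": 32, "H200": 141}


def weights_gb(params_billion: float, precision: str) -> float:
    # 1e9 params * bytes/param / 1e9 bytes per GB simplifies to params_billion * bytes/param.
    return params_billion * BYTES_PER_PARAM[precision]


def fits(params_billion: float, precision: str, card: str, headroom: float = 0.9) -> bool:
    # Illustrative assumption: reserve ~10% of memory for everything that isn't weights.
    return weights_gb(params_billion, precision) <= CARDS_GB[card] * headroom


if __name__ == "__main__":
    for params, precision in [(8, "fp16"), (70, "int4"), (70, "fp16")]:
        for card in CARDS_GB:
            verdict = "fits" if fits(params, precision, card) else "does not fit"
            print(f"{params}B @ {precision} ({weights_gb(params, precision):.0f} GB) on {card}: {verdict}")
```

Under these assumptions an 8B model at fp16 (16 GB) fits on both cards, so a benchmark like that would show only raw speed differences. A 70B model at int4 (35 GB) fits only on the H200, which is exactly the case where the comparison stops being close.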