These kinds of limits happen all the time for big clients.

Cloud services like to present the illusion of an infinite amount of compute available at a fixed price per unit, but the reality is that if you try to use too much of any service, you'll find you have a quota, and requests to increase it will fall on deaf ears if the provider doesn't have more of that resource.

Too much of my working life has been spent shoehorning services into less space/compute/RAM/spindles, or migrating them to other data centers, to solve such issues.

reply
If you allow me a bit of pedantry, it's infinite "for all intents and purposes". It doesn't mean you can request civilizational levels of compute, but for a blog, a CRUD app, an ETL job and such (that is, regular use cases at a sensible scale), it can absorb any elastic demand.

Having said that, I agree with you. You have to request limit increases often and can't scale even in those instances if you don't plan ahead.

reply
Yeah but you don't need cloud for a blog. Cloud was sold as effectively infinite resources - capacity isn't infinite, or effectively infinite, it's 20% more than you are currently using and you pay 300% more for that.

There has to be a name for this deceptive marketing tactic where you say something is unlimited and then it is only unlimited as long as you don't use very much.

It would be one thing if you occasionally got a "no more capacity" error when requesting large amounts of resources but it doesn't work that way. They confine you to a relatively small amount of resources the entire time you have an account. If you want more you have to request it.

reply
It was sold as flexible, near instant provisioning of DC level resources. I don't recall having seen infinite anywhere.
reply
It's not flexible if it only flexes 20% above your current usage
reply
For 99+% of users it will flex 10000% above their current usage
reply
Then 99+% of users could save three quarters of their costs by switching to a traditional VPS provider.
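For anyone checking the arithmetic: "300% more" (the figure quoted upthread) means paying 4x the base price, which is where "three quarters" comes from. A minimal sketch, assuming the 300% markup holds and the two offerings are otherwise comparable; the function name is made up for illustration.

```python
def savings_from_markup(markup_percent: float) -> float:
    """Fraction of the bill saved by moving from a marked-up price to the base price.

    markup_percent is how much MORE the expensive option costs,
    e.g. 300 means it costs 4x the base price.
    """
    if markup_percent <= -100:
        raise ValueError("markup must be greater than -100%")
    multiplier = 1 + markup_percent / 100  # 300% more -> 4.0x
    return 1 - 1 / multiplier              # share of the bill that disappears


print(savings_from_markup(300))  # 0.75, i.e. three quarters
```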
reply
A blog for your product, if your product is already on the cloud, is a very sensible use case for the cloud. A static one deployed to a bucket behind a CDN: fast, cached at the edge, highly available.

The tiny blog sure isn't for the cloud, but also it's not the main client of the cloud.

> it's 20% more than you are currently using and you pay 300% more for that.

I'm assuming you are comparing to self hosting. Then you need to account for things that are difficult to put a price on, like your time maintaining physical infrastructure and the lessons you will learn from it.

Sounds like I'm defending the big cloud, but there is a valid use that gets dismissed because it's trendy to hate on the cloud.

> They confine you to a relatively small amount of resources the entire time you have an account. If you want more you have to request it.

It's a form of KYC, nothing wrong with that.

reply
I compare cloud to non-cloud VPSes. If you compare them to self-hosting the price is even more biased against cloud, even with current RAM prices. Did you know you can get 40G or 100G dedicated internet to your colo rack for something like $2000 a month (prices vary greatly, YMMV)? Colo only makes sense if you need a fairly large quantity of compute resources, but the per-unit cost can be very good. Every other style of hosting is building on top of it with a profit margin, after all.
reply
If I'm going to have to ask for capacity, why don't I just get my own bare metal servers then?
reply
Because you don't have to wait for weeks just for delivery. And you pay for elastic usage.
reply
You can order bare metal servers with delivery times in minutes from any number of hosting providers, and the cost difference is so huge you can afford to keep excess capacity and still come out ahead.
reply
I run the CI infra for our company, and our bare metal costs (sans my salary baked in) are one order of magnitude lower than what we'd pay any CI SaaS provider like GitHub or others.

Like, it's literally 10x more expensive to run CI jobs that way...

I don't want to imagine the margin AWS has in general, because it could easily be 90% too.

reply
Right? It's actually crazy how much they don't cost. Are you using it more than 10%? If so, you're saving money.
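To spell out the 10% figure: if an always-on box costs a tenth of the pay-per-use rate for the same capacity, pay-per-use only wins while you are busy less than a tenth of the time. A rough sketch, assuming the 10x ratio from the parent comment and ignoring staff time, bandwidth and commitment terms; the rates and function names are hypothetical.

```python
def break_even_utilization(dedicated_hourly: float, on_demand_hourly: float) -> float:
    """Utilization (0..1) above which an always-on dedicated box is cheaper.

    Dedicated is billed every hour regardless of use; on-demand is billed
    only for the hours actually used.
    """
    if dedicated_hourly <= 0 or on_demand_hourly <= 0:
        raise ValueError("rates must be positive")
    return dedicated_hourly / on_demand_hourly


def cheaper_option(utilization: float, dedicated_hourly: float, on_demand_hourly: float) -> str:
    """Compare average hourly cost at a given utilization."""
    if not 0 <= utilization <= 1:
        raise ValueError("utilization must be between 0 and 1")
    on_demand_effective = on_demand_hourly * utilization  # pay only while busy
    return "dedicated" if dedicated_hourly < on_demand_effective else "on-demand"


# Hypothetical rates with the 10x ratio mentioned above.
DEDICATED, ON_DEMAND = 1.0, 10.0
print(break_even_utilization(DEDICATED, ON_DEMAND))  # 0.1
print(cheaper_option(0.05, DEDICATED, ON_DEMAND))    # on-demand
print(cheaper_option(0.50, DEDICATED, ON_DEMAND))    # dedicated
```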

I assume you're using your own server and not a provider like Hetzner? So you did have a substantial delivery time. Although in my city there is a recycler that resells used servers, and I could show up there with a truck and get a server within hours if I'm not too picky. Or use some random desktop or laptop off the pile, short-term.

reply
No, we're on Hetzner/OVH boxes, so delivery time really isn't an issue.

Right now the biggest issue is that the vibe coded CI program is not really meant to be a distributed multi-node thing yet, so we're on the biggest machines (there's some newer, bigger stuff we could migrate to), and the only issue is that at peak hours the queue can get a bit slow, though some other bugs also contributed to that.

Tbh it works pretty well, we just need to scale it to more than one node now (which is not to say that is easy, but still, there's 10x headroom to work with).

reply
Even as a small customer it's easy to hit quotas or hit availability constraints on more unusual instance types.
reply
definitionally that's "for some intents and purposes" my man
reply
For all intents and purposes is a figure of speech, meaning in every practical sense.
reply
Given Meta’s current AI situation though, I wouldn’t be surprised if they were trying to do distillation and the capacity story is a cover
reply
You can't actually "distill" reasoning from a model that doesn't expose its genuine thinking tokens, and none of these do.

When Anthropic accuse Alibaba of distilling their models, you have to run that by a reality check of what is actually possible.

1) You can use another model as "LLM as judge" to rate alternative outputs that your own model has generated. Useful data perhaps, but certainly not distillation.

2) If what you are interested in are the reasoning steps (that are hidden from you) that arrived at an answer, not the answer itself, then you can try to train a model to guess what those steps were (this is a published technique). This may be better than nothing, but hardly distillation if it's your model that is suggesting the reasoning!

3) Depending on the model, you may be able to prompt engineer it to reveal its reasoning, not just show a summary, but this should be very obvious. Anthropic cite this as something they have seen. This would be useful data if you can get it (presumably they've now done a better job of preventing it), but at the end of the day all you'd be getting is some training data cheaper than if you'd had to create it by hand.

reply