At a previous, bigger company, getting procurement to sign off on a new provider required writing a business case, justifying the spend, and then getting multiple competing quotes and speaking to their sales teams. Signing up for a new service took _months_, even for $10/mo, as they'd negotiate for bulk discounts and the best possible terms for something that would literally cost less per year than one of the meetings they held to discuss the "value". Meanwhile, on AWS I can click a button in the marketplace and it gets thrown onto the AWS account, which is pre-approved spending.
We use it because we don't want to deal with the slow procurement process. It kills all the momentum.
Watched one company end up with a $250k AWS bill they could not pay when their credits expired.
As an example, they needed a lot of proxy servers. Instead of just using a proxy service, they ran a fleet of EC2 instances.
I agree that it's overcomplicated, though having the self-service portal for assigning IPs is useful, and being able to detach storage from VMs and such is quite flexible. But most of it seems like overkill.
Our 64-core spot instances on Windows were taking 8-10x longer than our developer machines with the same core count, and a bunch of engineering went into the scaling, queue management, etc. If we'd just had a single bare-metal machine from Hetzner, we could have saved money _and_ reduced our iteration times.
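A back-of-envelope sketch of that trade-off; all prices here are illustrative assumptions, not quotes:

```python
# Illustrative numbers only -- check real spot and dedicated pricing.
spot_hourly = 1.20                   # assumed 64-core Windows spot, $/hr
hours_per_month = 24 * 30
ec2_monthly = spot_hourly * hours_per_month   # ~$864/mo, always-on
hetzner_monthly = 200.0              # assumed 64-core dedicated box, $/mo

print(f"EC2 spot: ${ec2_monthly:,.0f}/mo, Hetzner: ${hetzner_monthly:,.0f}/mo")

# If the bare-metal box also finishes each job ~8x faster, the cost
# per unit of work diverges even further:
print(f"cost per job, EC2 vs bare metal: ~{ec2_monthly * 8 / hetzner_monthly:.0f}x")
```

Even if those assumed prices are off by a factor of two in either direction, the slower machine billed by the hour still loses.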
> burning money in cloud
I suspect there are two reasons why this happens.
One is just the disassociation from opex that seems ever-present in the VC model. The other is that many startups settle on an ops solution before hiring ops, and the cost of switching isn't that attractive until they're faced with a dwindling runway and a down round.
Sounds expensive
Those tend to be tricky hires on the small end, since you want a jack-of-all-trades who either demands a premium salary or doesn't exist.
When you have 10 software engineers, having 1 dedicate 10-20% of their time is cheaper than hiring 1-2 FTEs who aren't writing code.
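The arithmetic behind that claim, with hypothetical salary figures:

```python
# Hypothetical fully-loaded costs; substitute your own market rates.
engineer_salary = 150_000          # $/yr, assumed
ops_fte_salary = 150_000           # $/yr, assumed

part_time_ops = 0.15 * engineer_salary   # one dev at ~10-20% of their time
dedicated_ops = 1.5 * ops_fte_salary     # 1-2 FTEs who aren't writing code

print(f"part-time ops: ${part_time_ops:,.0f}/yr")   # ~$22,500
print(f"dedicated ops: ${dedicated_ops:,.0f}/yr")   # ~$225,000
```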
If you can get developers to do AWS or Kubernetes (and trust them to do it right), you should be able to trust them to do conventional Linux sysadmin on a bunch of dedicated boxes.
A full-stack/backend dev is more than capable of learning both, but one of those has way more footguns than the other.
This is the main reason, and it applies to developers (they need cloud buzzwords on their resume), to managers (who in turn hire only those with said buzzwords), and to company execs/CTOs who can brag about the complex (self-inflicted) problems their company is solving at the next cloud provider conference, so they can justify yet another VC round.
Run this for over a decade and you'll end up in a situation where an entire generation of "engineers" is no longer capable of configuring a Linux box to serve some basic webapp, and will make up whatever reasons to avoid even attempting to do so.
I can stand something up on AWS in a couple of hours and be fairly confident it will run reliably (assuming their service offering is actually decent--some suck).
We test backups and they never fail. Metrics and logs always work.
>People are unnecessarily complicating stuff, and these clouds can go very expensive very quickly.
I don't think that's the cloud vendors' fault. They make it easy to stand up new services, so people get overly enthusiastic and create convoluted architectures. Have Postgres but need full-text search? OpenSearch is just a few clicks (well, hopefully IaC config...) away, let's use that! When you're building it yourself and need to set up the stack, instrument, monitor, and configure backups, the cost is high enough that you say "hey, maybe pg fts is fine for now".
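For reference, a minimal sketch of what "pg fts is fine for now" can look like, assuming a hypothetical `docs` table with a text `body` column and psycopg2; the table, column, and connection details are made up:

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # assumed connection string
cur = conn.cursor()

# A generated tsvector column plus a GIN index is usually enough to start.
cur.execute("""
    ALTER TABLE docs
        ADD COLUMN IF NOT EXISTS body_tsv tsvector
        GENERATED ALWAYS AS (to_tsvector('english', body)) STORED
""")
cur.execute("CREATE INDEX IF NOT EXISTS docs_tsv_idx ON docs USING GIN (body_tsv)")
conn.commit()

# Ranked search -- no extra service to run, monitor, or back up.
cur.execute("""
    SELECT id, ts_rank(body_tsv, query) AS rank
    FROM docs, websearch_to_tsquery('english', %s) AS query
    WHERE body_tsv @@ query
    ORDER BY rank DESC
    LIMIT 10
""", ("full text search",))
print(cur.fetchall())
```

Whether that actually covers the search requirements is workload-dependent, but it's exactly the "good enough for now" option that gets skipped when OpenSearch is a click away.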
But now you need staffing/headcount to be experts in this stuff, and to maintain it, upgrade it, and be on call for it?
I think we spend around $5k on AWS, and I'm pretty sure we could be much more performant for a fraction of that price.
The problem is, who is going to set everything up? Hiring someone would for sure cost more than $5k.
By the time I joined, 18 months after development had started, a giant, complex, hideously tentacled software beast had been built, one that used every possible AWS service the massive offshore team of developers could find a use for.
It should have been built on a single Linux box by a single senior developer with Python and Postgres, or Node.js, or Ruby, or whatever.
They went out of business before long, and I couldn't help wondering if things might have been different if they hadn't spent a fortune building a giant money-making machine for AWS instead of making a web application on a Linux box.
Every AWS project I have worked on has had some significant work put into programming AWS instead of writing business functionality.
To be fair, if they had an AWS Solutions Architect involved, those folks will heavily push you down this road, and if they manage to get into management's ear, they'll push the idea that serverless AWS features are vastly cheaper.
If you're only responding to a handful of requests, that's true, but once things ramp up you get "nickel-and-dimed" for everything: API Gateway requests, Lambda execution time, DynamoDB read/write units, CloudWatch logs, outgoing data, Step Functions transitions, S3 requests.
I understand all those services cost money and they shouldn't be free, but I question whether paying all those micro-transactions is worse than paying for your own VMs, especially once your customers complain about the cold starts and you think you can fix it with "lambda warming".
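A rough illustration of how those micro-charges stack up at volume; every rate below is an assumed ballpark figure, not a current quote:

```python
# Back-of-envelope only -- check the actual pricing pages.
requests_per_month = 100_000_000            # 100M requests

api_gw_per_million = 3.50                   # assumed $/1M API Gateway requests
lambda_per_million = 0.20                   # assumed $/1M Lambda invocations
gb_seconds = requests_per_month * 0.1 * 0.5 # assume 100ms at 512MB each
gb_second_rate = 0.0000166667               # assumed $/GB-second

total = (requests_per_month / 1e6 * (api_gw_per_million + lambda_per_million)
         + gb_seconds * gb_second_rate)
print(f"~${total:,.0f}/mo")   # before DynamoDB, CloudWatch, and egress
```

Under these assumptions that's roughly $450/mo before storage, logging, and data transfer, for a load a couple of flat-rate VMs could plausibly serve.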
You moved something from a single datastore to three different database technologies? I don't know your domain, but that sure doesn't sound like a complexity reduction.
What's bad about Grafana? It's simply used for some alerts and monitoring; I've used it for a really long time and it has never failed me, not even once.
It's much simpler to query Postgres or Mongo than to duplicate data dozens of times in the datastore.
I can't really judge your database choices; I don't know your specific problems, and we're just trading quips on the internet for fun. But man oh man, Grafana is disappointing.
You removed all of their logging and all of their redundancy and reliability, and replaced it with shitters that will all explode if the small provider's one data centre goes down.
And if someone penetrates this mega server, they’ll be able to wipe all your logs or tamper with them, to hide the attack.
If your storage servers go down, everything they have is gone. And these providers don’t offer the finest hardware. How do you know all of those drives aren’t from the same batch? They will be, because they’re a bulk buyer with a single data centre.
They'll never need it; a misconfiguration on those services ends up costing several grand.
>If your storage servers go down, everything they have is gone
It's just logs for an app server, not some banking-critical info that will cause a panic if lost. Most of what they're using logging for is finding errors, not mission-critical things which must not be lost.
Because it's explicitly something you can request when placing your server order with your vendor. In this particular case, several years ago, Nutanix did good.