1) Senior engineer starts on AWS
2) Senior engineer leaves because our industry does not value longevity or loyalty at all whatsoever (not saying it should, just observing that it doesn't)
3) New engineer comes in and panics
4) Ends up using a "managed service" to relieve the panic
5) New engineer leaves
6) Second new engineer comes in and not only panics but outright needs help
7) Paired with some "certified AWS partner" who claims to help "reduce cost" but who actually gets a kickback from the extra spend they induce (usually 10% if I'm not mistaken)
Calling it ransomware is obviously hyperbolic, but there are definitely some parallels one could draw
On top of it all, AWS pricing is about to go up massively due to the RAM price increase. There's no way it won't, since AWS is over half of Amazon's profit while only around 15% of its revenue.
In theory with perfect documentation they’d have a good head start to learn it, but there is always a lot of unwritten knowledge involved in managing an inherited setup.
With AWS the knowledge is at least transferable and you can find people who have worked with that exact thing before.
Engineers also leave for a lot of reasons. Even highly paid engineers go off and retire, switch jobs for more novelty, or decide to try starting their own business.
Unfortunately, there are a lot of things in AWS that can also be messed up, so it might be really hard to research what is going on. For example, you could have hundreds of Lambdas running with no idea where the original sources are or how they connect to each other, or complex VPC network routing where rules and security groups are shared haphazardly between services, so a small change can degrade a completely different service (you were hired to help with service X, but after your change some service Y went down and you weren't even aware it existed).
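For what it's worth, a read-only first pass with the AWS CLI at least gets you an inventory to start from (the function name and security group ID below are placeholders):

    # List every Lambda in the account/region
    aws lambda list-functions --query 'Functions[].FunctionName' --output text

    # For a given function: what feeds it, and who is allowed to invoke it?
    aws lambda list-event-source-mappings --function-name some-function
    aws lambda get-policy --function-name some-function

    # Which security groups exist, and which network interfaces actually use a given one?
    aws ec2 describe-security-groups --query 'SecurityGroups[].{Id:GroupId,Name:GroupName}' --output table
    aws ec2 describe-network-interfaces --filters Name=group-id,Values=sg-0123456789abcdef0 \
      --query 'NetworkInterfaces[].Description'

It won't tell you why things are wired the way they are, but at least you know what exists before you touch anything.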
"Today, we are going to calculate the power requirements for this rack, rack the equipment, wire power and network up, and learn how to use PXE and iLO to get from zero to operational."
Part of what clouds are selling is experience. A "cloud admin" bootcamp graduate can be a useful "cloud engineer", but it takes some serious years of experience to become a talented on-prem SRE. So it becomes an ouroboros: moving to the cloud makes it easier to keep moving to the cloud.
That is not true. It takes a lot more than a bootcamp to be useful in this space, unless your definition is to copy-paste some CDK without knowing what it does.
If by useful you mean "useful at generating revenue for AWS or GCP" then sure, I agree.
These certificates and bootcamps are roughly equivalent to the Cisco CCNA certificate and training courses back in the '90s. That certificate existed to sell more Cisco gear - and Cisco outright admitted this at the time.
But will the market demand it? AWS just continues to grow.
The number of things that these 24x7 people from AWS will cover for you is small. If your application craps out for any reason that doesn't have anything to do with AWS, that is on you. If your app needs to run 24x7 and it is critical, then you need your own 24x7 person anyway.
Meanwhile AWS breaks once or twice a year.
I've only had one outage I could attribute to running on-prem; meanwhile it's a bit of a joke with the non-IT staff in the office that when "The Internet" (i.e. Cloudflare, Amazon) goes down, news reports and all, our own services are all running fine.
I am sure it happens a multitude of ways but I have never seen the case you are describing.
What do you think RedHat support contracts are? This situation exists in every technology stack in existence.
> 4) Ends up using a "managed service" to relieve the panic
It's not as though this is unique to cloud.
I've seen multiple managers come in and introduce some SaaS because it fills a gap in their own understanding and abilities. Then when they leave, everyone stops using it and the account is cancelled.
The difference with cloud is that it tends to be more central to the operation, so it can't just be cancelled when its advocate leaves.
I'll give you an alternative scenario, which IME is more realistic.
I'm a software developer, and I've worked at several companies, big and small and in-between, with poor to abysmal IT/operations. I've introduced and/or advocated cloud at all of them.
The idea that it's "more expensive" is nonsense in these situations. Calculate the cost of the IT/operations incompetence, and the cost of the slowness of getting anything done, and cloud is cheap.
Extremely cheap.
Not only that, it can increase shipping velocity, and enable all kinds of important capabilities that the business otherwise just wouldn't have, or would struggle to implement.
Much of the "cloud so expensive" crowd are just engineers too narrowly focused on a small part of the picture, or in denial about their ability to compete with the competence of cloud providers.
This has been my experience as well. There are legitimate points of criticism, but every time I’ve seen someone try to make that argument it’s been comparing significantly different levels of service (e.g. a storage comparison equating S3 with tape) or leaving out entire categories of cost, like the time someone claimed their bare-metal costs for a two-server database cluster were comparable to RDS despite not even including things like power or backups.
As far as I know, nothing comes close to Aurora functionality. Even in vibecoding world. No, 'apt-get install postgres' is not enough.
What you’re asking for can mostly be pieced together, but no, it doesn’t exist as-is.
Failover: this has been a thing for a long time. Set up a synchronous standby, then add a monitoring job that checks heartbeats and promotes the standby when needed. Optionally use something like heartbeat to have a floating IP that gets swapped on failover, or handle routing with pgbouncer / pgcat etc. instead. Alternatively, use pg_auto_failover, which does all of this for you.
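As a rough sketch of the pg_auto_failover route (hostnames, paths, and the SSL choice here are made-up placeholders, not a production recipe):

    # On the monitor host (then keep "pg_autoctl run" going under systemd or similar):
    pg_autoctl create monitor --pgdata /var/lib/postgres/monitor \
      --hostname monitor.example.internal --auth trust --ssl-self-signed

    # On each data node (the first becomes the primary, the second joins as a standby):
    pg_autoctl create postgres --pgdata /var/lib/postgres/data \
      --hostname db1.example.internal --auth trust --ssl-self-signed \
      --monitor 'postgres://autoctl_node@monitor.example.internal/pg_auto_failover'

    # Applications can use a multi-host libpq URI so they always land on the writable node:
    # postgres://db1.example.internal,db2.example.internal/app?target_session_attrs=read-write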
Clustering: you mean read replicas?
Volume-based snaps: assuming you mean CoW snapshots, that’s a filesystem implementation detail. Use ZFS (or btrfs, but I wouldn’t, personally). Or Ceph if you need a distributed storage solution, but I would definitely not try to run Ceph in prod unless you really, really know what you’re doing. Lightbits is another solution, but it isn’t free (as in beer).
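To make the ZFS suggestion concrete, snapshots there are instant and basically free; the dataset names below are hypothetical:

    zfs snapshot tank/pgdata@before-upgrade      # instant, copy-on-write
    zfs list -t snapshot                         # see what you have
    zfs send tank/pgdata@before-upgrade | ssh backup-host zfs receive backup/pgdata
    zfs rollback tank/pgdata@before-upgrade      # stop Postgres first if rolling back a live data dir

A snapshot of a running instance is only crash-consistent, so keep the data directory and WAL on the same dataset so recovery can replay cleanly.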
Cross-region replication: this is just replication? It doesn’t matter where the other node[s] are, as long as they’re reachable, and you’ve accepted the tradeoffs of latency (synchronous standbys) or potential data loss (async standbys).
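In practice the remote-region node is just another standby, e.g. seeded with pg_basebackup (hosts and role names below are placeholders):

    # Run on the standby in the other region:
    pg_basebackup -h primary.us-east.example.internal -U replicator \
      -D /var/lib/postgres/data -R -X stream
    # -R writes primary_conninfo and standby.signal, so the node starts as an async replica;
    # cross-region latency then shows up only as replication lag, not commit latency.
    # Making it synchronous instead means every commit pays the round trip; on the primary:
    #   ALTER SYSTEM SET synchronous_standby_names = 'region_b';   # matches the standby's application_name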
Metrics: Percona Monitoring & Management if you want a dedicated DB-first, all-in-one monitoring solution, otherwise set up your own scrapers and dashboards in whatever you’d like.
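The "own scrapers" route can be as small as the community postgres_exporter plus whatever Prometheus/Grafana you already run (credentials below are placeholders):

    docker run -d --name pg-exporter -p 9187:9187 \
      -e DATA_SOURCE_NAME="postgresql://monitor:secret@db1.example.internal:5432/postgres?sslmode=disable" \
      quay.io/prometheuscommunity/postgres-exporter
    # then add a Prometheus scrape job for db1.example.internal:9187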
What you will not get from this is Aurora’s shared cluster volume. I personally think that’s a good thing, because I think separating compute from storage is a terrible tradeoff for performance, but YMMV. What that means is you need to manage disk utilization and capacity, as well as properly designing your failure domain. For example, if you have a synchronous standby, you may decide that you don’t care if a disk dies, so no messing with any kind of RAID (though you’d then miss out on ZFS’ auto-repair from bad checksums). As long as this aligns with your failure domain model, it’s fine - you might have separate physical disks, but co-locate the Postgres instances in a single physical server (…don’t), or you might require separate servers, or separate racks, or separate data centers, etc.
tl;dr you can fairly closely replicate the experience of Aurora, but you’ll need to know what you’re doing. And frankly, if you don’t, even if someone built an OSS product that does all of this, you shouldn’t be running it in prod - how will you fix issues when they crop up?
Nobody doubts one could build something similar to Aurora given enough budget, time, and skills.
But that's not replicating the experience of Aurora. The experience of Aurora is I can have all of that, in like 30 lines of terraform and a few minutes. And then I don't need to worry about managing the zpools, I don't need to ensure the heartbeats are working fine, I don't need to worry about hardware failures (to a large extent), I don't need to drive to multiple different physical locations to set up the hardware, I don't need to worry about handling patching, etc.
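For a sense of the scale being described, the provisioning really is on the order of a screenful, here sketched with the AWS CLI rather than terraform (identifiers, passwords, and subnet group names are made up):

    aws rds create-db-cluster \
      --db-cluster-identifier app-prod \
      --engine aurora-postgresql \
      --master-username appadmin \
      --master-user-password 'example-only' \
      --db-subnet-group-name app-prod-subnets \
      --vpc-security-group-ids sg-0123456789abcdef0

    aws rds create-db-instance \
      --db-instance-identifier app-prod-writer \
      --db-cluster-identifier app-prod \
      --engine aurora-postgresql \
      --db-instance-class db.r6g.large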
You might replicate the features, but you're not replicating the experience.
Managed services have a clear value proposition. I personally think they're grossly overpriced, but I understand the appeal. Asking for that experience but also free / cheap doesn't make any sense.
If ECS is faster, then you're more satisfied with AWS and less likely to migrate. You're also open to additional services that might bring up the spend (e.g. ECS Container Insights or X-Ray)
Source: Former Amazon employee
We used EFS to solve that issue, but it was very awkward, expensive and slow; it's certainly not meant for that.
My biggest gripe with this is async tasks where the app does numerous hijinks to avoid a 10 minute Lambda processing timeout. Rather than structuring the work into many small, independent batches, or simply using a modest container to do the job in a single shot, a myriad of intermediate steps are introduced to write data to dynamo/s3/kinesis + sqs and coordinate between them.
A dynamically provisioned, serverless container with 24 cores and 64 GB of memory can happily process GBs of data transformations.
Microservices are a killer with cost. For each microservice pod you're often running a bunch of sidecars (Datadog, auth, ingress), and you pay a massive workload-separation overhead in orchestration, management, monitoring, and of course complexity.
I am just flabbergasted that this is how we operate as a norm in our industry.
If you can keep 4 "Java boxes" fed with work 80%+ of the time, then sure EC2 is a good fit.
We do a lot of batch processing and save money over having EC2 boxes always on. Sure we could probably pinch some more pennies if we managed the EC2 box uptime and figured out mechanisms for load balancing the batches... But that's engineering time we just don't really care to spend when ECS nets us most of the savings advantage and is simple to reason about and use.
You don’t need colocation to save 4x though. Bandwidth pricing is 10x. EC2 is 2-4x, especially outside the US. EBS, for the IOPS you get, is just bad.