Know how you find all the permissions a single user has in GCP? You have to make 9+ API calls, then filter and merge all the results. They finally added a web tool to try to "discover" a user's permissions... you sit there and watch it spin while it madly calls backend APIs to figure it out. Permissions can be granted to users or groups at the org, folder, project, or individual resource level (and more I forget), and inheritance makes it even more complex. It can take all day to track down every single place a permission could be set for a single user in a single hierarchical organization, or where something is blocking some permission. The complexity only increases as you add more GCP projects, folders, and orgs. But, of course, if you don't do all this, GCP will fight you every step of the way.
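To make that concrete, here's roughly what the manual approach looks like with the resourcemanager v3 client, walking one org -> folder -> project chain and merging bindings. The org/folder/project IDs and the user are placeholders, and a real hierarchy has many branches plus per-resource policies (buckets, datasets, service accounts, ...) on top of this:

```python
# Minimal sketch: call get_iam_policy at each level of one (hypothetical)
# org -> folder -> project chain and collect the bindings naming one user.
from google.cloud import resourcemanager_v3

USER = "user:alice@example.com"  # hypothetical principal

def bindings_for(client, resource):
    policy = client.get_iam_policy(request={"resource": resource})
    return [(resource, b.role) for b in policy.bindings if USER in b.members]

findings = []
findings += bindings_for(resourcemanager_v3.OrganizationsClient(), "organizations/123456789")
findings += bindings_for(resourcemanager_v3.FoldersClient(), "folders/987654321")
findings += bindings_for(resourcemanager_v3.ProjectsClient(), "projects/my-project")
# ...and again for every other folder, project, and individual resource
# the user might touch, then reason about inheritance across all of it.

for resource, role in findings:
    print(f"{resource}: {role}")
```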
Compare that to AWS, where you just click a user and see what's assigned to it. They engineered it specifically so it wouldn't be a pain in the ass.
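The rough boto3 equivalent of "click a user" is a handful of calls against the user itself (the user name is a placeholder; group policies add one more hop, but it's still a flat list, not a hierarchy):

```python
import boto3

iam = boto3.client("iam")
user = "alice"  # hypothetical user name

# Everything attached to one IAM user comes back from calls on the user.
managed = iam.list_attached_user_policies(UserName=user)["AttachedPolicies"]
inline = iam.list_user_policies(UserName=user)["PolicyNames"]
groups = iam.list_groups_for_user(UserName=user)["Groups"]

print("managed:", [p["PolicyArn"] for p in managed])
print("inline:", inline)
print("groups:", [g["GroupName"] for g in groups])
```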
> Every organization I’ve ever witnessed eventually ends up with some kind of struggle with AWS’ insane organizations and accounts nightmare.
This was an issue in the early days, but it's well solved now with newer integrations/services. Follow their Well-Architected Framework (https://docs.aws.amazon.com/wellarchitected/latest/framework...), ask customer support for advice, and implement it. I'm not exaggerating when I say it's the best single description of information systems engineering best practice in the world, and it's achievable by startups. It just takes a long time to read. If you want to become an excellent systems engineer/engineering manager/CTO/etc., this is your bible. (Note: you have to read the entire thing, especially the appendices; you can't skim it like Stack Overflow.)
The problem is that no company I've ever worked for has implemented the Well-Architected Framework in their AWS environment, and not one will ever invest the time to make their environment match that level of quality.
I think the web tool you describe for discovering user permissions sounds a lot like the AWS VPC Reachability Analyzer, which I had to live in for quite a while: figuring out where my traffic was getting blocked between an endless array of AWS accounts and cross-region transit gateways was a nightmare that wouldn't exist with GCP's global VPCs and project/folder-based permissions.
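For the curious, driving the Reachability Analyzer from boto3 looks roughly like this (the ENI IDs are placeholders, and in practice you poll until the analysis leaves the running state):

```python
import boto3

ec2 = boto3.client("ec2")

# Define a path between two (hypothetical) ENIs, then run an analysis.
path = ec2.create_network_insights_path(
    Source="eni-0123456789abcdef0",
    Destination="eni-0fedcba9876543210",
    Protocol="tcp",
    DestinationPort=443,
)["NetworkInsightsPath"]

analysis = ec2.start_network_insights_analysis(
    NetworkInsightsPathId=path["NetworkInsightsPathId"]
)["NetworkInsightsAnalysis"]

# Once finished, NetworkPathFound / Explanations show which security
# group, NACL, route table, or TGW attachment blocked the flow.
result = ec2.describe_network_insights_analyses(
    NetworkInsightsAnalysisIds=[analysis["NetworkInsightsAnalysisId"]]
)["NetworkInsightsAnalyses"][0]
print(result["Status"], result.get("NetworkPathFound"))
```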
I don't like the GCP console, but I also wouldn't consider a lot of the AWS console to be top-tier software. Slow, buggy, and inconsistent are words I would use for the AWS console. I can concede that AWS has better documentation, but I don't think it's a standout, either.
Architecturally I'd go with GCP in a heartbeat. BigQuery was also one of the biggest wins in my previous role: it completely changed our business for almost everyone, vs. Redshift, which cost us a lot of money to learn that it sucked.
You could say I'm biased as I work at Google (but not on any of this), but for me it was definitely the other way around: I joined Google in part because of the experience of using GCP and migrating AWS workloads to it.
What are these struggles? The product I work on uses AWS and we have ~5 accounts (I hear there used to be more, TBF), but nowadays all the infrastructure is on one of them and the others are for niche stuff (tech support?). I can see how going overboard with many accounts could be an issue, but I don't really see a problem with having everything on one account.
The way to automate provisioning of new AWS accounts requires you to engage with Control Tower in some way, like the author did with Account Factory for Terraform.
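For reference, outside of AFT the usual programmatic path is provisioning the Account Factory product through Service Catalog. The product/artifact IDs below are placeholders, and the parameter keys are my assumption of what a typical Control Tower landing zone exposes; check your own product's provisioning artifact:

```python
import boto3

sc = boto3.client("servicecatalog")

sc.provision_product(
    ProductId="prod-xxxxxxxxxxxxx",             # Account Factory product (placeholder)
    ProvisioningArtifactId="pa-xxxxxxxxxxxxx",  # its active version (placeholder)
    ProvisionedProductName="team-foo-dev",
    ProvisioningParameters=[
        # Assumed parameter keys; these depend on your landing zone setup.
        {"Key": "AccountName", "Value": "team-foo-dev"},
        {"Key": "AccountEmail", "Value": "aws+team-foo-dev@example.com"},
        {"Key": "ManagedOrganizationalUnit", "Value": "Workloads"},
        {"Key": "SSOUserEmail", "Value": "owner@example.com"},
        {"Key": "SSOUserFirstName", "Value": "Team"},
        {"Key": "SSOUserLastName", "Value": "Foo"},
    ],
)
```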
Just before they announced that, I was working on creating org accounts specifically to contain S3 buckets, then permitting the primary app to use those accounts just for their bucket allocation.
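A sketch of what that pattern looks like from the bucket side: a bucket in the storage account grants the primary app account's role access and nothing else. The account ID, role, and bucket name are placeholders:

```python
import json
import boto3

s3 = boto3.client("s3")  # credentials for the storage account

# Bucket policy allowing only the primary app's cross-account role.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PrimaryAppAccess",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:role/primary-app"},
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": "arn:aws:s3:::team-foo-data/*",
    }],
}

s3.put_bucket_policy(Bucket="team-foo-data", Policy=json.dumps(policy))
```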
AWS themselves recommend an account per developer, IIRC.
It's as you say: some policy or limitation might require lots of accounts, and lots of accounts can be pretty challenging to manage.
I have almost 40 AWS accounts on my login portal.
Two accounts per product (one for development environments, one for production), every new company acquisition gets its own accounts, and then we have accounts that exist solely to help traverse accounts or host other ops stuff.
Maybe you don’t see issues with everything in one account but my company would.
I don't really think they're following current best practices, but that's a political issue I have no control over, and I suspect that if you went back enough years you'd find we were following AWS' advice at the time.
Undersea cable failures are probably more likely than a Google core networking failure.
In AWS a lot of "global" things are actually just hosted in us-east-1.
Guessing that's similar on the other clouds.
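Two well-known examples of that coupling on the AWS side, in boto3 (assuming the standard aws partition):

```python
import boto3

# CloudFront only accepts ACM certificates issued in us-east-1, so even
# a "global" CDN configuration is pinned to that region.
acm_for_cloudfront = boto3.client("acm", region_name="us-east-1")

# IAM is a "global" service, but its single control-plane endpoint
# also lives in us-east-1.
iam = boto3.client("iam")
print(iam.meta.endpoint_url)  # https://iam.amazonaws.com
```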
The routing isn't centralized; it's distributed. The VPCs are a logical abstraction, not a centralized dependency. If a region or AZ goes down in your global VPC, the rest of the VPC is still available.
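For example, a minimal sketch with the google-cloud-compute client: one network with global routing and subnets in two regions, so traffic between them rides Google's backbone and losing one region doesn't take the VPC itself down. The project ID, names, and CIDRs are placeholders:

```python
from google.cloud import compute_v1

project = "my-project"

# One VPC with global (rather than regional) dynamic routing.
network = compute_v1.Network(
    name="global-vpc",
    auto_create_subnetworks=False,
    routing_config=compute_v1.NetworkRoutingConfig(routing_mode="GLOBAL"),
)
compute_v1.NetworksClient().insert(project=project, network_resource=network).result()

# Regional subnets hanging off the same network.
net_url = f"projects/{project}/global/networks/global-vpc"
for region, cidr in [("us-east1", "10.0.0.0/20"), ("europe-west1", "10.0.16.0/20")]:
    subnet = compute_v1.Subnetwork(
        name=f"subnet-{region}", ip_cidr_range=cidr, network=net_url
    )
    compute_v1.SubnetworksClient().insert(
        project=project, region=region, subnetwork_resource=subnet
    ).result()
```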
I also think it's not much of an advantage for AWS to be able to say its outages are confined to a region. That doesn't help you very much if the architecture makes building global services more difficult in the first place: you're just playing region roulette, hoping your region isn't the one affected. And outages frequently impact multiple or all AZs anyway.