Some internal perspective - IAM has maybe thousands of options but fundamentally it is "what does this role have access to doing (action + resource)" + "who has access to this role". That is really it from a 10k foot level.

IAM is great because it applies internally just like it does externally. The internal AWS teams don't get more access than you do, and if we get access to do certain things on your account to perform a specific service, that's because you have a service principal in your IAM trust relationship that allowed us access, which you can see and audit. For instance, Lambdas have a Lambda role because you don't want the Lambda service just reading your S3 buckets because "we're AWS, we automatically get access". You can absolutely see and control access, even if it is internal to AWS.
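To make the two halves concrete, here is a minimal sketch of a Lambda execution role written as plain Python dictionaries. The bucket name `example-bucket` and the helper `service_principals` are hypothetical, chosen only for illustration; the policy shape (`Version`, `Statement`, `Effect`, `Principal`, `Action`, `Resource`) follows the standard IAM JSON policy grammar.

```python
import json

# Hypothetical bucket name, used only for illustration.
BUCKET = "example-bucket"

# "Who has access to this role": the trust policy. Here the Lambda
# service principal is allowed to assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# "What does this role have access to doing": action + resource.
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{BUCKET}/*",
    }],
}


def service_principals(policy):
    """List the service principals named in a trust policy (hypothetical helper)."""
    found = []
    for stmt in policy["Statement"]:
        service = stmt.get("Principal", {}).get("Service", [])
        found.extend([service] if isinstance(service, str) else service)
    return found


if __name__ == "__main__":
    print(json.dumps(trust_policy, indent=2))
    print(json.dumps(permissions_policy, indent=2))
    print(service_principals(trust_policy))
```

Removing the `lambda.amazonaws.com` statement from the trust policy is what stops the service from assuming the role, which is the auditability point being made above.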

reply
> but fundamentally it is "what does this role have access to doing (action + resource)" + "who has access to this role". That is really it from a 10k foot level.

Hahahaha. No, fundamentally it is one input into a huge mess that you cannot actually see or audit from a 10k foot level.

AWS has produced a long, rambling and imprecise description of (some of?) what’s actually going on. You can read it here:

https://docs.aws.amazon.com/IAM/latest/UserGuide/access_poli...

Some of what they’re describing doesn’t even live within the IAM umbrella as far as I can tell. I’m not convinced that a concise, formal and unambiguous specification exists anywhere, even within AWS’s own development teams.

I’ve asked LLMs to write AWS “policy”. They get the grammar mostly right. They cannot explain what the effects are in a manner that they will stand by after they search the web for documentation. Since I have never found good documentation despite looking, I can’t personally do any better than the LLMs. I’d love to be pointed at real documentation or specs.

reply
They are just slight variations of the fundamental idea. For example, resource policies and org SCPs are just the same check on a different level (e.g. more of who has access to what). They are attached to the Organization and to individual resources respectively (vs the Account), so they need to exist in a separate place. And then in use they are ALL checked before an access is granted.

I don't work for IAM but I worked for several other teams over the years and IAM is actually one of the least confusing services. But I am definitely biased and have more than an average amount of experience on this particular subject. I still think the general idea is more sane than Azure accounts, for example. I do think this reflects a philosophical difference over whether clouds are building blocks or consulting projects. I personally think IAM is done right in that regard.

reply
> And then in use they are ALL checked before an access is granted.

I know they’re all checked. What I don’t know is how the results of those checks are combined to get the final result. As far as I can tell, the result is not something like OR or AND; it seems like it’s something exceedingly complex, and the output of the policy part may be more complex than just a Boolean value.
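For what it's worth, here is a heavily simplified sketch of how the layers might combine for a single same-account request, based on my reading of the policy evaluation page in the IAM User Guide. It is an assumption-laden toy, not a specification: it ignores conditions, principals, permissions boundaries, session policies, resource control policies, and cross-account requests, and it uses shell-style globbing as a stand-in for IAM's wildcard matching. All function names are hypothetical. It does show that each layer yields one of three outcomes rather than a Boolean.

```python
from enum import Enum
from fnmatch import fnmatchcase


class Decision(Enum):
    ALLOW = "allow"
    IMPLICIT_DENY = "implicit deny"   # nothing matched
    EXPLICIT_DENY = "explicit deny"   # a Deny statement matched


def _matches(stmt, action, resource):
    # Simplification: one glob pattern each for Action and Resource.
    return (fnmatchcase(action, stmt["Action"])
            and fnmatchcase(resource, stmt["Resource"]))


def _layer(statements, action, resource):
    """Evaluate one policy layer on its own."""
    effects = {s["Effect"] for s in statements if _matches(s, action, resource)}
    if "Deny" in effects:
        return Decision.EXPLICIT_DENY
    if "Allow" in effects:
        return Decision.ALLOW
    return Decision.IMPLICIT_DENY


def evaluate(action, resource, identity, scps=None, resource_policy=None):
    """Combine layers for a same-account request (simplified sketch)."""
    layers = [l for l in (identity, scps, resource_policy) if l is not None]
    # 1. An explicit Deny in any layer wins outright.
    if any(_layer(l, action, resource) is Decision.EXPLICIT_DENY for l in layers):
        return Decision.EXPLICIT_DENY
    # 2. If SCPs apply, they act as a filter: they must also allow.
    if scps is not None and _layer(scps, action, resource) is not Decision.ALLOW:
        return Decision.IMPLICIT_DENY
    # 3. Within one account, an Allow from either the identity policy
    #    or the resource policy is enough.
    if _layer(identity, action, resource) is Decision.ALLOW:
        return Decision.ALLOW
    if resource_policy is not None and \
            _layer(resource_policy, action, resource) is Decision.ALLOW:
        return Decision.ALLOW
    return Decision.IMPLICIT_DENY
```

So in this reduced model the combination is roughly "no explicit Deny anywhere, AND the SCPs allow, AND (identity policy OR resource policy allows)". The real flow has more layers and special cases than this.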

Maybe the underlying implementation is fantastic (and my distinct impression is that AWS takes this stuff far more seriously than Azure), but that doesn’t mean that the docs are easy to find or that the system actually makes sense in anything other than an agglomeration-of-backwards-compatible-layers sense.

reply
One thing I would like to see in IAM would be something like verb actions. Currently, if you want to give least privilege, you have to trial-and-error your API call until you get it right. Since AWS has a very good API definition across all consumers (REST, aws-cli, boto use the same structure), I think it would be doable.

I mean something like actions: s3:cp Resource: bucketarn/key

Most of the time, actions are self-explanatory and good enough, but I recently gave a developer permission to scale an ASG, and it required a lot of unguessable actions. If I could give "actions: scale" (I forget the correct CLI parameter for it), it would make for a cleaner env.
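A rough sketch of what such a verb layer could look like if you built it yourself today. The verb names (`s3:upload`, `s3:download`) and the `policy_for` helper are hypothetical and are not IAM features; the mapping assumes that a plain single-object `aws s3 cp` upload needs `s3:PutObject` and a download needs `s3:GetObject`. Real commands can need more (for example `s3:ListBucket` for recursive copies, or KMS permissions for encrypted objects).

```python
# Hypothetical verb catalogue: the keys are NOT real IAM actions,
# only the values are.
VERBS = {
    "s3:upload": ["s3:PutObject"],
    "s3:download": ["s3:GetObject"],
}


def policy_for(verb, resource):
    """Expand a hypothetical high-level verb into an IAM policy document."""
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb}")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": list(VERBS[verb]),
            "Resource": resource,
        }],
    }


if __name__ == "__main__":
    import json
    print(json.dumps(policy_for("s3:upload", "arn:aws:s3:::example-bucket/key"), indent=2))
```

The hard part is not the expansion but keeping the catalogue correct, which is exactly the trial and error described above.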

reply
> that's because you have a service principal in your IAM trust relationship that allowed us access

That’s why it’s so complicated!!!

I don’t understand how I should evaluate trust for your internal EBS org versus your internal ALB org.

I kinda just expect it to be all “AWS” trust.

And it’s all garbage anyway. There’s no way I can prevent the hypothetically untrustworthy EBS team from surreptitiously adding charges to my account if they want to. Right? This would maybe make some sense if I could top level turn off/on services, but that isn’t how it works.

I have no doubt this makes some sense from someone inside the machine, but from the outside it’s not helpful nor useful.

reply
3 things to untangle here.

1. It's about trust and auditability. While you may not want or need it, there are a lot of customers that are either interested in or legally obligated to know who has accessed certain data.

2. It's about dogfooding - how would you trust an identity and access system when the company does not even use it internally?

3. In general, there are quick buttons and templates to do it if you don't want to worry about it, and in the LLM age this gets easier. Personally I prefer this because I intensely dislike "magic". This allows you to control, to the maximum degree possible, what is actually going on, despite not owning any of the physical aspects of the data center.

reply
1. It's about imposing worst-case complexity on the 99% of people who will never benefit.

2. Some of that complexity only arises because of the dogfooding.

3. No, it doesn't get easier, because you still need to understand what those things actually do to know if they're right for your use case, and besides, if you're driving everything from terraform then having a "quick button" is precisely useless.

We had an AWS rep try to sell us on an AI tool to help with predicting the IAM permissions that our infrastructure code needs. My response was, essentially, "why have you built a deterministic system so complicated that it needs an AI to configure correctly?" I have not had an answer.

reply
That's all fine and good, but I still don't know how much to trust the EBS team versus the ALB team.

And I don't think you do either.

reply
>I kinda just expect it to be all “AWS” trust.

This would be very unwise from a security standpoint. Internal access to customer stuff is granular and made hard for internal staff to gain, to minimize the chance of a screw-up, intentional or not.

reply
I agree. Adding a service principal always raises an eyebrow for me, just a blanket "hey we're aws trust me bro" is a little bonkers.
reply
How does this work in Azure and Google Cloud?
reply
IAM is unnecessarily bad. I recently had to set a trivial policy, and was doing it correctly.

The console kept warning me that I was giving root AWS access to my external application because they want people to use the locked in AWS path, and I was running off cloud.

On top of that, they break copy paste on the web console, so you can’t just ctrl-c ctrl-v and then ask Claude to explain their WTF-ery. Instead, you have to OCR or send a PNG.

I honestly did not think they could make IAM worse, yet here we are. Bastards.

reply
You think that you're complaining about IAM, but really you're complaining about the web UI. I rarely use the web console, I use terraform or the CLI. If you're vibe coding your infra with Claude, point it at the CLI / terraform. Skip the UI.
reply
With terraform you get the amazing experience of having to iterate, one at a time, through the five hundred and thirty seven new permissions you need to grant having decided that a lambda configuration needs to be ever so slightly different than it was yesterday, because there's no documentation linking terraform creation of resources and the IAM permissions required to successfully make the AWS API call behind the scenes. Or those for updating a resource, which are different, so you get to do it all again tomorrow. Or deleting - different again. Fun for the day after.
reply
You could probably open the developer tools, find the console elements and extract the data from there to get around copy/paste limitations. I’m not familiar with the AWS console but let’s say it’s an input, select it in the dev tools and then in the dev tools console do $0.value
reply
I guess I should also point out that I’ve used AWS at extremely large scale in the past, which is why I’m running this subproject on another cloud.

As for simple permissions, go read the UNIX paper. It spends a page or two on their approach and is all you need.

Then, read the paper on mapping between NTFS SMB ACLs and NFS. It’s either impossible or undecidable, depending on the deployment. IAM is from the Windows ACL lineage, which is known to be pessimal from a usability and security perspective.

reply
IAM is NOT from any lineage. It has grown organically and is complicated, just as any other policy language. AWS even uses an automatic proof assistant to verify IAM policies.

However, the secret to IAM in AWS is to NOT use IAM. Just create separate AWS accounts for separate services and only share whatever resources are needed. Then you can have dead simple IAM policies because you won't need to do granular permissions ("AWS role X can access database Y").

reply
> Just create separate AWS accounts for separate services

My understanding is that different AWS accounts have different mappings of availability zones, so it's very easy to suddenly find yourself with an unexpected bandwidth bill due to all the cross-az traffic.

I've been irritated at AWS (and the other large cloud providers) that they charge $0.01/GB for cross-az traffic. That's $3.24 per Mbps per month, about the same as I was paying for internet transit (as in: from London to anywhere in the world) 20 years ago, and this is just between two datacenters in the same city controlled by the same organisation. The markup must be 10,000x or more considering these places are cross-connected with massive bundles of fiber!
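The conversion from a per-GB price to a per-Mbps figure can be checked with a few lines of arithmetic. This sketch assumes a 30-day month, decimal gigabytes (10^9 bytes), and a link that is saturated at 1 Mbps the whole time; the function name is hypothetical.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month


def monthly_cost_per_mbps(price_per_gb=0.01):
    """Cost of pushing a sustained 1 Mbps for a month at a per-GB price."""
    bits = 1_000_000 * SECONDS_PER_MONTH   # 1 Mbps for the whole month
    gigabytes = bits / 8 / 1e9             # bits -> bytes -> decimal GB
    return gigabytes * price_per_gb


if __name__ == "__main__":
    # 1 Mbps sustained is 324 GB per month under these assumptions.
    print(f"${monthly_cost_per_mbps():.2f} per Mbps per month")
```

Under those assumptions 1 Mbps sustained is 324 GB a month, which at $0.01/GB matches the $3.24 figure quoted above.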

reply
Agreed on IAM, and on TFA's comment that once you see the horrendous complexity in IAM you start seeing it everywhere else in AWS as well. And with IAM, after all the effort you've put in, you can never really tell what is and isn't enabled. If you run your own server you can check permissions, run access-control audit scripts, and so on, and say with a pretty good level of confidence that X is possible and Y isn't. With IAM it's more like "I'm pretty sure I figured out the right silly-walk for X, but I have no way to tell what else might be enabled".

AWS: I came, I saw, I threw up in my mouth a little, I left.

reply
"who fought against Kubernetes adoption because it was "too complex", only to slowly reinvent Kubernetes badly,"

If you are dynamically scaling a set of web services, sure. The problem is that people use k8s for running batch pipelines and streaming analytic services and a bunch of other things too. And k8s is terrible at doing those things and entirely too complex. And if you don't have to scale your web services very often, then k8s is a waste in that case too. It's a matter of the right tool for the job, and k8s's job isn't deploying to the cloud, it's dynamically scaling a website.

reply
> fought against X adoption because it was "too complex", only to slowly reinvent X badly

This is a surprisingly common pattern in technology and software. Some things are definitively the “standard” at this point yet so many people simply refuse to spend the time to properly learn them.

reply
> This is a surprisingly common pattern in technology and software. Some things are definitively the “standard”

It is also a surprisingly common pattern to adopt very complicated solutions for applications that are never going to need them

ultimately it is not possible to come up with a "standard" that is an acceptable replacement for good judgement

reply
Another point is that while it can be more expensive than self hosting, the savings are dwarfed by the engineering costs. A decent infrastructure engineer working for 2 man months on your “money saving” ovh setup costs you more than you can possibly save by not just using fargate or rds whatever.
reply
How much would you pay for 2 months of infrastructure engineer time? And how many millions / tens of millions / hundreds of millions are you imagining being spent on overpriced AWS services?

(Also, those AWS services are not engineering-free. I tried to migrate a system to RDS once and gave up after quite a few hours when I got to the part of the documentation that suggested that I edit my sql dump using sed to get it into a form that RDS would accept. No, thanks.)

reply
But unless you're on a PaaS, you have "infrastructure engineers" already. So why not at least let them make back their salary by having them build a cost-efficient infrastructure?
reply
This is rarely actually true, but it's a common falsehood told by people who have a financial interest in keeping everyone on AWS.

And that includes engineers who only know how to use AWS and are terrified of having to learn something else.

reply
I think the big problem with Amazon IAM is not that it’s inherently complex, it’s that every team in AWS came up with their own way to define permissions and the calls these allow you to make. So the API Gateway set of permissions uses a completely different method for no discernible reason.
reply