IAM is great because it applies internally just like it does externally. Internal AWS teams don't get more access than you do, and if AWS gets access to do certain things on your account in order to perform a specific service, that's because you have a service principal in your IAM trust relationship that allowed that access, which you can see and audit. For instance, Lambda functions have an execution role because you don't want the Lambda service just reading your S3 buckets on a "we're AWS, we automatically get access" basis; you can absolutely see and control access, even when it is internal to AWS.
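To make that concrete: the trust relationship that lets the Lambda service principal assume an execution role is an ordinary, inspectable policy document in your account. A minimal sketch (as a Python dict; any role name or attached permissions are up to you):

```python
import json

# Sketch of a Lambda execution role's trust policy: the lambda.amazonaws.com
# service principal is the only party allowed to assume the role, and this
# document is fully visible and auditable in your own account.
lambda_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

print(json.dumps(lambda_trust_policy, indent=2))
```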
Hahahaha. No, fundamentally it is one input into a huge mess that you cannot actually see or audit from a 10k foot level.
AWS has produced a long, rambling and imprecise description of (some of?) what’s actually going on. You can read it here:
https://docs.aws.amazon.com/IAM/latest/UserGuide/access_poli...
Some of what they’re describing doesn’t even live within the IAM umbrella as far as I can tell. I’m not convinced that a concise, formal and unambiguous specification exists anywhere, even within AWS’s own development teams.
I’ve asked LLMs to write AWS “policy”. They get the grammar mostly right. They cannot explain what the effects are in a manner that they will stand by after they search the web for documentation. Since I have never found good documentation despite looking, I can’t personally do any better than the LLMs. I’d love to be pointed at real documentation or specs.
I don't work for IAM, but I worked for several other teams over the years, and IAM is actually one of the least confusing services. But I am definitely biased and have a more-than-average amount of experience on this particular subject. I still think the general idea is more sane than Azure accounts, for example. I do think this reflects a philosophical question: are clouds building blocks, or are they consulting projects? I personally think IAM is done right in that regard.
I know they’re all checked. What I don’t know is how the results of those checks are combined to get the final result. As far as I can tell, the result is not something like OR or AND; it seems like it’s something exceedingly complex, and the output of the policy part may be more complex than just a Boolean value.
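For what it's worth, AWS's docs describe the top-level combination roughly as "an explicit deny anywhere wins; otherwise every applicable policy layer (SCPs, permission boundary, identity policies, ...) must contain an allow; otherwise implicit deny". A toy sketch of just that skeleton, heavily simplified (real evaluation also involves conditions, resource-based policies, session policies, and cross-account rules):

```python
# Illustrative only: each "layer" is modeled as the set of decisions
# ({"Allow"}, {"Deny"}, or empty) that its matching statements produced.

def evaluate(policy_layers):
    # 1. An explicit Deny in any layer short-circuits everything.
    if any("Deny" in layer for layer in policy_layers):
        return "Deny"
    # 2. Otherwise, every gating layer must contain a matching Allow.
    if policy_layers and all("Allow" in layer for layer in policy_layers):
        return "Allow"
    # 3. No matching statement at all: implicit deny.
    return "Deny"

print(evaluate([{"Allow"}, {"Allow"}]))          # -> Allow
print(evaluate([{"Allow", "Deny"}, {"Allow"}]))  # -> Deny
print(evaluate([{"Allow"}, set()]))              # -> Deny (implicit)
```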
Maybe the underlying implementation is fantastic (and my distinct impression is that AWS takes this stuff far more seriously than Azure), but that doesn’t mean that the docs are easy to find or that the system actually makes sense in anything other than an agglomeration-of-backwards-compatible-layers sense.
I mean something like actions: s3:cp Resource: bucketarn/key
Most of the time, actions are self-explanatory and good enough, but I recently gave a developer permission to scale an ASG, and it required a lot of unguessable actions. If I could just grant something like "actions: scale" (I forget the exact CLI name for it), it would make for a much cleaner environment.
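To illustrate the fan-out: a plausible minimal policy for "let this developer scale one ASG" needs several separately named autoscaling: actions. This is an assumption-laden sketch, not a vetted policy; the ARN and group name are placeholders, and the exact action set depends on the workflow:

```python
# Hypothetical identity policy: note how one conceptual "scale" operation
# fans out into multiple distinct action names.
scale_asg_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Describe* calls generally can't be resource-scoped.
            "Effect": "Allow",
            "Action": ["autoscaling:DescribeAutoScalingGroups"],
            "Resource": "*",
        },
        {
            # The actual scaling operations, scoped to one placeholder ASG.
            "Effect": "Allow",
            "Action": [
                "autoscaling:SetDesiredCapacity",
                "autoscaling:UpdateAutoScalingGroup",
            ],
            "Resource": "arn:aws:autoscaling:*:*:autoScalingGroup:*:autoScalingGroupName/my-asg",
        },
    ],
}
```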
That’s why it’s so complicated!!!
I don’t understand how I should evaluate trust for your internal EBS org versus your internal ALB org.
I kinda just expect it to be all “AWS” trust.
And it’s all garbage anyway. There’s no way I can prevent the hypothetically untrustworthy EBS team from surreptitiously adding charges to my account if they want to. Right? This would maybe make some sense if I could turn services on and off at the top level, but that isn’t how it works.
—
I have no doubt this makes some sense from someone inside the machine, but from the outside it’s not helpful nor useful.
1. It's about trust and auditability. While you may not want or need it, there are a lot of customers that are either interested in knowing, or legally obligated to know, who has accessed certain data.
2. It's about dogfooding - how would you trust an identity and access system when the company does not even use it internally?
3. In general, there are quick buttons and templates to do it if you don't want to worry about it, and in the LLM age this gets easier. Personally I prefer this because I intensely dislike "magic". This allows you to control, to the maximum degree possible, what is actually going on, despite not owning any of the physical aspects of the data center.
We had an AWS rep try to sell us on an AI tool to help with predicting the IAM permissions that our infrastructure code needs. My response was, essentially, "why have you built a deterministic system so complicated that it needs an AI to configure correctly?" I have not had an answer.
And I don't think you do either.
This would be very unwise from a security standpoint. Internal access to customer stuff is granular and made hard for internal staff to gain, to minimize the chances of a screw-up, intentional or not.
The console kept warning me that I was giving root AWS access to my external application, because they want people to use the locked-in AWS path, and I was running off-cloud.
On top of that, they break copy paste on the web console, so you can’t just ctrl-c ctrl-v and then ask Claude to explain their WTF-ery. Instead, you have to OCR or send a PNG.
I honestly did not think they could make IAM worse, yet here we are. Bastards.
As for simple permissions, go read the UNIX paper. It spends a page or two on their approach and is all you need.
Then, read the paper on mapping between NTFS/SMB ACLs and NFS. It’s either impossible or undecidable, depending on the deployment. IAM is from the Windows ACL lineage, which is known to be pessimal from a usability and security perspective.
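For contrast, the entire UNIX scheme the parent is pointing at fits in a few lines: nine bits, {owner, group, other} x {read, write, execute}, and a permission check is one bitwise AND. A sketch using Python's stat constants:

```python
import stat

# owner: rw-, group: r--, other: --- (i.e. chmod 640)
mode = 0o640

def owner_can_write(m: int) -> bool:
    # S_IWUSR is the owner-write bit (0o200).
    return bool(m & stat.S_IWUSR)

def other_can_read(m: int) -> bool:
    # S_IROTH is the other-read bit (0o004).
    return bool(m & stat.S_IROTH)

print(owner_can_write(mode), other_can_read(mode))  # -> True False
```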
However, the secret to IAM in AWS is to NOT use IAM. Just create separate AWS accounts for separate services and only share whatever resources are needed. Then you can have dead simple IAM policies because you won't need to do granular permissions ("AWS role X can access database Y").
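A sketch of that multi-account pattern: instead of granular roles inside one account, a second account is granted read access to exactly one shared bucket via a resource policy. The account ID and bucket name below are made-up placeholders:

```python
# Account A owns the bucket; account B (placeholder ID 222233334444) gets
# read-only access to its objects. Everything else stays account-private.
shared_bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::222233334444:root"},
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-shared-bucket/*",
        }
    ],
}
```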
My understanding is that different AWS accounts have different mappings of availability zones, so it's very easy to suddenly find yourself with an unexpected bandwidth bill due to all the cross-az traffic.
I've been irritated at AWS (and the other large cloud providers) for charging $0.01/GB for cross-AZ traffic. That's $3.24/Mbps, about the same as I was paying for internet transit (as in: from London to anywhere in the world) 20 years ago, and this is just between two datacenters in the same city controlled by the same organisation, which are cross-connected with massive bundles of fiber. The markup must be 10,000x or more.
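The $0.01/GB to $3.24/Mbps conversion checks out; 1 Mbps sustained for a 30-day month, in decimal units:

```python
# Express the $0.01/GB cross-AZ charge as a monthly per-Mbps price.
seconds = 30 * 24 * 3600                       # 2,592,000 s in a 30-day month
bytes_per_month = (1_000_000 / 8) * seconds    # 1 Mbps sustained, in bytes
gb_per_month = bytes_per_month / 1e9           # 324.0 decimal GB
monthly_cost = gb_per_month * 0.01
print(round(monthly_cost, 2))  # -> 3.24
```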
AWS: I came, I saw, I threw up in my mouth a little, I left.
If you are dynamically scaling a set of web services, sure. The problem is that people use k8s for running batch pipelines and streaming analytics services and a bunch of other things too. And k8s is terrible at doing those things and entirely too complex. And if you don't have to scale your web services very often, then k8s is a waste in that case too. It's a right-tool-for-the-job question, and k8s's job isn't deploying to the cloud; it's dynamically scaling a website.
This is a surprisingly common pattern in technology and software. Some things are definitively the “standard” at this point yet so many people simply refuse to spend the time to properly learn them.
It is also a surprisingly common pattern to adopt very complicated solutions for applications that are never going to need them
ultimately it is not possible to come up with a "standard" that is an acceptable replacement for good judgement
(Also, those AWS services are not engineering-free. I tried to migrate a system to RDS once and gave up after quite a few hours when I got to the part of the documentation that suggested that I edit my sql dump using sed to get it into a form that RDS would accept. No, thanks.)
And that includes engineers that only know how to use AWS and are terrified of having to learn something else.