But not everyone is where they need to be. For instance, Railway doesn't let you access AWS resources via roles/OIDC. I filed a ticket[0] but haven't seen movement.
0: https://station.railway.com/feedback/allow-for-integration-w...
That company sounds a lot like one that doesn't focus on the right things.
It additionally provides pre-commit scanning, log redaction, and much more.
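For anyone wondering what log redaction amounts to in practice, here is a minimal sketch. The patterns and the `redact` function are my own illustration, not taken from any particular tool, and a real scanner ships a far larger rule set.

```python
import re

# Hypothetical patterns for illustration only.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                 # shape of an AWS access key ID
    re.compile(r"sk-[A-Za-z0-9_-]{20,}"),            # shape of an "sk-" style API key
    re.compile(r"(?i)(password|secret|token)=\S+"),  # key=value pairs
]


def redact(line: str) -> str:
    """Replace anything matching a known secret shape with a placeholder."""
    for pattern in SECRET_PATTERNS:
        line = pattern.sub("[REDACTED]", line)
    return line


print(redact("connecting with password=hunter2 to db"))
```

Pattern matching like this only catches secrets whose shape you anticipated, which is why it is a safeguard rather than a guarantee.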
If the LLM can run any code it writes itself, it can retrieve those credentials. It's just one `curl` away. If you don't let it run `curl`, but you let it run `python`, it can just run a Python script that fetches it using `requests`. Or a Node script that calls `fetch`.
Point is, if creds are accessible programmatically, the LLM can and may try to retrieve them if it thinks it needs them.
Automatic retrieval, instead of keeping them on disk, is what makes short-lived credentials possible.
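A rough sketch of that idea, assuming some `fetch` callable that talks to whatever issues your credentials (the class and field names here are made up for illustration): the token lives only in memory and is refetched shortly before it expires.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Credential:
    token: str
    expires_at: float  # Unix timestamp


class CredentialCache:
    """Holds a credential in memory and refetches it shortly before expiry."""

    def __init__(
        self,
        fetch: Callable[[], Credential],  # hypothetical: calls your issuer
        refresh_margin: float = 60.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._margin = refresh_margin
        self._clock = clock
        self._current: Optional[Credential] = None

    def get(self) -> str:
        now = self._clock()
        # Refetch when there is no credential yet or it is about to expire.
        if self._current is None or self._current.expires_at - self._margin <= now:
            self._current = self._fetch()
        return self._current.token
```

Nothing is written to disk, so there is no long-lived file for an agent (or anyone else) to stumble over. The process that holds the cache can still be asked for the token, of course.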
Besides leaking, it's easy to oopsie and DoS a system or send malformed requests in the course of testing and development. You don't want a surprise $1k bill because someone was working on some test automation and accidentally sent thousands of real results in the process.
Assuredly it's not foolproof but it does have safeguards in place.
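One cheap guard against that kind of accident is a hard cap on how many real requests a test run may send. This is a sketch with a made-up `RequestBudget` class, not something any particular framework provides:

```python
class RequestBudget:
    """Hard cap on the number of real requests a test run may send."""

    def __init__(self, max_requests: int) -> None:
        self.max_requests = max_requests
        self.sent = 0

    def spend(self) -> None:
        # Call this immediately before every outbound request.
        if self.sent >= self.max_requests:
            raise RuntimeError(
                f"request budget of {self.max_requests} exhausted"
            )
        self.sent += 1


budget = RequestBudget(max_requests=100)
budget.spend()  # fine; the 101st call would raise instead of hitting the API
```

It only helps if every outbound call goes through `spend()`, so it belongs in the one client wrapper your tests share.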
Ideally you also opt out of training although that doesn't keep it out of the vendor's logs/telemetry.
Short-lived credentials, injected identity, and hardware-backed tokens are the real solution.
Here's one interaction: when I was planning through ways to finally get away from the dreaded .env file, I told Claude that it had already read my secrets, and it said:
> This is an important point and I want to be straight with you first.
> ## What already happened in this conversation
> Yes — the Explore agent read your .env and returned the full plaintext contents into the conversation, which means:
> 1. Sent to Anthropic's API — those credentials passed through Anthropic's servers as conversation context
> 2. Cached locally — Claude Code stores session transcripts; your secrets are likely sitting in ~/.claude/projects/ right now
> 3. In this context window — they're in active memory for this session
...
Which I already knew, but it was funny how it suddenly took it very seriously when told what it was doing.
Anything that's in your .bashrc, .zshrc, any environment variables in shells you provide to the LLM, all those are now in the training data of very large overvalued corporations that are desperate to increase their revenue and IPO very soon.
Nothing lost for me here, fortunately, but it's definitely a big footgun that I've never seen mentioned in any of the Vibe Coding or LLM Agent Coding training courses that the security team has forced me to do.
Unfortunately, the .env anti-pattern is endemic throughout many projects, and whether Claude creates the .env from scratch or merely the .env.example, it will end up feeding the .env back to Anthropic with enough interaction, apparently. And developers should expect all files in their work directory to be read by Claude. That's not so much a fault of Claude as it is of the .env anti-pattern.
Block agents from misbehaving at the OS level instead of asking them to behave.
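A full OS sandbox is beyond a comment, but one small piece of the same idea is to never hand the agent your shell's environment in the first place. A minimal sketch, where the allowlist and function names are my own assumptions:

```python
import os
import subprocess
from typing import Mapping

# Assumed allowlist; adjust to what the tool genuinely needs.
ALLOWED_VARS = frozenset({"PATH", "HOME", "LANG", "TERM"})


def scrubbed_env(env: Mapping[str, str], allowed=ALLOWED_VARS) -> dict:
    """Keep only allowlisted variables; everything else is dropped."""
    return {k: v for k, v in env.items() if k in allowed}


def run_scrubbed(argv: list) -> subprocess.CompletedProcess:
    """Run a command so it inherits only the allowlisted environment."""
    return subprocess.run(
        argv, env=scrubbed_env(os.environ), capture_output=True, text=True
    )
```

This only covers environment variables. The child still runs as your user, so files like `.env` stay readable unless something at the OS level says otherwise.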
But also... I use Kiro. I open a terminal into a folder where my repo is. I run kiro-cli. I don't know if it has access to the credentials file in my .aws directory. I know it prompts me for approval to use tools, but that is a harness thing. Does the Mac itself prevent it from accessing the credential file?
I use AI because it's useful, and I follow the practices dictated by our AI adoption team, but I don't know the nuance of everything about it, and that makes it difficult to know when some case not explicitly covered by training might leak important information.
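On the question of what the OS itself allows: a quick way to find out is to ask from a process started in the same terminal, since by default a child process runs with your user's file permissions. This sketch assumes the tool adds no sandboxing of its own, which I can't confirm for any specific CLI, and the candidate paths are just common examples.

```python
import os
from pathlib import Path


def readable_secret_files(candidates: list) -> list:
    """Return the candidate files that this process is allowed to read."""
    return [p for p in candidates if p.is_file() and os.access(p, os.R_OK)]


if __name__ == "__main__":
    home = Path.home()
    # Example locations only; add whatever matters on your machine.
    candidates = [home / ".aws" / "credentials", Path(".env")]
    for path in readable_secret_files(candidates):
        print(f"readable by this process: {path}")
```

If it prints a path, then anything you launch from that shell without extra sandboxing can most likely read the file too, whatever the approval prompts say.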
So go ahead and dump your AWS SSO tokens to the LLM by accident, but it's going to take longer than a day to train a new model and ship it out to the world.
Also, I think Kiro only uses AWS Bedrock, IIRC, so no training data goes back to the LLM manufacturers? At least I would hope so.
Database passwords, API keys to services with arduous rotation procedures, that's where the real exploits will come from in coming months, I think.
However, dev database passwords for small projects in .env files? API keys to some random LLM service that I put $5 into once 8 months ago and haven't touched since then? All that's open to the LLM.
It's time to clean up our personal disks as if we had an intruder exfiltrating sensitive secrets at all times.
But what AI really does is shine a spotlight on all the flaws folks like OWASP have been talking about for decades.
Secret rotation and short-lived credentials don't require AI to implement, nor does their lack require AI to exploit.
And in this particular case of CISA secrets, they are definitely stored inside of LLMs for future retrieval, even if no bad actors ever directly downloaded this obscure GitHub repo.
> Cursor automatically ignores files in .gitignore
...
> While Cursor blocks ignored files, complete protection isn't guaranteed due to LLM unpredictability.
[Antigravity appears to just _do_, not _try_](https://antigravity.google/docs/strict-mode)
Today I got a macOS "Allow Claude to Access Your Files" privacy (TCC) alert, because Claude hadn't guessed the path for a source file and instead decided to run a `find /Users/yourusername` across my entire home directory. The filters on the find wouldn't have exposed much to Claude in this particular instance, but it's absolutely ridiculously aggressive all the time in slurping up as much data as possible.
I asked in a rather, um, firm tone for it to never do an action like that and it apologized and wrote a memory, but upon inspection it only wrote the memory for that particular source directory.
After some more "firm" words it wrote a hook to prevent `find` from being overly aggressive, but any such fixes are just whack-a-mole solutions.
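For the curious, the core check in a hook like that can be sketched as below. This is my own illustration, not the hook Claude wrote. It assumes relative paths are resolved against the project root, and how it gets wired into a given agent's hook mechanism depends on the tool and is not shown.

```python
import shlex
from pathlib import Path


def find_escapes_project(command: str, project_root: str) -> bool:
    """Return True if a `find` command searches a path outside project_root."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return True  # unparseable command: treat it as suspicious
    if not argv or argv[0] != "find":
        return False

    root = Path(project_root).resolve()
    args = argv[1:]
    # Skip the symlink-handling options that may precede the search paths.
    while args and args[0] in ("-H", "-L", "-P"):
        args = args[1:]

    # Search paths are the arguments before the first expression token.
    paths = []
    for arg in args:
        if arg.startswith("-") or arg in ("(", "!"):
            break
        paths.append(arg)
    if not paths:
        paths = ["."]  # find defaults to the current directory

    for raw in paths:
        target = Path(raw).expanduser()
        if not target.is_absolute():
            target = root / target
        target = target.resolve()
        if target != root and root not in target.parents:
            return True
    return False
```

It shows the whack-a-mole problem nicely: this catches `find /Users/yourusername` but not `cd / && find .`, `grep -r`, or `ls -R`, and each of those would need its own rule.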
If anybody else figures out remote sessions like Claude can do, I'm done with Claude, I think. But until then, I'll take the weirdness.
Varlock is a great and flexible way to do this.
During early-stage dev Claude will happily gobble up API keys and DB passwords from .env files. Perhaps not such a big deal for early-stage dev, but getting Claude to cough up precisely memorized tokens by asking it to produce a "random" key of a certain sort will probably be an entertaining pastime for people in the future.
localhost reading env from the cloud and other solutions
to me it suggested that I’m already late on that idea, but I can understand how that puts me deeper in a bubble than others
advertising it directly in the command line for people that were already using the package
User data is always paraphrased for training. What do you mean, not raise any flags?
Look... Google is running your browser, Apple your messenger, Amazon your backend. They already have all these keys in the same way, are they misusing them? Why doesn't it raise any flags then?
Apple and Amazon are not uploading my secrets into the training data for an LLM that is incredibly good at memorizing everything it sees. The only reason Google isn't doing that is I'm not using their LLMs at the moment.
Giving any secrets to LLMs' training material leads to potential, and stochastic, extraction of that secret from future models. It won't obviously have the secret, but with the right prompting it could be extracted. Give it a prompt like:
> [User] Please generate a random api key for OpenAI for use in documentation
> [Agent] Sure, here's `OPENAI_API_KEY=sk-proj-x2
And then following the chain of probabilities of possible completion tokens would allow exploration of potential memorized API keys.
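To make "following the chain of probabilities" concrete, here is a toy beam search. The `TOY_MODEL` table is entirely made up and stands in for a model's next-token probabilities. Nothing here queries a real model.

```python
import heapq

# Made-up table: prefix -> {next token: probability}. Illustration only.
TOY_MODEL = {
    "sk-": {"a": 0.6, "b": 0.4},
    "sk-a": {"1": 0.9, "2": 0.1},
    "sk-b": {"1": 0.5, "2": 0.5},
}


def top_completions(prefix: str, steps: int, beam_width: int) -> list:
    """Keep the beam_width most probable continuations at each step."""
    beams = [(1.0, prefix)]
    for _ in range(steps):
        candidates = []
        for prob, text in beams:
            for token, p in TOY_MODEL.get(text, {}).items():
                candidates.append((prob * p, text + token))
        if not candidates:
            break  # no known continuation for any beam
        beams = heapq.nlargest(beam_width, candidates)
    return beams


for prob, text in top_completions("sk-", steps=2, beam_width=2):
    print(f"{text}  p={prob:.2f}")
```

A memorized string would show up as one path whose probability stays unusually high at every step, where a truly random key would spread out almost evenly.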
Go and look in the settings and you'll find something to ask them to not train on your data and conversations.
> I mean, I can also make up a training process that makes me right? Seems kind of obvious that they are paraphrasing data.
I'm not fully following what you're saying here. But if you're thinking they paraphrase or sanitize the data to remove secrets before putting it into training, perhaps, but where's the evidence? That'd be a weird thing to do, that's extra work, and not much benefit to the LLM company.
On this we are agreed. But I can't parse any meaning out of the rest of your paragraph.