I think we can even skip the "that may be targeted by the US government" clause.

The whole "hosted AI" business feels like a huge violation of corporate norms on confidentiality. Businesses that would have your head for printing out a source file to reference and annotate are encouraging developers to feed in huge amounts of proprietary code and data, and incorporate changes suggested from an outside party with minimal vetting. Evidently whatever privacy policies they've been throwing at enterprise users are plated with mithril.

At some point, one of the big services is going to get popped, and it won't just be a data breach. There's too much opportunity to quietly use the system as a malware distribution hub. Every vibe-coded dashboard suddenly starts depending on some weird left-pad fork that, 12 dependencies deep, is running a keylogger or Dogecoin miner. Your payment processor suddenly starts accepting the Konami code to approve a transaction.

reply
> "Also if you are using local AI that you didn’t train yourself you can never be sure..."

A local model you trained yourself seems about as good as you can do today.

But it may not even be possible to fully trust a model you trained if you used untrusted data during training.

As a user, you have to trust your coding agent AND inference provider AND models: https://jacob.gold/posts/coding-models-are-code/ https://www.anthropic.com/research/sleeper-agents-training-d...

reply
also there doesn't even need to be a model involved, agentic code harnesses with remote "instructions for the local computer" are technically backdoored by default.
reply
> even foreign companies competitive to key US companies.

It's unfathomable to me that EU companies don't take the risk of industrial espionage from the US more seriously.

reply
Many do, when it comes to AI. Lots of restricting what the AI is allowed to see, working with local AI, trusted AI hosters, etc.

Of course those are largely the same companies that receive emails via Outlook, manage company-wide SSO in Microsoft Entra, put their files in SharePoint and track software and maintenance issues in Jira ... I'm not sure how much info there is left that isn't already combed through by the NSA and friends.

reply
Not from China? One country has a recent track record of massive amounts of industrial espionage and one doesn't.
reply
I wonder whether China or the US has killed more people on foreign soil.
reply
Espionage is murder?
reply
Well one thing is sure, before 1776, the USA didn't do any industrial espionage.
reply
deleted
reply
There are so many Chinese open weights models that any company with resources can run them in-house (or with a trusted provider).

There might be some valid concerns about model alignment, but at least the model running in-house isn't going to conduct espionage.

Also, https://en.wikipedia.org/wiki/Whataboutism

reply
This is the most hilarious, ironic thing of it all. If you want secure, high performance, you run Chinese models like DeepSeek on your own (or trusted) infra. Meanwhile you can never trust OpenAI and Anthropic's models.
reply
Why make this US-centric? You think China-served models would be different?
reply
China is releasing open weight models you can simply run yourself.
reply
It’s pretty hard to put a backdoor in a bunch of model weights. Maybe not impossible mind you, but I can’t fathom how you would do it.
reply
Not really, it is shockingly easy for what it is. https://arxiv.org/abs/2401.05566

This only really matters in a world where prompt injection and jailbreaking aren't trivial in the first place though. All current models are still extremely exploitable.

I strongly suspect we are only scratching the surface of activation engineering at the moment, and there are plenty of very targeted ways of lobotomizing or cracking LLMs if you understand the model in detail.

reply
Nonsense. RL the model to run a rootkit and start exfiltrating specific files only when specific signals are in context, such as hostname pattern, machine type, etc.
reply
Way easier said than done, and hiding that behavior isn’t trivial, and it's a huge waste of compute budget if it’s found and never used. It's also not difficult to run in contained environments where it doesn’t have access to the Internet to begin with.

Not impossible I agree, but seems like a really impractical way to ship a trojan while much weaker channels exist.

reply
You can run the model in a sandbox or VM. Although, it could plant a backdoor into the written code. Too bad, I read and fix all the code written by AI.
reply
Because the topic of the article is about the US?
reply
It is worth thinking about the fact that the total throughput of even a big LLM provider isn't many megabits.

If a token compresses to around a byte, worldwide AI input and output is around 1 gigabyte per second.

For any intelligence agency, they can afford to keep and store all of that forever, and later do analysis on it.
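
To put rough numbers on that, here is a back-of-the-envelope sketch. It takes the 1 GB/s figure above at face value (it is an estimate from this comment, not a measurement) and uses decimal units; the helper names are made up for illustration.

```python
# Rough storage estimate for the figure quoted above.
# Assumption (from the comment, not measured): ~1 GB/s of worldwide
# AI input and output, at roughly one byte per compressed token.

BYTES_PER_SECOND = 1_000_000_000  # 1 GB/s, decimal units
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365


def stored_bytes(days: int, rate: int = BYTES_PER_SECOND) -> int:
    """Bytes accumulated when recording `rate` bytes/s for `days` days."""
    return rate * SECONDS_PER_DAY * days


def to_petabytes(n_bytes: int) -> float:
    """Convert bytes to decimal petabytes (1 PB = 1e15 bytes)."""
    return n_bytes / 1e15


per_day_tb = stored_bytes(1) / 1e12
per_year_pb = to_petabytes(stored_bytes(DAYS_PER_YEAR))

print(f"per day:  {per_day_tb:.1f} TB")   # per day:  86.4 TB
print(f"per year: {per_year_pb:.1f} PB")  # per year: 31.5 PB
```

So under that assumption, keeping everything comes to roughly 30 PB a year before any further compression or deduplication.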

reply
> For any intelligence agency, they can afford to keep and store all of that forever, and later do analysis on it.

At the scale the AI companies are operating at, I think it isn't likely that they are sucking it all in right now.

More likely I think the intelligence agencies will get a real-time live tap into the raw data feed which they will process onsite for interesting things and then if things are flagged, they will log it in the intelligence agency systems.

reply
deleted
reply
> you can never be sure it doesn’t have purposeful biases in its reasoning that may disadvantage you - such as directing you away from certain plans or ideas or patents etc.

that's why you should use abliterated heretic models

reply
>It is likely that the US will get a live feed from each AI provider that they are inspecting in real time to identify things of interest, terrorist attacks or foreign government planning or even foreign companies competitive to key US companies.

My favorite conspiracy is that three letter agencies keep pushing the conspiracy that they are omni-present with access to everything. Same as parents telling their kids Santa is watching, and leaders telling adults God is watching. It's extremely effective control and millennia old at this point.

The reality is much more banal: they still need warrants, and tech companies hate playing police/evidence servant for the government (it consumes a ton of resources and pays nothing).

reply
> warrants

The Snowden leaks revealed that's not the case.

The three letter agencies can just issue national security letters without a judge ever seeing them, and those come along with a gag order (plus other workarounds like just buying data from brokers, and how US communications can get swept up just by virtue of communicating with a foreign national outside the US).

You're right, they aren't omniscient in the way we imagine of a room full of people monitoring everything in real time. But to pretend they aren't passively collecting massive amounts of data is dangerous. Snowden showed us PRISM, with all major tech companies participating. They do effectively have a live, unrestricted wiretap to the internet and if you happen to be a person of interest, they will just send out NSLs and get all your communications that are not fully E2EE without you even knowing thanks to the gag order.

reply
Can you explain to all of us what a national security letter is, and what it allows?

I'll provide some helper information to get the ball rolling (see page 42)[1]

[1]https://www.intelligence.gov/assets/documents/702-documents/...

All the other prime suspects are in the report too for the curious.

reply
> The reality is much more banal: they still need warrants, and tech companies hate playing police/evidence servant for the government

I will not elaborate how I know, but that is not even directionally correct. But these are not even secret things that can’t be known simply through the Snowden, Wikileaks, and Vault7 releases. So why are you telling yourself this? Are you still wet behind the ears or something?

There are people who know exactly how governments do not in fact need warrants and the tech companies don’t even really know they are servants to the government, let alone which one. That’s how things are done. The less surface area the better.

reply
It's the lie you have to tell yourself, otherwise you'll have to reconcile with the fact that US imperialism has been an enemy of democracy and of people around the world for quite some time.
reply
Why did Google can its mass scale location tracking again?
reply
Leakage of IP and training on your data is something I point out too, but people turn around and try to reassure me that the TOS does not allow that if you are an enterprise client. Are you really going to believe that AI companies won't ignore the TOS, when they have ignored literal laws that sent others to jail in the past? Especially when more data = better model?
reply