
by spangry 15 hours ago


by Davidzheng 11 hours ago

Human usage data probably contributes only a tiny amount to improving the models; it's mostly RL on environments.

by pksebben 15 hours ago

> Perhaps the US administration is gambling that US citizens on their own provide enough of a training data and revenue flywheel for them to keep their AI development edge.

There is no way to enforce access for one group and not the other, not with the state of tech in the US (and in most countries without a great firewall). Bypassing such controls is as easy as a pilfered credit card (or some other American-looking payment method) and a VPN, both trivial to come by.

by gmueckl 15 hours ago

It may not be perfect, but this hurdle would still keep out ~99% of the targeted people.

by pksebben 13 hours ago

Genuinely curious - who do you think the targeted people are and how would this keep them out?

by gmueckl 13 hours ago

For the sake of this discussion, I'm going with the nationalistic vibe of the order: anybody who isn't a citizen of the USA (presumably to limit risk of AI-supported action against the US?).

But that in itself is telling in a way: if national security were a true concern, access would be limited to people who passed background checks.

by pksebben 2 hours ago

Right - it doesn't hold up to scrutiny. For one, "not a citizen" is a pretty hard bar to assess online. For another, "citizen" isn't very meaningful here. Many national security incidents have featured a citizen at the core, and citizenship is a really fuzzy indicator of "potentially hostile" and especially of "for what reason".

I guess I'm possibly giving them too much credit, but if the people who sent the letter have their heads screwed on straight, "protecting national security by disallowing specifically non-citizens from using it" can really only be read as a smokescreen, or at the very best a small part of the actual picture.

by asp_hornet 15 hours ago

> the more people you have using your models, the more training and fine-tuning data you're accumulating, so the faster you can develop the next frontier model

I’ve wondered about this, but wouldn’t a large share of the input now just be AI output from a previous PR, client email, spec document, or chat? Wouldn’t training on that be an issue, leading to distillation?