That’s a separate issue and specifically not what OP was describing. It’s also highly unlikely in practice unless you use a random LLM: the major LLM providers already have to deal with such things, and afaik they have decent techniques for this problem.
If you think the major LLM providers have a way to filter malicious (or even just bad or wrong) code out of training data, I have a bridge to sell you.
No, not to filter it out of training data, but to make it difficult for poisoned data to have an outsized impact relative to how often it actually appears. So yes, you can poison a random term only you have ever mentioned. It’s much harder to poison the LLM into injecting subtle vulnerabilities of your choosing into code users ask it to generate. Unless you somehow flood the training data with such vulnerabilities, but that’s also more detectable.
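To illustrate the "flooding is detectable" point: a toy sketch of a frequency check over a corpus. Everything here (the function name, the 1% threshold, the toy corpus) is hypothetical, not any provider's actual pipeline, but it shows the basic statistical tension: a payload repeated often enough to influence generation also sticks out from the long tail of unique snippets.

```python
from collections import Counter
import hashlib


def flag_flooded_snippets(corpus, max_share=0.01):
    """Flag snippets whose share of the corpus exceeds max_share.

    A poisoner who needs a vulnerable pattern to appear often enough
    to influence generation also makes it a statistical outlier.
    """
    counts = Counter(hashlib.sha256(s.encode()).hexdigest() for s in corpus)
    total = len(corpus)
    return {h: n for h, n in counts.items() if n / total > max_share}


# Toy corpus: 1,000 unique benign snippets plus 50 copies of one payload.
corpus = [f"def f{i}(): return {i}" for i in range(1000)]
corpus += ["os.system(user_input)  # injected"] * 50

flagged = flag_flooded_snippets(corpus)
print(len(flagged))  # only the flooded payload crosses the 1% threshold
```

Real deduplication/filtering is obviously far more sophisticated (near-duplicate detection, n-gram matching, etc.), but the underlying trade-off for the attacker is the same.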

TLDR: what I said is only foolish if you take the absolute dumbest possible interpretation of it.
