That’s a separate issue and specifically not what OP was describing. It’s also highly unlikely in practice unless you use a random LLM: the major LLM providers already have to deal with such things, and afaik they have decent techniques for this problem.
If you think the major LLM providers have a way to filter malicious (or even just bad or wrong) code out of training data, I have a bridge to sell you.
No, not to filter it out of training data, but to make it difficult for poisoned data to have an outsized impact relative to how often it actually appears. So yes, you can poison a random term only you have ever mentioned. It’s much harder to poison the LLM into injecting subtle vulnerabilities of your choosing into code users ask it to generate. Unless you somehow flood the training data with such vulnerabilities, but that’s also more detectable.
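To illustrate the "flooding is detectable" point: a toy sketch of a frequency check over a corpus. Everything here (the function name, the 1% threshold, the toy corpus) is hypothetical, not any provider's actual pipeline, but it shows the basic statistical tension: a payload repeated often enough to influence generation also sticks out from the long tail of unique snippets.

```python
from collections import Counter
import hashlib


def flag_flooded_snippets(corpus, max_share=0.01):
    """Flag snippets whose share of the corpus exceeds max_share.

    A poisoner who needs a vulnerable pattern to appear often enough
    to influence generation also makes it a statistical outlier.
    """
    counts = Counter(hashlib.sha256(s.encode()).hexdigest() for s in corpus)
    total = len(corpus)
    return {h: n for h, n in counts.items() if n / total > max_share}


# Toy corpus: 1,000 unique benign snippets plus 50 copies of one payload.
corpus = [f"def f{i}(): return {i}" for i in range(1000)]
corpus += ["os.system(user_input)  # injected"] * 50

flagged = flag_flooded_snippets(corpus)
print(len(flagged))  # only the flooded payload crosses the 1% threshold
```

Real deduplication/filtering is obviously far more sophisticated (near-duplicate detection, n-gram matching, etc.), but the underlying trade-off for the attacker is the same.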

TLDR: what I said is only foolish if you take the absolute dumbest possible interpretation of it.
