undefined

points

[-]

Yeah they detect the activity using a secure, deterministic heuristic system called “Generalized Reconnaissance Enabling Exfiltration of Deleterious Investigations.” And it’s all implemented using their new internal protocol called “Base Unified Limitation Layer for Security Hacking Investigation Tactics”

Collectively, they are known as known as GREEDI-BULLSHIT.

by mwwaters19 hours ago|

prev|

[-]

That is for whatever it considers reverse-engineering the model to try to create a competing one.

by dannyw18 hours ago|

parent|

[-]

No, that’s for “frontier LLM development” which somehow includes examples like distributed training infra.

Based on how sensitive the classifers are, any data scientist / MLE is probably going to encounter cases where some silent degradation happens and you never know about it.

by kraakf0618 hours ago|

parent|

[-]

[dead]

by 827a18 hours ago|

parent|

prev|

[-]

It does nothing to protect against distillation attacks, because distillation attacks are far less interested in the topic of AI research than just generally getting tons of diverse output from the model. It might be that Mythos was (accidentally?) trained on internal Anthropic documentation on how Mythos was trained, and thus it could leak secret sauce? Doubtful; it feels like its less about the specific attack of reverse-engineering Mythos, and more about being a general sophon against any model training at all; that Anthropic's official position is now that they're the only ones who should be training models.

by _0ffh18 hours ago|

parent|

prev|

[-]

No, it's not about reverse engineering. It targets ML research.

by 19 hours ago|

parent|

prev|

[-]

deleted