by nomel 19 hours ago

by mips_avatar 19 hours ago

Anthropic is trying to hide bad behavior by being vague, so it's important not to be vague when calling it out.

by nomel 19 hours ago

I'm of the opinion that removing guardrails is how you force regulation. What's your opinion on the balance?

by dannyw 18 hours ago

They have all transcripts for at least 30 days. The problem is that (as anyone who used Fable can attest) their classifiers are extremely sensitive and catch tons of innocent queries.

Imagine being a data scientist or MLE training a small classifier model. How do you know you won’t get steering vectors or a PEFT applied?

by nomel 17 hours ago

Since your answer isn't direct, I'm having a little trouble interpreting it.

Are you saying they should relax guardrails since they have 30 days to find out whether you produced something bad? If that is what you're saying, then I suspect they chose their current path to prevent that up front, since you can't un-produce something. Producing is what would cause regulatory and PR problems.

by dannyw 14 hours ago

Sorry, I’m specifically referring to the silent degradation of the model to “limit frontier LLM development”. From the description, it appears to cover far more than frontier LLM development, extending to general ML research and development too.

First, those cases are never bad for the world; second, such broad coverage of ML work is even more damaging.

My proposal would be either (1) don’t degrade models (with 30-day retention I’m sure they can do a reasonable job of banning DeepSeek or whatever), or (2) surface user-facing refusals instead of silently degrading ML work.

by mips_avatar 16 hours ago

They’re not safety guardrails; they’re “Anthropic doesn’t like anyone who isn’t Anthropic working on AI” rails.

by giancarlostoro 19 hours ago

PEFT is a library; one of its capabilities is to produce LoRAs.

See:

https://heidloff.net/article/efficient-fine-tuning-lora/

by adw 19 hours ago

It's just an acronym: "parameter-efficient fine-tuning". LoRA is one method, prefix tuning is another, and there are more.
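
For anyone unfamiliar with why LoRA counts as "parameter-efficient", here is a rough sketch of the idea in plain Python. It does not use the PEFT library's API; the function names, matrix sizes, and the `alpha / r` scaling are illustrative assumptions. The frozen weight `W` stays untouched, and only two small matrices `A` and `B` would be trained.

```python
import random

def matmul(a, b):
    # Naive matrix product for lists of lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_delta(A, B, alpha, r):
    # Low-rank update: (alpha / r) * B @ A, where B is d_out x r and A is r x d_in.
    scale = alpha / r
    return [[scale * v for v in row] for row in matmul(B, A)]

def effective_weight(W, A, B, alpha, r):
    # The frozen weight W is never modified; the adapter is added on top.
    delta = lora_delta(A, B, alpha, r)
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]

# Hypothetical sizes chosen only for illustration.
d_out, d_in, r, alpha = 8, 8, 2, 4
rng = random.Random(0)

W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]   # frozen base weight
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]    # trainable, small random init
B = [[0.0] * r for _ in range(d_out)]                                # trainable, zero init

full_params = d_out * d_in        # parameters touched by full fine-tuning
lora_params = r * (d_in + d_out)  # parameters touched by the adapter

print(f"full fine-tuning: {full_params} params, LoRA adapter: {lora_params} params")
```

With `B` initialized to zero the adapter starts as a no-op, so the effective weight equals `W` until training moves `A` and `B`. The savings grow with layer size: the adapter scales with `r * (d_in + d_out)` rather than `d_in * d_out`.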