upvote
New technology isn't perfect now -> drop technology and never use it in the future
reply
I think eventually you are going to end up with every smart AI continually checked by dumber AI's to make sure they don't do anything too crazy. Which probably does bring AI closer to how human intelligence works
reply
Completely agree, top down “alignment” and RLHF is actually quite primitive and uses a lot fancy words to describe what is essentially just hitting the machine with a stick without the nuance, context, or feedback to help it model why the feedback was given.

Also to be honest I think OpenAI models struggle a lot with this, I primarily stopped using them in the sycophancy/emoji era but ever since the way they talk or passive aggressively offer to do something with buzzwords just pisses me off so much. Like I’m constantly being negged by a robot because some SFT optimized for that really strongly to the point it can’t even hold a coherent conversation and this is called “AI safety” when it’s just haphazard data labeling

reply