Who cares if it can be evaluated independently? The majority of commenters on HN were happy to vibe code and ship products with the models we had 1-2 years ago. That continues to be laughable.

I understand that moving the goalpost every release is unfair, but it's similarly concerning to consider that people were letting GPT 4.X vibe code and ship entire products.

I don’t think so? They can claim it was an act of God for all I care, but at the end of the day the model failed the task.