I think it's often useful to push the conversation toward "we built a system for humans that dealt with this; what from that is or isn't applicable to agents in the same context?" Human randomness in resume screening is pretty well known; I've seen companies try to fight it with things like hiding information, panel reviews, etc. It's unclear to me how effective those would be for agents (honestly, it was unclear how effective they were for humans). I was depressed about the hiring process before we had AI screening and I remain depressed about it.
It may seem trite, but the point is that if separate humans were assigned the same task the LLM was given here, the results would be similarly non-deterministic.
Indeed: LLMs do tasks that would otherwise be assigned to humans. So when we point out deficiencies in LLM performance, we should compare them to the alternative, which also isn't perfect.