undefined

points

[-]

Perhaps not even necessarily subjective, just performance is highly task-dependent and even variable within tasks. People get objectively different experiences, and assume one or another is better, but it's basically random.

by easygenes8 hours ago|

parent|

[-]

Unless you're looking at something like a pass@100 benchmark, the benchmarks are confounded heavily by a likelihood of a "golden path" retrieval within their capabilities. This is on top of uncertainties like how well your task within a domain maps to the relevant test sets, as well as factors like context fullness and context complexity (heavy list of relevant complex instructions can weigh on capabilities in different ways than e.g. having a history where there's prior unrelated tasks still in context).

The best tests are your own custom personal-task-relevant standardized tests (which the best models can't saturate, so aiming for less than 70% pass rate in the best case).

All this is to say that most people are not doing the latter and their vibes are heavily confounded to the point of being mostly meaningless.

by operatingthetan12 hours ago|

parent|

prev|

[-]

>just performance is highly task-dependent and even variable within tasks. People get objectively different experiences, and assume one or another is better, but it's basically random.

You are right that this is not exactly subjectivity, but I think for most people it feels like it. We don't have good benchmarks (imo), we read a lot about other people's experiences, and we have our own. I think certain models are going to be objectively better at certain tasks, it's just our ability to know which currently is impaired.

by api9 minutes ago|

prev|

[-]

They might be converging somewhat. The ultimate limiting factor is training data. Eventually I think they will converge and then the competition will be on memory and compute efficiency, with the best being the smallest maximally capable model.

by hamdingers10 hours ago|

prev|

[-]

And the subjectivity is bidirectional.

People judge models on their outputs, but how you like to prompt has a tremendous impact on those outputs and explains why people have wildly different experiences with the same model.

by ulfw6 hours ago|

prev|

[-]

AI is a complete commodity

One model can replace another at any given moment in time.

It's NOT a winner-takes-all industry

and hence none of the lofty valuations make sense.

the AI bubble burst will be epic and make us all poorer. Yay

by StilesCrisis59 minutes ago|

parent|

[-]

Staying power is probably the most important factor, which is why I'm thinking Google eventually takes the crown.