
Near-term acquihires are a likely bet, I think. But given model progress on related benchmarks like KernelBench [1], I do think a set of more commoditized solutions is also inevitable.

The caveat, though, is that each new generation of hardware often comes with brand-new constraints/features that the current generation of models hasn't seen before (e.g. tcgen05 in Blackwell was OOD at one point). As models start to generalize better, this might not be a showstopper, but it is still an issue, at least currently.

[1] https://kernelbench.com/

by connicpu 1 hour ago


When you run CUDA at scale, dealing with NVIDIA driver and library bugs takes up a disgustingly large percentage of engineer time. I don't know a lot of people who would look forward to relying on more NVIDIA libraries.

by orliesaurus 46 minutes ago


Fair point, but are there alternatives that aren't locked to CUDA?

by whattheheckheck 21 minutes ago


Is there an issue board for these bugs? I want to see what counts as a disgustingly large percentage. 50%?

by einpoklum 1 hour ago


Probably not, because the specifics of the workload (exact parameters, representation of data in memory, value ranges, etc.) lead you to highly divergent optimization strategies.

by orliesaurus 44 minutes ago


Shouldn't it be possible to run this as an ML autoresearch project? I.e. orchestrate 10 strategies to speed it up, run them in parallel, pick the winner, and go from there?
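
To make the "run N strategies, keep the winner" idea concrete, here is a rough sketch of one round of that loop. Everything in it is made up for illustration: the three toy "strategies" stand in for candidate kernels, `pick_winner` and `benchmark` are hypothetical helper names, and the correctness check is a plain equality test against a reference. A real setup would presumably give each candidate its own GPU or process, since timing threads side by side on one machine is noisy.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def benchmark(fn, data, repeats=5):
    """Return the best wall-clock time of fn(data) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - start)
    return best


def pick_winner(strategies, data, reference, measure=benchmark):
    """Evaluate all candidates in parallel; return (name, cost) of the
    cheapest one whose output matches the reference implementation."""
    expected = reference(data)

    def evaluate(item):
        name, fn = item
        if fn(data) != expected:
            # An incorrect candidate can never win, however fast it is.
            return name, float("inf")
        return name, measure(fn, data)

    with ThreadPoolExecutor(max_workers=len(strategies)) as pool:
        results = list(pool.map(evaluate, strategies.items()))
    return min(results, key=lambda r: r[1])


# Hypothetical candidates: three ways to "compute" a sum of squares.
def loop_sum(xs):
    total = 0
    for x in xs:
        total += x * x
    return total


def builtin_sum(xs):
    return sum(x * x for x in xs)


def wrong_sum(xs):
    return sum(xs)  # fast but incorrect, so it must be rejected


STRATEGIES = {
    "loop_sum": loop_sum,
    "builtin_sum": builtin_sum,
    "wrong_sum": wrong_sum,
}
DATA = list(range(1000))

if __name__ == "__main__":
    name, seconds = pick_winner(STRATEGIES, DATA, reference=loop_sum)
    print(f"winner: {name} ({seconds:.6f}s)")
```

"Go from there" would then mean generating the next batch of candidates from the winner and repeating. Note that the winner is only the best for this particular `DATA`, which is the point raised upthread: different parameters or value ranges can favor a different candidate.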