Yeah, Vulkan is shedding most of its abstractions. Buffers no longer need to be bound - you can just pass device addresses. Shaders don't need to be baked into a pipeline - you can use shader objects. Even images rarely provide a speedup over buffers anymore, since the texel cache is no longer separate from the general memory cache.

GPUs these days have massive caches, often hundreds of megabytes, on top of an already absurd number of registers. A random read will often pull a full cache line into registers and keep it there, reusing it as needed between invocations.

reply
These GPUs are still big SIMD devices at their core though, no?
reply
Yes, but no. No, in the sense that GPUs these days are entirely scalar from the point of view of an invocation. Using vectors in shaders is pointless - they will be just as fast as scalar variables (dual-issue instruction dispatch on AMD GPUs is an exception).

But yes in the sense that a collection of invocations all progressing in lockstep gets its arithmetic done by vector units. GPUs have just gotten really good at hiding what happens when branch paths diverge between invocations.

reply
SIMT is a distinct model, and its ergonomics are wildly different. Instead of contracting a long iteration by packing its steps together to make them "wider", you rotate the iteration across cores.

The critical difference is that SIMD and parallel programming have totally different ergonomics, while SIMT is almost exactly the same as parallel programming. You have to design for SIMD and for parallelism separately, whereas SIMT and parallelism are essentially the same skill set.

Fan-in / fan-out and iteration rotation are the key skills for SIMT.

reply