God, as someone who took a graphics programming elective back when GPGPU and compute shaders first became a thing, reading this makes me realize I definitely need an update on what modern GPU uarchs look like now.

Re: heterogeneous workloads: I'm told by a friend in HPC that the old advice about avoiding divergent branches within warps is no longer much of an issue – is that true?

reply
That advice still applies within a warp, where the 'threads' are effectively SIMD lanes that pay a serialization cost when they disagree on a branch. The article is consistently about running heterogeneous tasks on different warps, which is a different thing entirely – see the sketch below.
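A minimal CUDA sketch of the distinction (kernel name and shapes are mine): the first branch splits on the lane index, so lanes within one warp diverge and the hardware executes both sides serially with masking; the second splits on the warp index, so every lane in a warp agrees and different warps run different code at no divergence cost.

```cuda
__global__ void divergence_demo(float* out) {
    int tid  = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x % 32;  // position within the warp
    int warp = threadIdx.x / 32;  // which warp within the block

    // Divergent: lanes of the same warp take different sides,
    // so the warp serializes both paths.
    if (lane % 2 == 0)
        out[tid] = 1.0f;
    else
        out[tid] = 2.0f;

    // Uniform per warp: all 32 lanes agree on the branch,
    // so heterogeneous work across warps costs nothing extra.
    if (warp % 2 == 0)
        out[tid] *= 10.0f;
    else
        out[tid] *= 20.0f;
}
```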
reply
Yes, that's the idea.

GPU-wide memory is not quite as scarce on datacenter cards or systems with unified memory. One could also have local executors with local futures that are `!Send` and placed in a faster address space.
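Setting aside the Rust `!Send` machinery, the "faster address space" on a GPU would be per-block `__shared__` memory, which is inherently block-local – no other block can touch it, which is the same guarantee `!Send` encodes. A minimal sketch of staging state there (names are mine):

```cuda
__global__ void block_local_state(const float* in, float* out, int n) {
    // Block-local scratch in shared memory: much faster than global
    // memory, and invisible to every other block.
    __shared__ float scratch[256];

    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) scratch[threadIdx.x] = in[tid];
    __syncthreads();  // make the staged data visible block-wide

    // Work against the fast local copy instead of global memory.
    if (tid < n) out[tid] = scratch[threadIdx.x] * 2.0f;
}
```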

reply
This is already happening in C++: NVIDIA is the one pushing the senders/receivers proposal (P2300, std::execution), which is one of the candidate coroutine runtimes for the C++ standard library.
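For anyone who hasn't seen the style, a minimal sketch using NVIDIA's stdexec reference implementation of P2300 (the thread pool here is CPU-side; the same repo's nvexec layer provides CUDA-stream schedulers with the same sender model):

```cpp
#include <stdexec/execution.hpp>
#include <exec/static_thread_pool.hpp>
#include <cstdio>

int main() {
    exec::static_thread_pool pool(4);
    auto sched = pool.get_scheduler();

    // A sender merely describes work; nothing runs until it is awaited.
    auto work = stdexec::schedule(sched)
              | stdexec::then([] { return 40; })
              | stdexec::then([](int v) { return v + 2; });

    // sync_wait drives the chain to completion on the pool.
    auto [result] = stdexec::sync_wait(std::move(work)).value();
    std::printf("result = %d\n", result);
}
```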
reply
A ton of GPU workloads require leaving large amounts of data resident in GPU memory and re-running computation as small batches of new data arrive from the CPU.
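A minimal sketch of that pattern (kernel name and sizes are mine): allocate the big buffer once, keep it resident for the whole run, and per step copy only the small delta across the bus before relaunching.

```cuda
#include <cuda_runtime.h>

__global__ void fold_delta(float* resident, const float* delta, int n, int d) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) resident[i] += delta[i % d];  // fold new data into resident state
}

int main() {
    const int N = 1 << 24, D = 1024;
    float *d_resident, *d_delta;
    float h_delta[D] = {};  // fresh data produced on the CPU each step

    cudaMalloc(&d_resident, N * sizeof(float));  // stays on the GPU for the whole run
    cudaMalloc(&d_delta, D * sizeof(float));
    cudaMemset(d_resident, 0, N * sizeof(float));

    for (int step = 0; step < 100; ++step) {
        // Only the small per-step delta crosses the PCIe/NVLink bus.
        cudaMemcpy(d_delta, h_delta, D * sizeof(float), cudaMemcpyHostToDevice);
        fold_delta<<<(N + 255) / 256, 256>>>(d_resident, d_delta, N, D);
    }
    cudaDeviceSynchronize();

    cudaFree(d_resident);
    cudaFree(d_delta);
}
```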
reply