cuMemAlloc -> vmaAllocateMemory + VMA_MEMORY_USAGE_GPU_ONLY
cuMemAllocHost -> vmaAllocateMemory + VMA_MEMORY_USAGE_CPU_ONLY
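Roughly, and assuming a VmaAllocator has already been created with vmaCreateAllocator (the 64 KiB size and storage-buffer usage below are just illustrative, and CUDA context setup is omitted), the two sides look something like:

    #include <cuda.h>          /* CUDA driver API */
    #include "vk_mem_alloc.h"  /* Vulkan Memory Allocator */

    /* CUDA side: one call, the usage is implicit. */
    static void alloc_cuda(void)
    {
        CUdeviceptr dptr;
        cuMemAlloc(&dptr, 65536);
    }

    /* VMA side: an equivalent device-local allocation, usage spelled out.
       'allocator' is assumed to have been created elsewhere. */
    static void alloc_vma(VmaAllocator allocator)
    {
        VkBufferCreateInfo bufInfo = { VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO };
        bufInfo.size  = 65536;
        bufInfo.usage = VK_BUFFER_USAGE_STORAGE_BUFFER_BIT;

        VmaAllocationCreateInfo allocInfo = {0};
        allocInfo.usage = VMA_MEMORY_USAGE_GPU_ONLY;

        VkBuffer buffer;
        VmaAllocation allocation;
        vmaCreateBuffer(allocator, &bufInfo, &allocInfo, &buffer, &allocation, NULL);
    }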
It seems like the functionality is the same, just that the memory usage is implicit in cuMemAlloc instead of being spelled out? If it's that big of a deal, write a wrapper function and be done with it?
Usage flags never come up in CUDA because everything is just a bag-of-bytes buffer. Vulkan also has to deal with render targets and textures, which historically had to be placed in special memory regions and are still accessed through big blocks of fixed-function hardware that remain very much relevant. And each of the ~6 GPU vendors, across 10+ years of generational iterations, does all of this differently, with different memory architectures and performance cliffs.
It's cumbersome, but it can also be wrapped (e.g. VMA). Who cares whether the "easy mode" ships in vulkan.h or vma.h; someone's got to implement it anyway. At least if it's in vma.h I can fix issues, unlike if we trusted all the vendors to do it right (they won't).
The simple approach can be implemented on top of what Vulkan exposes currently.
In fact, it takes only a few lines to wrap that VMA snippet above, and you never have to stare at those pesky structs again!
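Something along these lines (gpuAlloc/gpuFree are made-up names, the usage bits are just a guess at a reasonable default, and the allocator is assumed to exist already):

    #include "vk_mem_alloc.h"

    typedef struct {
        VkBuffer      buffer;
        VmaAllocation allocation;
    } GpuBuffer;

    /* Roughly cuMemAlloc-shaped: size in, opaque handle out. */
    static VkResult gpuAlloc(VmaAllocator allocator, VkDeviceSize size, GpuBuffer* out)
    {
        VkBufferCreateInfo bufInfo = { VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO };
        bufInfo.size  = size;
        bufInfo.usage = VK_BUFFER_USAGE_STORAGE_BUFFER_BIT |
                        VK_BUFFER_USAGE_TRANSFER_SRC_BIT |
                        VK_BUFFER_USAGE_TRANSFER_DST_BIT;

        VmaAllocationCreateInfo allocInfo = {0};
        allocInfo.usage = VMA_MEMORY_USAGE_GPU_ONLY;

        return vmaCreateBuffer(allocator, &bufInfo, &allocInfo,
                               &out->buffer, &out->allocation, NULL);
    }

    /* Roughly cuMemFree-shaped. */
    static void gpuFree(VmaAllocator allocator, GpuBuffer* buf)
    {
        vmaDestroyBuffer(allocator, buf->buffer, buf->allocation);
    }

The point being: the caller never sees a VkBufferCreateInfo again, and the usage flags live in exactly one place.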
But Vulkan the API can't afford to be "like CUDA", because Vulkan is not a compute API for Nvidia GPUs. It has to balance a lot of things, and that's the main reason it's so un-ergonomic (which isn't to say no bad decisions were made; render passes were always a bad idea).
If it were just this issue, perhaps. But there are so many more unnecessary issues that I have no desire to deal with, so I just started software-rasterizing everything in CUDA instead, which is way easier because CUDA always provides the simple API and makes complexity opt-in.