Basically, to find the answer you really need your own benchmark, run on real examples of what you want to do. The same goes for pretty much anything in ML nowadays, since the public benchmarks can't really be trusted to tell you how well it'd work for you.
Which can run comfortably on 12 GB of VRAM. I gave it a whirl and it does seem pretty competitive. I wonder how that compares for your use case.
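For what it's worth, a personal benchmark doesn't have to be fancy. Here's a minimal sketch in Python, assuming your task can be checked by looking for an expected string in the output. Everything in it is made up for illustration: `EXAMPLES`, `score`, `model_a` and `model_b` are hypothetical names, and the two "models" are stand-in functions so it runs on its own. You'd swap in calls to whatever models you're actually comparing, and real prompts from your own work.

```python
from typing import Callable, List, Tuple

# Hypothetical examples: replace with real prompts and expected answers from your own task.
EXAMPLES: List[Tuple[str, str]] = [
    ("What is 2 + 2?", "4"),
    ("Capital of France?", "Paris"),
]


def score(model: Callable[[str], str], examples: List[Tuple[str, str]]) -> float:
    """Return the fraction of examples whose expected answer appears in the model output."""
    hits = 0
    for prompt, expected in examples:
        output = model(prompt)
        # Crude check (an assumption): case-insensitive substring match.
        if expected.lower() in output.lower():
            hits += 1
    return hits / len(examples)


# Stand-in "models" so the sketch runs by itself; replace with calls to the real ones.
def model_a(prompt: str) -> str:
    return "The answer is 4." if "2 + 2" in prompt else "I don't know."


def model_b(prompt: str) -> str:
    return "4" if "2 + 2" in prompt else "Paris"


for name, model in [("model_a", model_a), ("model_b", model_b)]:
    print(f"{name}: {score(model, EXAMPLES):.0%}")
```

Substring matching is about the crudest scoring you can do, so for anything open-ended you'd probably want a better check, but even a couple dozen real examples scored this way will likely tell you more than a public leaderboard.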