That said, I spent plenty of time doing both, and yet it would probably take me a while to arrive at this approach. For some reason, the "draw a sketch, have a model flesh it out" approach got bucketed with Stable Diffusion in my mind, and multimodal LLMs with "take detailed content, make targeted edits to it". So I'm glad the OP posted it.