I can't offhand think of anything that an LLM image generator would do to improve the process, though it'd be an interesting research task. You'd need a way to transform the 256-bit hash into LLM input that maximizes the perceptual difference between the generated images. The problem is that it's absolutely critical that two different implementations produce the same image for the same hash, which means the spec would need to specify the exact set of model weights to use.
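
To make the "two implementations must agree" requirement concrete, here is a rough sketch of only the first step: turning the 256-bit hash into a fixed-length conditioning vector. Everything here is an assumption for illustration, not part of any spec: the function name `expand_hash`, the use of SHA-256 in counter mode to stretch the hash, and the choice of 16-bit integer outputs. It stays in integer arithmetic on purpose, so that any two implementations get bit-identical values. It says nothing about maximizing perceptual difference, and the model itself would still need to be pinned down just as exactly.

```python
import hashlib


def expand_hash(digest: bytes, n: int) -> list[int]:
    """Deterministically expand a 256-bit digest into n integers in [0, 65535].

    Hypothetical helper: hashes (digest || 4-byte big-endian counter) with
    SHA-256 and slices each 32-byte output block into sixteen 16-bit values.
    """
    if len(digest) != 32:
        raise ValueError("expected a 256-bit (32-byte) digest")

    values: list[int] = []
    counter = 0
    while len(values) < n:
        block = hashlib.sha256(digest + counter.to_bytes(4, "big")).digest()
        # Fixed byte order and fixed slicing, so every implementation agrees.
        for i in range(0, len(block), 2):
            values.append(int.from_bytes(block[i:i + 2], "big"))
        counter += 1
    return values[:n]


if __name__ == "__main__":
    key_hash = hashlib.sha256(b"example public key").digest()
    print(expand_hash(key_hash, 8))
```

A spec along these lines would have to fix each of these choices (hash function, counter width, byte order, value range) in the same way it would have to fix the model weights, since a difference in any one of them changes the resulting image.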