On prompt formulation: in my experience, prompts with very similar formulations (in terms of semantics, Hamming distance, or both) can lead to _wildly divergent_ outputs. It's not rigorous, and when that divergence happens, it's extremely difficult (arguably impossible, by nature of the transformer architecture) to identify why and where it happened.
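
To make "similar in Hamming distance" concrete, here is a minimal sketch. It assumes character-level comparison of two equal-length prompts, which is the strict definition of Hamming distance; the function name and the two sample prompts are made up for illustration and don't come from any real experiment.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for equal-length strings")
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))


# Hypothetical prompts that differ by a single character.
prompt_a = "Summarize the report in 3 bullet points."
prompt_b = "Summarize the report in 5 bullet points."

print(hamming_distance(prompt_a, prompt_b))  # prints 1
```

A distance of 1 only says the inputs are close as strings. It says nothing about how far apart the model's outputs will be, which is the gap I'm describing.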