Easy model testing and swapping is one of the things people are pleasantly surprised by with DSPy.

You're right: prompts are overfit to models. You can't just change the provider or target model and assume you're giving it a fair shake. But if you have eval data and have been using a prompt optimizer with DSPy, you can try a new model with a one-line change and then rerun the prompt optimizer.
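For context, the "one-line change plus re-optimization" loop looks roughly like this. A minimal sketch, not a definitive recipe: the metric, trainset, and model names are illustrative, and MIPROv2 is just one of DSPy's prompt optimizers.

```python
def exact_match(example, pred, trace=None):
    """Toy eval metric: exact match on the answer field."""
    return example.answer.strip().lower() == pred.answer.strip().lower()

def optimize_for(model_name: str):
    """Swap the model (the one-line change), then re-run the optimizer."""
    import dspy  # imported here so the metric above stays dependency-free

    dspy.configure(lm=dspy.LM(model_name))  # <- the one-line change
    program = dspy.Predict("question -> answer")
    trainset = [
        dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question")
    ]
    optimizer = dspy.MIPROv2(metric=exact_match, auto="light")
    return optimizer.compile(program, trainset=trainset)

# Trying a different provider is just a different model string
# (needs API keys / a local server, so not run here):
# optimize_for("openai/gpt-4o-mini")
# optimize_for("ollama_chat/gemma3:12b")
```

The point is that the eval metric and program stay fixed, so each candidate model gets the same optimized-prompt treatment before you compare scores.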

Dropbox just published a case study where they talk about this:

> At the same time, this experiment reinforced another benefit of the approach: iteration speed. Although gemma-3-12b was ultimately too weak for our highest-quality production judge paths, DSPy allowed us to reach that conclusion quickly and with measurable evidence. Instead of prolonged debate or manual trial and error, we could test the model directly against our evaluation framework and make a confident decision.

https://dropbox.tech/machine-learning/optimizing-dropbox-das...

reply
It's not just about fitting prompts to models; it's also things like how web search works, how structured outputs are handled, various knobs like reasoning-effort level, etc. I don't think the DSPy approach is bad, but it doesn't really solve those issues.
reply
Funnily enough, the model switching is mostly thanks to LiteLLM, which DSPy wraps.
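For anyone unfamiliar: LiteLLM normalizes many providers behind one OpenAI-style call, which is what makes the swap a string change. A rough sketch (model names are illustrative, and the calls themselves need API keys or a local server):

```python
def build_messages(question: str) -> list[dict]:
    """OpenAI-style chat messages, the common format LiteLLM normalizes to."""
    return [{"role": "user", "content": question}]

def ask(model_name: str, question: str) -> str:
    """Same call path for any provider LiteLLM supports."""
    import litellm  # imported here so build_messages stays dependency-free

    resp = litellm.completion(model=model_name, messages=build_messages(question))
    return resp.choices[0].message.content

# Different providers, identical code (not run here):
# ask("openai/gpt-4o-mini", "What is 2 + 2?")
# ask("ollama_chat/gemma3:12b", "What is 2 + 2?")
```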