I found it works very well in once-off scenarios, but the specs often drift from the implementation. Even if you let the model update the spec at the end, the next few work items will make parts of it obsolete.
Maybe that's exactly the goal that "codespeak" is trying to solve, but I'm skeptical this will work well without more formal specifications in the mix.
Yes and yes. I think it's an important direction in software engineering. It's something that people were trying to do a couple decades ago but agentic implementation of the spec makes it much more practical.
I have the same basic workflow as you outlined, then I feed the docs into blackbird, which generates a structured plan with task and sub tasks. Then you can have it execute tasks in dependency order, with options to pause for review after each task or an automated review when all child task for a given parents are complete.
It’s definitely still got some rough edges but it has been working pretty well for me.