Honestly, I've not found a huge amount of value in the "science".

There are plenty of papers out there that look at LLM productivity, and every one of them seems to have glaring methodological limitations and/or reports on models that are 12+ months out of date.

Have you seen any papers that really elevated your understanding of LLM productivity with real-world engineering teams?

No, I agree! But I don’t think that observation gives us license to avoid the problem.

Further, I’m not sure this elevates my understanding. I’ve read many posts in this space that could be viewed as analogous to this one (this one is more tempered, of course), and each one has the same flaw: someone is telling me I need to make an “organization” out of agents and positive things will follow.

Without a serious evaluation, how am I supposed to validate the author’s ontology?

Do you disagree with my assessment? Do you view the claims in this content as solid and reproducible?

My own view is that these are “soft ideas” (GasTown and Ralph fall into a similar category) without rigorous justification.

What this amounts to is “synthetic biology” with billion-dollar probability distributions, where companies are incentivized to convey that they have the “secret sauce” … for massive amounts of money.

To that end, it’s difficult to trust a word out of anyone’s mouth, even if my empirical experiences match (along some projection).

The multi-agent "swarm" thing (that seems to be the term that's bubbling to the top at the moment) is so new and frothy that it is difficult to determine how useful it actually is.

StrongDM's implementation is the most impressive I've seen myself, but it's also incredibly expensive. Is it worth the cost?

Cursor's FastRender experiment was also interesting, but expensive for what it achieved.

My favorite current example is Anthropic's $20,000 C compiler from the other day. But they're an AI vendor, and demos from non-vendors carry more weight.

I've seen enough to be convinced that there's something there, but I'm also confident we aren't close to figuring out the optimal way of putting this stuff to work yet.

The writing on this website gives me strong web3 vibes; something doesn't smell right.

The only reason I'm not dismissing it out of hand is basically because you said this team was worth taking a look at.

I'm not looking for a huge amount of statistical ceremony, but some detail would go a long way here.

What exactly was achieved for what effort and how?

Yeah, they've not produced as much detail as I'd hoped, but there's still enough good stuff in there to make it a valuable set of information.