Yeah, v1 is sloppy, then I tell the LLM to clean it up. Every 1 prompt of building tends to require 1-5 prompts of clean up. Simple, fast, clean good code.

The chasm between "Software Developer" and "Software Engineer" is getting wider. Articles like this and the comments under it give away who is an Engineer and who is just a coder.

reply
> Every 1 prompt of building tends to require 1-5 prompts of clean up. Simple, fast, clean good code.

I have found this to be very effective as well. However, it's so easy to do, I can't imagine they won't build it in.

The harnesses will improve, and the loop of "self-review, judge what needs clean up, do the refactoring, repeat until clean" will get included in the one-shot. They are already doing this somewhat; they'll just get a lot better at it, and as the models get faster and cheaper to run, the refactoring churn at the end of each task won't even create a noticeable delay.

I do not think the high-level "taste" knowledge that I've built up -- when to break something off into its own service, what to put in the DB vs cache vs queues vs blob storage, how to isolate important logic in pure functional layers so it can be tested and validated independently -- is any more "unlearnable" to AI than the stuff I previously considered impressive that's now one-shottable like "write a Prolog implementation from scratch".
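To make the "pure functional layer" point concrete, here is a minimal Python sketch. Everything in it (`Order`, `discounted_total`, `checkout`, and the discount rule itself) is a hypothetical illustration, not taken from any real codebase. The decision logic lives in a pure function, and a thin shell does the I/O around it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    total_cents: int
    loyalty_years: int


def discounted_total(order: Order) -> int:
    """Pure core: no I/O, the same input always gives the same output."""
    # Hypothetical rule: 2% off per loyalty year, capped at 10%.
    percent_off = min(order.loyalty_years * 2, 10)
    return order.total_cents * (100 - percent_off) // 100


def checkout(order_id: str, load_order, charge) -> int:
    """Thin shell: performs the I/O and delegates the decision to the core."""
    order = load_order(order_id)      # e.g. a DB read (injected)
    amount = discounted_total(order)  # pure, independently testable
    charge(order_id, amount)          # e.g. a payment call (injected)
    return amount
```

The pure function can be validated with plain asserts and no mocks, which is the property that makes this kind of isolation worth the effort.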

reply
They have definitely built some of it in.

And yes, right now you still need the architectural and system design knowledge because the LLM will fuck that up. We'll all find out if that continues being needed in the future. From what I understand about LLMs and how they work, I doubt it, but also, yeah, I doubted it would've gotten this far when I think back 2+ years ago.

Also, maybe I should be clear: I pretty much never one-shot things. My sessions with Claude or other CLI tools always start with a bit of a conversation until we converge on a good plan. Then Claude builds the code, we discuss some more, and we iterate.

reply
The chasm is 1-5 prompts wide?
reply
Knowing what you're doing holistically is the chasm.
reply
The distinction between developer and engineer is obviously that one adds "make no mistakes!!!1."
reply
I wish I had current-day AI (and a big credit card) at my previous job. They had a big legacy mess made by a productive but not very good developer, and my job was to rebuild it.

If I had had AI tooling at the time, I'd probably have been more inclined to have it refactor and optimize the existing application, add automated regression tests, and extract all of its features and requirements for a potential rebuild.

But honestly, I think if that application had been properly designed and factored (instead of nesting JS in HTML in strings in JS, or concatenating XML from query results only for it to be converted to JSON, taking up 50% of response time), its lifetime could've been extended, especially if it had then been containerized with HHVM or a similar PHP optimizer.

But, hindsight.

reply
Any tips on how you unsloppify things? Are you using things like claude.md/copilot.md (or similar) to guide better, do you have specific types of prompts that you run, or do you adjust your code review practices in some way to more efficiently review lots of slop code?

One of my particular complaints is how code-gen LLMs tend to re-create the same code over and over again. Case in point, a use-case where a team name is generated from a list of team member names. The LLM re-generates this code in-line every time it needs to display the team name, rather than simply writing and reusing a utility style function.

I know I need to fix this. At this point I'm planning to just prompt something like "please list all the places where team names are generated/calculated", plus manually search through the codebase, then perform the abstraction myself. But I'm unsure how to prevent this (both this example, and other cases that could benefit from similar utility functions) continuing to occur in the future.
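For illustration, the abstraction could be as small as the Python sketch below. The function name `team_name` and the naming rule (sorted, de-duplicated first names joined with " & ") are assumptions made up for this example; the real rule would be whatever the inline copies currently do.

```python
def team_name(member_names: list[str]) -> str:
    """Single place where member names are turned into a team name."""
    # Hypothetical rule: sorted, de-duplicated first names joined with " & ".
    first_names = sorted(
        {name.split()[0] for name in member_names if name.strip()}
    )
    if not first_names:
        return "Unnamed team"
    return " & ".join(first_names)
```

Once a helper like this exists, every display site calls it, and a review prompt or instructions file can refer to it by name instead of describing the rule again.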

reply
I accept that for every prompt of building I'm going to have 1-5 prompts of refinement.

Once the LLM tells me "Okay, it's done, everything works," I always ask it to do a thorough review. I tell it to split up the work among sub-agents, with each one taking on a specific responsibility (look for code smells, look for bad architecture, review the data access model, DUPLICATE CODE, testability and unit testing, etc.).

After a certain number of revisions and reviews you'll come to accept the shortcomings it comes back with. Usually there will be specific design decisions you made that the LLM keeps bringing up. Once the review only brings those up, plus maybe some other minor issues, it's time to move on.
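As a rough sketch of that stopping rule in Python: `run_review` and `apply_fixes` are hypothetical callables standing in for whatever prompts or sub-agents you use, and `accepted` is the set of design decisions you have already decided to live with. It simplifies things by treating every finding outside `accepted` as worth another round, whereas in practice minor leftovers get waved through too.

```python
from typing import Callable, Iterable

REVIEW_FOCUSES = [
    "code smells",
    "architecture",
    "data access model",
    "duplicate code",
    "testability and unit tests",
]


def review_until_settled(
    run_review: Callable[[str], Iterable[str]],
    apply_fixes: Callable[[list[str]], None],
    accepted: set[str],
    max_rounds: int = 5,
) -> list[str]:
    """Repeat focused reviews until only already-accepted findings remain.

    Returns the findings from the last round that are still open.
    """
    open_findings: list[str] = []
    for _ in range(max_rounds):
        # One review pass per focus area, results pooled together.
        findings = [f for focus in REVIEW_FOCUSES for f in run_review(focus)]
        open_findings = [f for f in findings if f not in accepted]
        if not open_findings:
            break  # only known design decisions came back: time to move on
        apply_fixes(open_findings)
    return open_findings
```

The `max_rounds` cap is there so the loop ends even if the reviewer keeps inventing new complaints.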

I don't overly rely on markdown files and directions. I don't rely on tooling around it either. I just don't trust the LLM when it says "all done", tests pass, and deployment works. I make it do multiple reviews and iterations even when it thinks it's done.

reply
> Any tips on how you unsloppify things?

Understand what you're writing. If you never build up the mental model of what the code is doing you'll never be able to discern what is slop and what isn't. There are no shortcuts.

Piling more prompts on might get you to the same end result, but without understanding you'll never know when you're there.

reply
Absolutely. I really don't think the future will be humans reading and picking apart an AI-generated codebase; there will be tech debt agents or whatever running overnight.
reply
I think you misunderstand why tech debt lingers around. It's not a capacity or capability problem.

Organisations just don't want to deal with the accountability involved with "touching cold code". Whether it's a human or "AI agent" doesn't change the "It worked in prod, you touched it, you broke it, never touch anything again" dynamic.

reply
Exactly this. When I reject a refactor PR (or ideally, _before_ there's a PR), it's not because it's a bad idea, per se.

But there's risk associated with every change, and it takes time to review, QA, monitor the rollout, communicate to stakeholders, etc.

The refactor itself may be the smallest part of it.

reply
That's one dimension of it, but in the context of this thread we are talking about how maintainable a codebase is for other humans. If your codebase is messy you depend on a few key employees and it might be hard to onboard new ones, so there has always been financial incentives to reduce tech debt.
reply
> so there has always been financial incentives to reduce tech debt.

Yes. In practice, this does not outweigh organisational resistance.

AI really makes it worse by adding an explicit numerical cost to doing anything.

reply
So your proposal to handle tech debt created by "AI" being unable to do good engineering is... throw more AI at it? There's a saying about the definition of insanity which comes to mind.
reply
or linters
reply