by Waterluvian7 hours ago |

by adjejmxbdjdn5 hours ago

Could it be that the fact that the thing you’re an expert at looked like garbage to you, but the things you’re not an expert at, looked just fine, is not a coincidence?

You can talk to a bunch of designers who will say the opposite: Claude Design Studio generated this garbage UI that I fixed manually, but it created great code I never could have written that made it work.

by vibebased5 minutes ago

Yeah, that's basically me. (Hold the "expert", substitute with "has a degree, at least.")

findfantasyxviii.com

by reactordev4 hours ago

This is the juxtaposition the general public is in. They don’t have advanced tech skills to know any better so they see an output that they can’t produce from their skills and think it’s great. Maybe it is, maybe it isn’t. What does the code look like?

by genxy4 hours ago

Both had a working prototype. The flaw everyone is making is that they are over focusing on the artifact and not that they have a shared tangible object that they can both editorialize and iterate on.

These systems should allow rapid iteration on discovery and thinking. One can now make a prototype a day that would have taken a week. That means that we should be able to converge on a much better design in the same amount of time it would have taken to make a v0 that turns out to have systemic flaws.

AI should scale our understanding of systems, not just shovel out half baked features and apps.

by bluejellybean2 hours ago

Road to hell is paved with a lot of 'shoulds'; reality is a very different place filled with piles of trash and half baked ideas.

by lobf19 minutes ago

This is where I’m at. I’ve always been a computer tinkerer but a novice coder at best. I work in the film industry, so I don’t need to know how to code.

Where I’m at when building personal applications for my home / life is: does the code execute and perform the desired task?

If so, what do I care how shitty it is? I’m not publishing these projects (for the most part… I have one joke application up at songshift.reachnick.co) so efficient, clean, secure code are not really a priority for me.

by rhubarbtree55 minutes ago

Colleague (non-designer) generated UI with Claude. It was awful and broke basic design rules. So yes you may be right.

by vslira3 hours ago

Maybe specialists have a higher bar than consumers, and as a design consumer he's right about the design, and the designer is right about the code, if "being right" means "understanding what the end customer will think about this".

by 12_throw_away42 minutes ago

> Could it be that the fact that the thing you’re an expert at looked like garbage to you, but the things you’re not an expert at, looked just fine, is not a coincidence?

Well when you put it that way ... monetizing the Dunning-Kruger effect does actually sound like a very good business idea.

by tomgp4 hours ago

I think this is true, it's like a close relation of the Gell-Mann amnesia effect

https://en.wikipedia.org/wiki/Michael_Crichton#:~:text=%5B14...

by chrismorgan3 hours ago

More robust link (to the heading by ID, rather than by text directive with pre/post text that will change): https://en.wikipedia.org/wiki/Michael_Crichton#%22Gell-Mann_...

by MichaelZuo3 hours ago

This!

People are never perfectly even in intelligence across all possible disciplines.

by ajross1 hours ago

It's worth pointing out that Crichton coined that term during a period in his life where he was rapidly descending into conspiracy and iconoclastic thought, and this is of a piece with that.

Gell-Mann's observation was a sincere and thoughtful caution about the way we transmit information about complicated ideas. Crichton's "amnesia effect" is an excuse to ignore media you dislike.

by dataflow4 hours ago

I'm confused, this doesn't make sense. The target they're iterating on (UI) is the same one whose quality they're assessing, not a different one (source code).

You're suggesting that (a) their UI skills are lacking (based on what? isn't UI exactly what they were iterating on and trying to improve?), and (b) that a real UI expert would've somehow felt the UI they were working on was consistently garbage, despite how many times they iterate on it?

Which means you're saying you don't believe anyone can actually produce high quality (to an expert) output with AI on the same target they're working on, and if they think they are, that just means they don't have a good sense of quality?

by danielmarkbruce3 hours ago

It's not confusing. It makes sense.

by 8note1 hours ago

no, it is confusing.

the llm produced something the operator thought was garbage for the design too, and the operator iterated it from garbage to good.

they could also have the llm iterate the underlying code from garbage to good, if they wanted.

most likely a specialist would say it's neither good nor bad, since it's not considering the right things, and hasn't collected the right usability feedback, but making straightforward designs isn't that hard, and counting clicks and interactions, and avoiding hidden functionality is all measurable stuff

by danielmarkbruce25 minutes ago

is functional but has bad UI/layout/etc is a thing.

It's only confusing because you don't know the field. Which is kind of the point.

by lobf16 minutes ago

> is functional but has bad UI/layout/etc is a thing.

Tell me about it… I was forced to use a program called Farmer’s Wife for a time. What a fucking nightmare of a UX.

by michaelchisari3 hours ago

Without proper training, what looks good may be trash. I always thought pixel art generated by diffusion models looked damn good. Then I started watching and reading reviews by actual pixel artists, and all they saw was flaws. And it wasn't just nitpicking, it was things that were fundamentally wrong, difficult to fix and would look awful and amateurish and distracting to the player in production.

by vunderba3 hours ago

Much of this comes from the fact that, as is true for almost everything, an LLM (generative model etc) presents itself as an expert. It'll very confidently produce results that, to a layperson, look quite good. But the more of an expert you are in a field, the more apparent the cracks become.

AI pixel art looks particularly bad because most users don’t even go through the effort of downscaling and then upscaling it using something as simple as nearest-neighbor scaling, which by itself will squash out a lot of high-frequency noise that manifests in the form of terrible looking "fringing". Proper grid alignment also makes a big difference. It’s not perfect by a long shot, but it helps.
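A minimal sketch of the downscale-then-upscale step described above, in plain Python with no imaging library. The names `resize_nearest` and `snap_to_grid`, the `cell` parameter, and the list-of-lists image of palette indices are all assumptions for illustration, not taken from any particular tool. It samples the top-left pixel of each cell, so it only cleans up noise that sits off that sample point; a per-cell majority vote would likely be more robust.

```python
from typing import List

# Hypothetical representation: one palette index per pixel, rows of equal length.
Image = List[List[int]]


def resize_nearest(img: Image, new_w: int, new_h: int) -> Image:
    """Resize by nearest-neighbor sampling: each output pixel copies one source pixel."""
    old_h, old_w = len(img), len(img[0])
    return [
        [img[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


def snap_to_grid(img: Image, cell: int) -> Image:
    """Downscale by `cell`, then upscale back, so each cell becomes one flat color."""
    h, w = len(img), len(img[0])
    small = resize_nearest(img, w // cell, h // cell)
    return resize_nearest(small, w, h)


# A 4x4 image that should be a clean 2x2 grid of "big pixels",
# with two stray noise values (9 and 8) inside the cells.
noisy = [
    [1, 1, 2, 2],
    [1, 9, 2, 2],
    [3, 3, 4, 4],
    [3, 3, 4, 8],
]
for row in snap_to_grid(noisy, 2):
    print(row)
```

In this toy input the stray values disappear because each 2x2 cell is rebuilt from a single sample. Choosing the right `cell` size and offset is the grid-alignment problem the comment mentions, and this sketch does not attempt it.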

by holoduke2 hours ago

AI is a hammer. Use it right and it makes you very powerful. But it's not an easy tool.

by yawnxyz1 hours ago

this is why people still enjoy eating at Olive Garden and Chipotle and Sweetgreen

basically the AI-slop version of food, yet still they thrive

by jmount4 hours ago

Good point on "Gell-Mann Amnesia Effect."

by akersten6 hours ago

> The code it generated was awful. The kind of garbage that people who don’t know any better would ship: it looked right and it worked. But it was instantly a maintenance dead end.

In the Tailwind thread the other day I was explicitly told that the intended experience of many frameworks is "write-only code" so maybe this is just the way of the future that we have to learn to embrace. Don't worry how it's all hooked up, if it works it works and if it stops working tell the AI to fix it.

It's kind of liberating I guess. I'm not sure if I've reached AI nirvana on accepting this yet, but I do think that moment is close.

by hollowturtle4 hours ago

I'm pretty sure you wouldn't want the same for code that runs healthcare, banks or transport. Only useless shitty web projects could embrace what you're saying. And no, there's no "Claude review the code and improve it" magical formula

by QuercusMax2 hours ago

I work in the health software space and there are tons of internal tools which aren't production code that can benefit massively from throwaway "write-only code". Putting a web UI on top of a management CLI tool so support ops can run things without needing an oncall engineer can be a huge win. I recently built a testing UI that doubles as a demo-scenario-setup tool. Is it well-engineered? Who cares - it pokes the right things into the database and runs the right backend tasks, and has helped me catch and fix dozens of real bugs in the UIs that customers see.

There is an enormous untapped market for crappy low-effort apps which previously weren't worth the time - but with the effort to put together a simple dashboard or one-off tool so low, it becomes much more attractive.

by simmerup6 hours ago

The problem is it’s impossibly hard to test all the edge cases

Which is probably why so many random buttons in microsoft/apple/spotify just stop working once you get off the beaten path or load the app in some state which is slightly off base

by marcosdumay6 hours ago

The problem is worse than that.

The number of edge cases in a piece of software is not fixed at all. One of the largest markers of competence in software development is being able to keep them at a minimum, and LLMs tend to make that number higher than humanly possible.

by disgruntledphd26 hours ago

Yeah, the biggest thing I've noticed from LLMs is that large tech products now have even more bugs. Turns out the humans weren't so bad after all...

by michaelcampbell6 hours ago

> Turns out the humans weren't so bad after all...

The people pushing AI _over_ humans never thought they were. They just don't care about 'good' or 'bad', only 'time-to-market'. A bad app making money is better than a good one that isn't deployed yet. And who cares about anything past the end of the quarter? That's the next guy's problem.

by louiereederson6 hours ago

I'm wondering if companies are 'diverting' engineering resources from core products to AI products with the view that the former are legacy. Kind of two sides of the same coin though.

by SpicyLemonZest4 hours ago

I'm sure there's a lot of AI investment, but I've definitely also seen fixed sets of core product engineers shipping a lot more bugs these days.

by noosphr1 hours ago

The if in there is doing a lot of heavy lifting.

by giancarlostoro6 hours ago

Easy, have Claude review the code, tell it to be critical and that it needs to be easier to understand, follow Clean Code, SOLID principles and best practices. Lie to it, say you got this from a Junior developer, or "review it as if you were a Staff Level Engineer reviewing Junior code". The models can write better code, just nobody tells them to.

by marcosdumay6 hours ago

Lol, the only thing worse than a junior developer following Clean Code and SOLID has to be an LLM messing with code so it looks like it follows.

by giancarlostoro5 hours ago

Clean Code has its really "meh" areas, but the core idea and spirit of it is sound. Heck, Python's best guide is PEP 8; if you follow that, it forces you to write much better Python code.

In terms of "junior dev following" it would be the model trying to think and write it as a Senior or Staff Level engineer would.

by itsalwaysgood29 minutes ago

I've had success with this approach also. You do feel empowered, 10x or whatever: but then you're looking at more projects, context switching a lot more, and it can burn you out.

by HappySweeney6 hours ago

Code review is the main thing I use LLMs for. I have found it to be remarkably candid when you tell it the code came from another LLM (even name it). I was running Kimi K2.6 Q4 locally, seeing if it could SIMD a bit-matrix transpose function, and it was slow enough that I would paste its thinking into Gemini every few minutes. Gemini was savage.
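For readers unfamiliar with the task mentioned above, here is a scalar (non-SIMD) baseline for transposing an 8x8 bit matrix, written in Python as a sketch. It is not the code from that session. It assumes the matrix is packed into a 64-bit integer with bit index `8*row + col`, and it uses the well-known three-step "delta swap" trick (swap 4x4 blocks, then 2x2 blocks, then single bits). The function names are hypothetical.

```python
def transpose8x8(x: int) -> int:
    """Transpose an 8x8 bit matrix packed in a 64-bit int (bit index = 8*row + col)."""
    k4 = 0x0F0F0F0F00000000  # lower-left 4x4 block
    k2 = 0x3333000033330000  # lower-left 2x2 block of each 4x4 block
    k1 = 0x5500550055005500  # lower-left bit of each 2x2 block

    # Each step swaps the masked bits with their partners `shift` positions lower.
    t = k4 & (x ^ (x << 28))
    x ^= t ^ (t >> 28)
    t = k2 & (x ^ (x << 14))
    x ^= t ^ (t >> 14)
    t = k1 & (x ^ (x << 7))
    x ^= t ^ (t >> 7)
    return x


def transpose8x8_naive(x: int) -> int:
    """Reference implementation: move bit (row, col) to (col, row) one at a time."""
    out = 0
    for row in range(8):
        for col in range(8):
            if (x >> (8 * row + col)) & 1:
                out |= 1 << (8 * col + row)
    return out


# Row 0 fully set becomes column 0 fully set.
print(hex(transpose8x8(0xFF)))
```

A reference version like `transpose8x8_naive` is also a cheap way to check whatever optimized variant a model proposes, instead of relying only on a second model's review.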

by datsci_est_20155 hours ago

> Gemini was savage.

Humorously, this could be the result of LLMs vacuuming up all the sentiment on the web that the code that LLMs produce is trash-tier.

by kenjackson6 hours ago

This is it. I've had a similar experience: just playing around, I asked it to clean up some code it wrote to increase maintainability and readability by humans. After a few iterations it had generated quite solid code. It also broke the code a couple of times along the way. But it does get me thinking that these pipelines with agents doing specific tasks make a lot of sense. One to design and architect, one to implement, one to clean, one to review, one to test (actually there's probably a bunch of different agents for testing -- testing perf/power, that it matches the requirements/spec, matches the design, is readable/maintainable, etc...).

by giancarlostoro6 hours ago

I built GuardRails after some frustrations with Beads, which I love, and this whole exchange made me realize that, because I have "gates" after tasks, I could add a "Review the code" type of gate and probably get insanely better output. I already get reasonably good output because I spec out the requirements beforehand. That's the other thing: if you can tell the LLM HOW to build before it does, you will have better output.

by ar_lan5 hours ago

Why wouldn't Claude just impose this same loop in the code it writes - or better, write better code before it needs such review?

by y1n04 hours ago

Because language models don’t think before doing, they think by doing.

Maybe a more idealized training set could improve things, but at least for today’s SOTA, you have to get the shitty first draft out and then improve it.

Harnessing makes a difference, but it’s only shuffling around when and where the tokens get generated. It can trade being slower by doing a hidden first draft and only showing the output after doing a self review. But the models still need to generate it all explicitly.

by AlecSchueler4 hours ago

Why would it? It doesn't do anything with intention without being prompted. When you ask it to do something it's going to give you what seems like the most likely result, it isn't striving to give you the most correct result, those things just have some overlap.

by giancarlostoro4 hours ago

I assume it would involve wasting a lot more tokens reasoning about this. It is known that GPT uses fewer tokens than Claude, but Claude uses them to reason about problems more, which is part of its "secret sauce" and why so many swear by Claude Code.

by enraged_camel6 hours ago

Even better, if you have access to multiple models, tell it you got the code from another AI agent.

I did an experiment on this a few weekends ago and Codex for example was a lot more adversarial and thorough in its review when given Claude-authored code compared to when given the same code with "I wrote this, can you review it?"

by giancarlostoro6 hours ago

If it's within its context window, it will know you're lying, so either compact or start a new chat (don't do this on Claude, it dings your usage, always has).

by therouwboat5 hours ago

Is this a joke? Smartest people on the planet never thought about telling AI to just write better code?

by ryandrake4 hours ago

Kind of wild that you have to tell an LLM things like "do it right" and "make the code maintainable" and "don't make mistakes". Shouldn't that be the default? I wouldn't accept a calculator application that got math wrong unless you pressed a button labeled "actually solve the problem."

by jmalicki3 hours ago

> Kind of wild that you have to tell an LLM things like "do it right" and "make the code maintainable" and "don't make mistakes". Shouldn't that be the default?

It's not the default, because the training data is full of unmaintainable code done wrong with mistakes. People literally complain that LLMs write too many tests or add comments.

If instead of "do it right", you give it specific actionable advice on how to write code, it does surprisingly well. Newer frontier models also do a great job of mimicking the style and rigor of the surrounding codebase without prompting, if you're working in an established codebase, for better or worse.

by tayo423 hours ago

The default isn't necessarily whatever you consider "maintainable" or "doing it right", which are ambiguous terms anyway.

You never wrote quick exploratory code? One-off scripts? How is the AI supposed to know unless you tell it?

If you tell another person to write some code, how are they supposed to know? If you have your boss come to you and ask you to write some code to do some data analysis, are you going to spend weeks writing unit tests and perfect abstractions? Or do it quick and get the data and result?

by giancarlostoro4 hours ago

You forget that this all takes tokens from the model, so it has to be very stingy and whatever it comes up with "first" is what it goes with. I've seen people do the same as me, tell the model NOT TO GUESS but to do research first, which yields better output and saves time. Models today are better when they review the context directly, the focus shifted from it knowing everything in its training data to being able to dynamically learn new things and use that information in a meaningful way.

For example, I built up a programming language from scratch with Claude; it knows nuances about my language's syntax, and can write code in my language effectively. I did it mostly as a test. It definitely helped that my language is heavily Python-based.

by naravara3 hours ago

I have been wondering recently whether, if the cost of just throwing everything out and building it from scratch again gets low enough, maintainability becomes less of a priority. Can we just embrace the thing like those Zen carpenters who build wooden fire shrines do, where they just accept that the thing will keep burning down and they make a discipline around getting really good at rebuilding it?

Granted, the load bearing thing here is whether we’re actually getting good at rebuilding up to any sort of standard of quality. Or if the tooling is even structurally capable of doing that rather than just introducing new baskets of problems with each build.

by albedoa4 hours ago

I'm looking at that Tailwind thread. Do you really think that your comment here is a fair assessment of what you were told there? Come on now.

https://news.ycombinator.com/item?id=48166334

by jvanderbot6 hours ago

I wonder how much of this is momentum.

At the moment, we understand the basic tech, could reasonably DIY, but choose not to, knowing full well there's a mess of understandable code somewhere we could go clean up but don't want to. We accept fast iterations because we know roughly the shape of how it "should be" and can guide an automated framework towards that. This is especially true on our own projects or something we built originally! Stark/Iron man knew/moved, the suit assisted by adding momentum.

We're riding our "knowledge momentum".

If companies can hold out long enough, that knowledge completely fades, and the tool is all you have. At that point, they are locked in. Then it's not Iron man, it's an Iron lung (couldn't resist!)

by lelanthran1 hours ago

The blog post is spot on: AI is exactly like an Iron Man suit - if you never take it off your muscles will atrophy to nothing!

by Waterluvian6 hours ago

Yeah that’s my main concern. It feels so so easy to be lazy and do a bad job now. And then my skills weaken and what makes me valuable fades.

I love the Iron lung reference. Perfect.

by theptip5 hours ago

We collectively have to re-learn what operations are expensive and what are cheap.

Prototypes are practically free now. You can ask the AI to try each architectural or stylistic option and just see which code you like better.

To your point, another interesting note is that rewriting and rearchitecting are also very good.

One pattern I like is to vibe code a set of solutions, pick the approach, then backfill tests and do major refactors to make it maintainable.

Here the skill is knowing what good architecture looks like, and knowing how to prompt and validate (eg what level of tests will speed up the feedback cycle or enable me to make the LLM’s changes legible).

To be fair, the “ready, fire, aim” approach of rapid prototyping has been known for a long time, but you needed to be quite quick at coding in the old world for it to work well IMO.

by iwiwk5 hours ago

Free? Lol

by eithed6 hours ago

That's the model I've arrived at as well:

- first I've created a skill describing how the architecture of the system should look

- I'll tell the LLM to follow the guidelines; it will not do that 100%, but it will be good enough

- I'll go through what it produced, align to the template; if I like something (either I've not thought about the problem in that way, or simply forgot) I add that to the skill template

- rinse and repeat

This is not only for architecture of the system, but also when (and how to) write backend, frontend, e2e tests, docs. I know what I want to achieve = I know how the code should be organized and how it should work, I know how tests should be written. LLMs allow me to eliminate the tediousness of following the same template every time. Without these guardrails it switches patterns so often, creating unmaintainable crap

Bear in mind - the output requires constant supervision = LLM will touch something I told it not to touch, or not follow what I told it to do. The amount of the output can also sometimes be overwhelming (so, peer review is still needed), but at this point I can iterate over what LLM produces with it, with another LLM, then give to a human if it together makes sense

by onion2k35 minutes ago

> But it was instantly a maintenance dead end.

It doesn't really make sense to suggest AI can work on something and make it run and work correctly, and at the same time say it's unmaintainable. It is maintainable with AI.

The real question is whether or not you're happy to ship AI-generated code that you can't modify to production unless you use AI. Few developers are there yet, plenty of non-tech people are there already. I don't know which group is actually wrong.

by dylan6046 hours ago

> But I had an effortless time converging on a design that I wouldn’t have been able to do on my own (I’m not a designer).

I'm not a designer either, but I've been around designers long enough to recognize when something is bad but just not know what is needed to make it better/good. I've taken time to find sites that are designed well and then recreated them by hand coding the html/css to the point that I consider myself pretty decent at css now. I don't need libraries or frameworks. My css/html is so much lighter than what's found in those frameworks as well. I still would not call myself a designer, but pages look like they were designed by a mediocre designer rather than an engineer :shrug:

by onlyrealcuzzo1 hours ago

I'm trying to test if vibe coding can actually scale... And man is it painful.

AI is great at creating slop that almost works.

But, my god, it is terrible at following clear-as-day instructions on how to clean up slop.

It wrote 150k lines of code that almost works in 2 months. It's taken 1 month to delete about 2000 lines of broken architecture and fix it, and it still hasn't gotten it done, despite nonstop repeated efforts to do something not that hard.

I definitely could've fixed it in less time than I've spent prompting at this point (but no way I'd have gotten the other 150k lines). But doing it myself is not the point. It's to see if it can actually scale.

The answer is yes... But my god is it agonizing.

The creating garbage part that almost works is fun.

The inevitable cleanup is not.

And unfortunately I don't see this aspect materially improving in the short term.

If you want it to code you something about 5-10k lines of code that's already been done 1000 times before or only slightly different, it's great.

Most people want more than that.

by the__alchemist6 hours ago

Tangent: I never learned how to make the sorts of websites people find "professional" or "pretty". I could make functional and easy-to-use webapps, but not something people would think looks good or like something they would want to use. LLMs crushed this, without performance overhead; it can still be HTML/CSS/targeted JS.

by snarf216 hours ago

I feel the same but the question I struggle most with is this: "Does it matter when the people who are going to come along and maintain this are just going to use AI to fix or adjust this maintenance nightmare?"

by Waterluvian6 hours ago

At that point the code becomes a compile target, and then you need a new source of truth.

Which I think is perfectly worthy of exploration. Some people want to check in the prompts. Or even better, check in a plan.md or evenest betterest: some set of very well-defined specifications.

I'm not sure what the answer will be. Probably some mix of things. But today it is absolutely imperative that the code I write for the case I wrote it in is good quality and can be maintained by more than just me.

by ff3176 hours ago

When we want to maintain a reliable, stable "product" in traditional software development (a binary executable artifact that ships out to users, or the binary engine of some SaaS the company sells to users), we don't just check in (to the source of truth repo) the actual application-layer source code. We also check in build instructions (think autoconf/cmake/etc) and have some concept of compiler compatibilities / versions, build environments, and papering over their runtime differences. And then our official executable output is not just defined by "Tag v1.23.45 of the application source code repo" - it's additionally defined by the build environment (including, critically, the compiler version, among many others).

It's tempting to move out a layer and try making prompts and plan.md the "source code", and then the generated actual-source-code becomes just another ephemeral form of "intermediate representation" in the toolchain while building the final executable product. But then how are you versioning the toolchain and maintaining any reasonable sense of "stability" (in terms of features/bugs/etc) in the final output?

Example: last week, someone ran our "LLM inputs" source code through AgentCo SuperModel-7-39b, and produced a product output that users loved and it seemed to work well. Next week, management asks for a new feature. The "developer" adds the new feature to the prompting with a few trial iterations, but the resulting new product now has 339 new subtle bugs in areas that were working fine in last week's build, owing to the fact that, in the meantime, AgentCo has tweaked some weights in SuperModel-7-39b under the hood because of some concern about CSAM results or whatever and this had subtle unrelated effects. Or better yet: next month, management has learned that OtherCo MegaModel-42.7c seems to be the new hotness and tells everyone to switch models. Re-building from our "source" with the new model fixes 72 known bugs filed by users, fixes another 337 bugs nobody had even noticed yet, and causes 111 new bugs to be created that are yet-unknown.

If you treat the output source code as a write-only messy artifact, and you don't have stable, repeatable models, and don't treat model updates/changes as carefully as switching compiler vendors and build environments, this kind of methodology can only lead to chaos.

And don't even get me started on the parallel excuses of "Your specifications should be more-perfect" (perfection is impossible), or "An expansive testsuite should catch and correct all new bugs" (also impossible. testing is only as good as the imperfect specification, and then layers in its own finite capabilities to boot).
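One way to picture "treating model changes like compiler changes" is a build manifest that records the model alongside the prompt/spec inputs, so any drift is at least visible. This is only a sketch under stated assumptions: `BuildManifest`, `digest_inputs`, and `drift` are hypothetical names, the model strings are the fictional ones from the example above, and recording a model name does nothing about a vendor silently changing weights behind that name.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BuildManifest:
    source_tag: str     # tag of the prompts/specs repo, e.g. "v1.23.45"
    model_id: str       # hypothetical pinned model identifier
    prompt_digest: str  # SHA-256 over all prompt/spec files


def digest_inputs(files: Dict[str, str]) -> str:
    """Hash file names and contents in a stable (sorted) order."""
    h = hashlib.sha256()
    for name in sorted(files):
        h.update(name.encode("utf-8"))
        h.update(b"\0")
        h.update(files[name].encode("utf-8"))
        h.update(b"\0")
    return h.hexdigest()


def drift(old: BuildManifest, new: BuildManifest) -> List[str]:
    """Return the names of manifest fields that differ between two builds."""
    fields = ("source_tag", "model_id", "prompt_digest")
    return [f for f in fields if getattr(old, f) != getattr(new, f)]


prompts = {"plan.md": "Build the dashboard.", "spec.md": "Users can log in."}
last_week = BuildManifest("v1.23.45", "AgentCo SuperModel-7-39b", digest_inputs(prompts))
next_month = BuildManifest("v1.23.45", "OtherCo MegaModel-42.7c", digest_inputs(prompts))

# Same "source", different toolchain: the manifest shows what actually changed.
print(drift(last_week, next_month))
```

The point of the sketch is the same as with compilers: two builds from an identical tag are not the same build if the toolchain field differs, and a regression hunt should start there.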

by user342836 hours ago

I don't see the benefit of checking in either prompts or specs.

I never tried spec-driven development for myself, but if I review others' MRs I am typically exhausted after the first 10 lines.

And there are hundreds of lines, nearly always with major inaccuracies.

For myself I always found the plan mode to work well. Once the implementation is done, the code is the source of truth. If it works, it works.

When I want to add more functionality or change it, I just tell the agent what I want changed.

I doubt walls of semi-accurate existing specs are going to be beneficial there, but maybe my work differs from yours.

by gbear05 hours ago

Those checked-in specs become the requirements for the system. So the next time you ask the AI to make a fix, it can use those specs as part of the solution and not break another requirement. Basically the code underneath keeps getting rewritten over and over, but that doesn't matter as long as it hits the required specs.

by jmcodes4 hours ago

Do you rewrite the specs with new requirement changes if they've already been implemented? How do you supersede a spec?

I've been using LLMs daily and I spun up a few spec driven flows once or twice but like the person above I think the code is the source of truth.

Also why wouldn't you use TDD to enforce the 'spec' then?

by macintux6 hours ago

I value traceability, and I value understanding the "why" of the code. For me, the prompts are useful for both.

by mehagar6 hours ago

Same. Messy code makes it harder for us to understand and thus maintain the code (which is why people often refer to code as a liability), but is that the case for AI tools as well? If not, it seems like clean code may not matter as much anymore.

by n_e6 hours ago

The problem with crappy frontend code is not only the maintenance. It's that stuff such as responsive design, accessibility or cross-browser compatibility that work nearly for free with elegant code won't work at all.

by flyinglizard6 hours ago

The problem is that technical debt is compounding. Bad LLM architectural and implementation decisions just blend in to the background and you build layer upon layer of a mess. At some point it becomes difficult and expensive (token wise) to maintain this code, even for an agent.

I mitigate this with a few things: 1. Checkpoints every few days to thoroughly review and flag issues. Asking the LLM to impersonate someone (Linus Torvalds is my favorite) yields different results. 2. Frequent refactors. LLMs don't get discouraged from throwing things out like humans do. So I ask for a refactor when enough stuff accumulates. 3. Use verbose, typed languages. C# on the backend, TypeScript on the frontend.

Does it produce quality code? Locally yes; architecturally I don't know. It works so far, I guess. Anyway, my alternative is not making this software better but not making it at all for lack of time, so even if it's subpar it still brings business value.

by worldsayshi6 hours ago|

> The code it generated was awful.

I suppose you could solve that in two ways. Manually rewrite it as you did. Or formalize an architecture and let the AI rewrite it with that in mind. I suspect that either works.

by bentcorner3 hours ago|

I suspect that at some point AI-written code will be an artifact generated build-to-build. The design docs and UI tests are the source, and the model follows instructions to generate the product. If you make the models deterministic, then model improvements give you code improvements across your entire codebase "for free".

by neals4 hours ago|

Very recognizable and hard to reason about! I did something similar, but when I looked at the code, it was so procedural, hardly abstract, no vision. How was I ever going to maintain this? I guess AI will do it forever?

by EasyMark5 hours ago|

Did you try to have it clean up the code and refactor? I find that while the result is usually low to mid tier, it's a lot better than the first pass. I of course back up the working version lol. Usually I can coax something better out of it.

by noosphr1 hours ago|

This isn't an indictment of how good AI is but of how poor our tools are. We had GUI makers in 1996 that made slop and allowed you to iterate in real time. They didn't need a datacenter's worth of compute and a nuclear reactor's worth of power to run.

by HikeThe466 hours ago|

If you are just blindly vibe coding without any parameters, guardrails, architecture, or broad guidance, you're going to have a mess of slop.

The power comes from creating a machine you can steer. Treat AI like an overeager college intern who needs hand-holding but gets tasks done.

by nomel3 hours ago|

> But it was instantly a maintenance dead end.

I gave up on this recently. It achieved the goal now, and in a year or two, when you actually want to add whatever feature, the SOTA AI will probably be able to clean it up as it does so. What does "maintain" even mean anymore?

If you don't agree, how many years into the future do we need until you would agree?

by dbingham3 hours ago|

The problem is that people keep saying this, but the code keeps being bad. Every time I commit myself to trying to build something with AI, I end up wasting a ton of time and backing it out or completely rewriting it without the AI. The code it generates just isn't where it needs to be.

And people have been saying this exact thing for years now. Someone said this very thing two years ago. And we're still at the "maintenance dead end" stage. So let me flip it back on you: how many years are we going to pour an obscene amount of resources into this thing that is always going to be able to clean up its own messes "in a year or two" before we realize it's a dead end (at best) and we need to be using those resources elsewhere? And, similarly, what happens to you when the SOTA AI in two years can't clean up the code it wrote for you two years ago, but people are depending on it and you're still on the hook for maintaining it?

by nomel2 hours ago|

> If you don't agree, how many years into the future do we need until you would agree?

Respectfully, I asked first. ;)

> before we realize it's a dead end (at best)

You've declared the future, which doesn't leave much room for a conversation. So, cheers!

by znpy2 hours ago|

> The kind of garbage that people who don’t know any better would ship: it looked right and it worked.

I feel what you write, but then again: every now and then I write small Greasemonkey scripts to remove annoyances from webpages, and to do so I have to look at the HTML, and the kind of trash you describe is already there.

by threethirtytwo4 hours ago|

Well, here's the million-dollar question. It's a maintenance dead end for humans to read and edit. But for an LLM, is it a maintenance dead end? Could the LLM iterate on that same codebase and be highly effective on it?

by hollowturtle4 hours ago|

Yes, it is a maintenance dead end for LLMs. It's notorious that codebases start accumulating an enormous amount of tech debt and that it becomes almost impossible to unravel, even with the best agent.

by threethirtytwo3 hours ago|

I've never seen it happen. You say this, and it's likely one anecdotal claim. At the same time there are counterpoints like Bun getting rewritten in Rust.

This is talk and talk is cheap. Prove it, otherwise it's still a million dollar question... unanswered.

HN is notoriously mentally deficient when it comes to AI. They were wrong about self-driving cars (I sit in AI cars daily), they were wrong about AI getting used for coding (I don't use an IDE or type code anymore as a SWE). So I have to say that unless there's something evidence-based or substantial here, it's likely, given HN's track record, that most answers here will end up being wrong, baseless, and overconfident.

I'm looking for legit answers not confidently biased statements with no evidence.

by rustyminnow1 hours ago|

Bun getting rewritten in Rust is not really the counterpoint you think it is. The Rust version hasn't shipped yet, so there hasn't even been a chance to see if the code can be maintained. It's an impressive feat, no doubt, but until they've maintained it on a months-to-years timeline, it's also just talk with no evidence.

by ios-contractor3 hours ago|

I used AI to write code (features) then I used AI to refactor the architecture using best practices and get rid of the technical debt. I don't remember the last time I modified code by hand honestly.

by danielmarkbruce3 hours ago|

This is the part people are missing. I spend 30-40% of my time "vibe coding" doing "vibe clean up". It's fine, I'm still 10x more productive than I ever was.

by wiseowise6 hours ago|

> I had an Iron Man moment

Iron Man created Jarvis whose capabilities are way beyond any models in the near future. So it wasn’t an Iron Man moment.

by etiam5 hours ago|

He was presumably also not constructing a powered exoskeleton from fictional materials or a physically implausible power source, but since you obviously caught the reference, how about a benevolent interpretation instead, as a decent shorthand for working smoothly with AI assistance?

(And on a personal note, I'm glad we don't have a publicly released Jarvis before we get our act together about the use.)

by Supermancho2 hours ago|

>> I had an Iron Man moment

> Iron Man created Jarvis whose capabilities are way beyond any models in the near future. So it wasn’t an Iron Man moment.

Like an LLM, you misunderstood the context. The voyeuristic experience doesn't require fiction to be reality.

by gbear06 hours ago|

Why was it a maintenance dead end? It sounds like you were able to iteratively work on it in its current state, but are you going to be the one maintaining the code?

I keep asking myself the same questions, and the conclusion I keep coming to is the clean modeled structure we want to see is for humans to maintain and extend, but the AI doesn't need this.

There's definitely an efficiency angle here where it's faster for AI to go from a clean modeled solution to the desired solution because it's likely been trained on cleaner code. Is this really going to matter though?

The best argument I can come up with is that the clean modeled solution is better for existing development tools because they're less likely to get confused by the patchwork of vibes throughout the code; but this feels like it ultimately becomes an efficiency concern as well.

This just might be the new reality, and we need to stop looking behind the curtain and accept what the wizard presents us.

by ttd6 hours ago|

> the clean modeled structure we want to see is for humans to maintain and extend, but the AI doesn't need this.

This does not match my experience. I do a lot of AI-assisted coding at this point, and what I've seen is that when the AI is asked to extend or modify existing code, it does a much better job on clean, well-structured and well-abstracted code.

I think the reason is simple, and tracks for humans as well: well-structured code is simply easier to understand and reason about, and takes a smaller amount of working-set memory. Even as LLMs get better with coding, I expect that they would converge on the same conclusion, namely that good structure + good abstractions make for code that is more efficient to work with.

by empath756 hours ago|

Yeah, I have had Claude take over multiple internal (human-written) projects that were in a dire state and spend a week just completely refactoring them and adding exhaustive tests before doing any new features. It's worth starting from a clean slate.

by K0balt5 hours ago|

I keep hearing the assertion that you can’t make high quality, maintainable code with LLMs. The last two years using AI have shown me exactly the opposite.

I think it’s all about the structure you work in and how you use the model. We are shipping better, more human-friendly code, with fewer bugs, than we ever did before, and doing it at 1/10 the cost of before LLMs.

But we are definitely not vibe coding, and the key seems to be devs with years of experience managing teams, managing the LLM instead. Basically you create the same kind of formal specifications, conventions, and documentation that you would develop for a project with two or three teams, then use that to keep the project on the rails recursively looping back through the docs as you go along. I’ve only had to back out of a couple of issues over the last year, and even though that cost a couple of hours, it was still extremely cheap.

Meanwhile we are shipping at 4x speed with 1/4 the labor, and the code is better than it was because the “overhead” of writing maintainable, self-documenting code has inverted into the secret ingredient to shipping bug-free code at unprecedented speed.

If you just explain the standards to which you want the code written, use a strict style guide, and have a separate process that ensures test coverage (not in the same context), you can get example-quality code all the way through. Turns out that’s also in the training data.

by njovin6 hours ago|

Many of us recognize that the days of nearly-free tokens are quickly drawing to a close, and at some point humans may very well have to dig their keyboards out of cold storage and return once again to the code mines.

by iwiwk5 hours ago|

OAI and Anthropic need to generate cash flows from operations - once they go public that’s it. Any future funding for reinvestment has to come from internal funds beyond existing raising + IPO.

So yeah, it’s imminent. Let’s see how demand shifts in response in the future.

by dawnerd4 hours ago|

June 1st for all the folks taking advantage of Copilot. It was an astounding deal and a lot of people were “abusing” it.

by ahnick5 hours ago|

The reason you will never get software engineers (in companies) to accept the man behind the curtain is liability. If a human software engineer is still responsible for what happens when the AI-developed code has a catastrophic bug or security vulnerability, then the only way for the human to know if there is a problem is to be able to read through the code or run it through some <insert advanced formal verification tool here> that guarantees zero issues.

I think we eventually end up at the tool approach via vendors providing the tools to other companies, but it still feels like there's a long road ahead to get there.

by lambda6 hours ago|

> but the AI doesn't need this

That's not true. The LLM performance will degrade as the codebase gets messier as well. You get to a point where every fix breaks something else and you can't really make forward progress.

Yes, you might be able to get a bit further with a messy codebase just because the LLM won't complain and will just grind through fixing things, but eventually it will just start disabling failing tests instead of actually fixing things.

by dawnerd4 hours ago|

The token cost to fix might surpass what a human would cost to just do it.

by glhaynes6 hours ago|

Sometimes I think the main value in AI-maintained code being “high quality” is when the structure can enforce invariants. If invalid states aren’t representable, then the AI can’t easily add bugs in the future.
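As a minimal sketch of "invalid states aren't representable" (the `RequestState` type and `describe` function are hypothetical, picked only for illustration): a loose shape like `{ loading: boolean; data?: string; error?: string }` lets code set `loading: true` alongside both `data` and `error`, while a discriminated union rules that out at compile time.

```typescript
// Each variant carries only the fields that make sense for it, so a
// "loading" state with data, or a "success" state with an error,
// cannot be constructed without a type error.
type RequestState =
  | { kind: "idle" }
  | { kind: "loading" }
  | { kind: "success"; data: string }
  | { kind: "failure"; error: string };

function describe(state: RequestState): string {
  switch (state.kind) {
    case "idle":
      return "not started";
    case "loading":
      return "loading...";
    case "success":
      return `got ${state.data}`;
    case "failure":
      return `failed: ${state.error}`;
    default: {
      // Exhaustiveness check: if a new variant is added to RequestState
      // and not handled above, this assignment stops compiling.
      const unreachable: never = state;
      return unreachable;
    }
  }
}
```

Under these assumptions, whoever edits the code later (human or model) gets a compiler error instead of a silent bug when they forget a case or try to build a contradictory state.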

Of course that just leads to: what’s the best way to achieve that goal? Through elegant code or adding lots of tests? Which is a debate from long before LLMs existed.

by otabdeveloper46 hours ago|

> Why was it a maintenance dead end?

LLMs have a limit to how deeply they can understand and refactor architectural issues.

That limit is far, far lower than a human's.

by jplusequalt6 hours ago|

>This just might be the new reality, and we need to stop looking behind the curtain and accept what the wizard presents us.

This is how societies become shittier. People who are ostensibly responsible for doing their jobs not giving a damn about quality.
