undefined

upvote

points

by aeldidi3 hours ago |

upvote

by RealityVoid3 hours ago|

[-]

You're not going crazy. That is what I see as well. But, I do think there is value in:

- driving the LLM instead of doing it yourself. - sometimes I just can't get the activation energy and the LLM is always ready to go so it gives me a kickstart

- doing things you normally don't know. I learned a lot of command like tools and trucks by seeing what Claude does. Doing short scripts for stuff is super useful. Of course, the catch here is if you don't know stuff you can't drive it very well. So you need to use the things in isolation.

- exploring alternative solutions. Stuff that by definition you don't know. Of course, some will not work, but it widens your horizon

- exploring unfamiliar codebases. It can ingest huge amounts of data so exploration will be faster. (But less comprehensive than if you do it yourself fully)

- maintaining change consistency. This I think it's just better than humans. If you have stuff you need to change at 2 or 3 places, you will probably forget. LLM's are better at keeping consistency at details (but not at big picture stuff, interestingly.)

reply

upvote

by TheAceOfHearts2 hours ago|

[-]

For me the biggest benefit from using LLMs is that I feel way more motivated to try new tools because I don't have to worry about the initial setup.

I'd previously encountered tools that seemed interesting, but as soon as I tried getting it to run I found myself going down an infinite debugging hole. With an LLM I can usually explain my system's constraints and the best models will give me a working setup from which I can begin iterating. The funny part is that most of these tools are usually AI related in some way, but getting a functional environment often felt impossible unless you had really modern hardware.

reply

upvote

by christophilus1 hours ago|

[-]

Same. This weekend, I built a Flutter app and a Wails app just to compare the two. Would have never done either on my own due to the up front boilerplate— and not knowing (nor really wishing to know) Dart.

reply

upvote

by Fogest1 hours ago|

[-]

> - maintaining change consistency. This I think it's just better than humans. If you have stuff you need to change at 2 or 3 places, you will probably forget. LLM's are better at keeping consistency at details (but not at big picture stuff, interestingly.)

I use Claude Code a decent amount, and I actually find that sometimes this can be the opposite for me. Sometimes it is actually missing other areas that the change will impact and causing things to break. Sometimes when I go to test it I need to correct it and point out it missed something or I notice when in the planning phase that it is missing something.

However I do find if you use a more powerful opus model when planning, it does consider things fully a lot better than it used to. This is actually one area I have been seeing some very good improvements as the models and tooling improves.

In fact, I actually hope that these AI tools keep getting better at the point you mention, as humans also have a "context limit". There are only so many small details I can remember about the codebase so it is good if AI can "remember" or check these things.

I guess a lot of the AI can also depend on your codebase itself, how you prompt it, and what kind of agents file you have. If you have a robust set of tests for your application you can very easily have AI tools check their work to ensure things aren't being broken and quickly fix it before even completing the task. If you don't have any testing more could be missed. So I guess it's just like a human in some sense. If you have a crappy codebase for the AI to work with, the AI may also sometimes create sloppy work.

reply

upvote

by kace913 hours ago|

[-]

>driving the LLM instead of doing it yourself. - sometimes I just can't get the activation energy and the LLM is always ready to go so it gives me a kickstart

There is a counter issue though, realizing mid session that the model won’t be able to deliver that last 10%, and now you have to either grok a dump of half finished code or start from scratch.

reply

upvote

by brookst15 minutes ago|

[-]

I can’t speak for anyone else, but Claude Code has been transformative for me.

I can’t say it’s led to shipping “high quality projects”, but it has let me accomplish things I just wouldn’t have had time for previously.

I’ve been wanting to develop a plastic -> silicone -> plaster -> clay mold making process for years, but it’s complex and mold making is both art and science. It would have been hundreds of hours before, with maybe 12 hours of Claude code I’m almost there (some nagging issues… maybe another hour).

And I had written some home automation stuff back with Python 2.x a decade ago; it was never worth the time to refamiliarize myself with in order to update, which led to periodic annoyances. 20 minutes, and it’s updated to all the latest Python 3.x and modern modules.

For me at least, the difference between weeks and days, days and hours, and hours and minutes has allowed me to do things I just couldn’t justify investing time in before. Which makes me happy!

So maybe some folks are “pretending”, or maybe the benefits just aren’t where you’re expecting to see them?

reply

upvote

by FeteCommuniste3 hours ago|

[-]

There's got to be some quantity of astroturfing going on, given the players and the dollar amounts at stake.

reply

upvote

by input_sh2 hours ago|

[-]

Some? I'd be shocked if it's less than 70% of everything AI-related in here.

For example a lot of pro-OpenAI astroturfing really wanted you to know that 5.3 scored better than opus on terminal-bench 2.0 this week, and a lot of Anthropic astroturfing likes to claim that all your issues with it will simply go away as soon as you switch to a $200/month plan (like you can't try Opus in the cheaper one and realise it's definitely not 10x better).

reply

upvote

by Tossrock15 minutes ago|

[-]

You can try opus in the cheaper one if you enable extra usage, though.

reply

upvote

by mikenew3 hours ago|

[-]

Pretty much every software engineer I've talked to sees it more or less like you do, with some amount of variance on exactly where you draw the line of "this is where the value prop of an LLM falls off". I think we're just awash in corporate propaganda and the output of social networks, and "it's good for certain things, mixed for others" is just not very memetic.

reply

upvote

by whaleidk2 hours ago|

[-]

I wish this was true. My experience is co-workers who do lip service as to treating LLM like a baby junior dev, only to near-vibe every feature and entire projects, without spending so much as 10 mins to think on their own first.

reply

upvote

by yusufnb2 hours ago|

[-]

At my work I interview a lot of fresh grads and interns. I have been doing that consistently for last 4 years. During the interviews I always ask the candidates to show and tell, share their screen and talk about their projects and work at school and other internships.

Since last few months, I have seen a notable difference in the quality and extent of projects these students have been able to accomplish. Every project and website they show looks polished, most of those could be a full startup MVP pre AI days.

The bar has clearly been raised way high, very fast with AI.

reply

upvote

by smoe12 minutes ago|

[-]

In my opinion, it has always been the “easy” part of development to make a thing work once. The hard thing is to make a thousand things work together over time with constantly changing requirements, budgets, teams, and org structures.

For the former, greenfield projects, LLMs are easily a 10x productivity improvement. For the latter, it gets a lot more nuanced. Still amazingly useful in my opinion, just not the hands off experience that building from scratch can be now.

reply

upvote

by josiahpeters1 hours ago|

[-]

I’ve had the same experience with the recent batch of candidates for a Junior Software Engineer position we just filled. Their projects looked impressive on the surface and seemed very promising.

Once we got them into a technical screening, most fell apart writing code. Our problem was simple: using your preferred programming language, model a shopping cart object that has the ability to add and remove items from the cart and track the cart total.

We were shocked by how incapable most candidates were in writing simple code without their IDEs tab completion capability. We even told them to use whatever resources they normally used.

The whole experience left us a little surprised.

reply

upvote

by piskov9 minutes ago|

[-]

The pattern matching and absence or real thinking is still strong.

Tried to move some excel generation logic from epplus to closedxml library.

ClosedXml has basically the same API so the conversion was successful. Not a one-shot but relatively easy with a few manual edits.

But closedxml has no batch operations (like apply style to the entire column): the api is there but internal implementation is on cell after cell basis. So if you have 10k rows and 50 columns every style update is a slow operation.

Naturally, told all about this to codex 5.3 max thinking level. The fucker still succumbed to range updates here and there.

Told it explicitly to make a style cache and reuse styles on cells on same y axis.

5-6 attempts — fucker still tried ranges here and there. Because that is what is usually done.

Not here yet. Maybe in a year. Maybe never.

reply

upvote

by kylecazar2 hours ago|

[-]

Matches my experience pretty well. FWIW, this is the opinion that I hear most frequently in real life conversation. I only see the magical revelation takes online -- and I see a lot of them.

reply

upvote

by dan-robertson1 hours ago|

[-]

I think it’s just very alien in that things which tend to be correlated in humans may not be so correlated in LLMs. So two things that we expect people to be similarly good at end up being very different in an AI.

It does also seem to me that there is a lot of variance in skills for prompting/using AI in general (I say this as someone who is not particularly good as far as I’m aware – I’m not trying to keep tips secret from you). And there is also a lot of variance in the ability for an AI to solve problem of equal difficulty for a human.

reply

upvote

by LogicFailsMe2 hours ago|

[-]

I find these agents incredibly useful for eliminating time spent on writing utility scripts for data analysis or data transformation. But... I like coding, getting relegated to being a manager 100%? Sounds like a prison to me not freedom.

That they are so good at the things I like to do the least and still terrible at the things at which I excel. That's just gravy.

But I guess this is in line with how most engineers transition to management sometime in their 30s.

reply

upvote

by peab2 hours ago|

[-]

> ... but also seemingly has nothing to show for it This x1000, I find it so ridiculous.

usually when someone hypes it up it's things like, "i have it text my gf good morning every day!!", or "it analyzed every single document on my computer and wrote me a poem!!"

reply

upvote

by 2 hours ago|

[-]

deleted

reply

upvote

by nevster54 minutes ago|

[-]

I think the main thing is, these are all green fields projects. (Note original author talking about executing ideas for projects.)

reply

upvote

by 2 hours ago|

[-]

deleted

reply

upvote

by AstroBen1 hours ago|

[-]

We're at the apex of the hype cycle. I think it'll die down in a year and we'll get a better picture of how people have integrated the tools

Even if it's not straight astroturfing I think people are wowed and excited and not analyzing it with a clear head

reply

upvote

by chrisjj2 hours ago|

[-]

> if it were really as good as a lot of people are saying there should be a massive increase in the number of high quality projects/products being developed.

The headline gain is speed. Almost no-one's talking about quality - they're moving too fast to notice the lack.

reply

upvote

by mark_l_watson1 hours ago|

[-]

“Emperor wore no clothes” moment.

Given time AI will lead to incredible productivity. In the meantime, use as appropriate.

reply

upvote

by daliusd2 hours ago|

[-]

Maybe it is language specific? Maybe LLMs have a lot of good JavaScript/TypeScript samples for training and it works for those devs (e.g. me). I heard that Scala devs have problems with LLMs writing code too. I am puzzled by good devs not managing to get LLM work for them.

reply

upvote

by mbfg2 minutes ago|

[-]

I definitely think it's language specific. My history may deceive me here, but i believe that LLMs are infinitely better at pumping out python scripts than java. Now i have much, much more experience with java than python, so maybe it's just a case of what you don't know.... However, The tools it writes in python just work for me, and i can incrementally improve them and the tools get rationally better and more aligned with what i want.

I then ask it to do the same thing in java, and it spends a half hour trying to do the same job and gets caught in some bit of trivia around how to convert html escape characters, for instance, s.replace("<", "<").replace(">", ">").replace("\"").replace("""); as an example and endlessly compiles and fails over and over again, never able to figure out what it has done wrong, nor decides to give up on the minutia and continue with the more important parts.

reply

upvote

by cogman102 hours ago|

[-]

I think LLMs have a hard time with large code bases (obviously so do devs).

A giant monorepo would be a bad fit for an LLM IMO.

reply

upvote

by farco1243 minutes ago|

[-]

With agentic search, they actually do pretty well with monorepos.

reply

upvote

by gchamonlive1 hours ago|

[-]

I'm curious what types of tasks you were delegating to the coding agents?

reply

upvote

by deterministic29 minutes ago|

[-]

Completely agree. However I do get some productivity boost by using ChatGPT as an improved Google search able to customize the answer to what I need.

reply

upvote

by hawkernews1 hours ago|

[-]

I remember when Anthropic was running their Built with Claude contest on reddit. The submissions were few and let's just say less than impressive. I use Claude Code and am very pro-AI in general, but the deeper you go, the more glaring the limitations become. I could write an essay about it, but I feel like there's no point in this day and age, where floods of slop in fractured echo chambers dominate.

reply

upvote

by philipwhiuk3 hours ago|

[-]

It's like CGP Grey hosting a productivity podcast despite his productivity almost certainly going down over time.

It's the appearance of productivity, not actual productivity.

reply

upvote

by rafabulsing1 hours ago|

[-]

I always find that characterization of Grey and the Cortex podcast to be weird. He never claims to be a productivity master or the most productive person around. Quite the opposite, he has said multiple times how much he is not naturally productive, and how he actually kinda dislikes working in general. The systems and habits are the ways he found to essentially trick himself into working.

Which I think is what people gather from him, but somehow think he's hiding it or pretending is not the case? Which I find strange, given how openly he's talked about it.

As for his productivity going down over time, I think that's a combination of his videos getting bigger scopes and production values, and also he moving some of his time into some not so publicly visible ventures. E.g., he was one of the founders of Standard, which eventually became the Nebula streaming service (though he left quite a while ago now).

reply

upvote

by rulerviper50 minutes ago|

[-]

[dead]

reply

upvote

by fragmede1 hours ago|

[-]

The crazy pills you are taking is that thinking people have anything to prove to you. The C compiler that Anthropic created or whatever verb your want to use should prove that Claude is capable of doing reasonably complex level of making software. The problem is people have egos, myself included. Not in the inflated sense, but in the "I built a thing a now the Internet is shitting on me and I feel bad" sense. There's fundcli and nitpick on my GitHub that I created using Claude. fundcli looks at your shell history and suggests places to donate to, to support open source software you actually use. Nitpick is a TUI HN client. I've shipped others. The obvious retort is that those two things aren't "real" software; they're not complex, they're not making me any money. In fact, fundcli is costing me piles of money! As much as I can give it! I don't need anyone to tell me that or shit on the stuff I'm building.

The "open secret" is that shipping stuff is hard. Who hasn't bought a domain name for a side project that didn't go anywhere. If there's anybody out there, raise your hand! So there's another filtering effect.

The crazy pills are thinking that HN is in any way representative of anything about what's going on in our broader society. Those projects are out there, why do you assume you'll be told about it? That someone's going to write an exposé/blog post on themselves about how they had AI build a thing and now they're raking in the dollars and oh, buy my course on learning how to vibecode? The people selling those courses aren't the ones shipping software!

reply

upvote

by aeldidi1 hours ago|

[-]

> The C compiler that Anthropic created or whatever verb your want to use should prove that Claude is capable of doing reasonably complex level of making software.

I don't doubt that an LLM would theoretically be capable of doing these sorts of things, nor did I intend to give off that sentiment, rather I was more evaluating if it was as practical as some people seem to be making the case for. For example, a C compiler is very impressive, but its clear from the blog post[0] that this required a massive amount of effort setting things up and constant monitoring and working around limitations of Claude Code and whatnot, not to mention $20,000. That doesn't seem at all practical, and I wonder if Nicholas Carlini (the author of the Anthropic post) would have had more success using Claude Code alongside his own abilities for significantly cheaper. While it might seem like moving the goalpost, I don't think it's the same thing to compare what I was saying with the fact that a multi billion dollar corporation whose entire business model relies on it can vibe code a C compiler with $20,000 worth of tokens.

> The problem is people have egos, myself included. Not in the inflated sense, but in the "I built a thing a now the Internet is shitting on me and I feel bad" sense.

Yes, this is actually a good point. I do feel like there's a self report bias at play here when it comes to this too. For example, someone might feel like they're more productive, but their output is roughly the same as what it was pre-LLM tooling. This is kind of where I'm at right now with this whole thing.

> The "open secret" is that shipping stuff is hard. Who hasn't bought a domain name for a side project that didn't go anywhere. If there's anybody out there, raise your hand! So there's another filtering effect.

My hand is definitely up here, shipping is very hard! I would also agree that it's an "open secret", especially given that "buying a domain name for a side project that never goes anywhere" is such a universal experience.

I think both things can be true though. It can be true that these tools are definitely a step up from traditional IDE-style tooling, while also being true that they are not nearly as good as some would have you believe. I appreciate the insight, thanks for replying.

[0]: https://www.anthropic.com/engineering/building-c-compiler

reply

upvote

by v1ne1 hours ago|

[-]

If people make extraordinary claims, I expect extraordinary proofs…

Also, there is nothing complex in a C compiler. As students we built these things as toy projects at uni, without any knowledge of software development practices.

Yet, to bring an example for something that's more than a toy project: 1 person coded this video editor with AI help: https://github.com/Sportinger/MasterSelects

reply

upvote

by yojat6611 hours ago|

[-]

From the linked project:

> The reality: 3 weeks in, ~50 hours of coding, and I'm mass-producing features faster than I can stabilize them. Things break. A lot. But when it works, it works.

reply

upvote

by g-mork2 hours ago|

[-]

> it's incredibly obvious that while these tools are surprisingly good at doing repetitive or locally-scoped tasks, they immediately fall apart when faced with the types of things that are actually difficult in software development and require non-trivial amounts of guidance and hand-holding to get things right

I used this line for a long time, but you could just as easily say the same thing for a typical engineer. It basically boils down to "Claude likes its tickets to be well thought out". I'm sure there is some size of project where its ability to navigate the codebase starts to break down, but I've fed it sizeable ones and so long as the scope is constrained it generally just works nowadays

reply

upvote

by geetee2 hours ago|

[-]

The difference is a real engineer will say "hey I need more information to give you decent output." And when the AI does do that, congrats, the time you spend identifying and explaining the complexity _is_ the hard time consuming work. The code is trivial once you figure out the rest. The time savings are fake.

reply

upvote

by chrisjj2 hours ago|

[-]

That real engineer knows decent. This parrot knows only its own best (current attempt).

reply