Many devs still think their job is to write code, not to build the products their business needs. I use LLMs extensively, and they've helped me work better and faster.

by grugagag 353 days ago

LLMs excel at some things and work very poorly at others. People working on different problems have had different experiences, sometimes at opposite ends of the spectrum.

by microtonal 352 days ago

I think the people who claim 10x-100x productivity improvements are working on tasks where LLMs work really well. There is a lot of development work out there that is relatively simple CRUD, and LLMs are very good at it. At the complete opposite end, we have designing new algorithms/data structures or extending them in a novel way, or implementing drivers for new hardware from incomplete specs. LLMs do not do well on these tasks and may even slow developers down 10x.

So, I think the claims of improvement in productivity and regression in productivity can be true at the same time (and it's not just that people who don't find using LLMs productive are just prompting them wrong).

I think most can be gained by learning in which areas LLMs can give large productivity boosts and where it's better to avoid using them. Of course, this is a continuous process, given that LLMs are still getting better.

Personally, I am quite happy with LLMs. They cannot replace me, but they can do a chunk of the boring/repetitive work (e.g. boilerplate), so as a result I can focus on the interesting problems. As long as we don't have human-like performance (and I don't feel like we are close yet), LLMs make programming more interesting.

They are also a great learning aid. E.g., this morning I wanted to make a 3D model for something I needed, but I don't know OpenSCAD. I iteratively made the design with Claude. At some point the problem becomes too difficult for Claude, but with the code generated at that point, I have learned enough about OpenSCAD that I can fix the more difficult parts of the project. The project would have taken me a few hours (to learn the language, etc.), but now I was done in 30 minutes and learned some OpenSCAD in a pleasant way.

by kaycebasques 352 days ago

Your OpenSCAD experience is an important point in the productivity debates that is often not discussed. A lot of projects that were previously impossible are now feasible. 10 years ago, you might have searched the OpenSCAD docs, watched videos, felt like it was impossible to find the info you needed, and given up. Claude and similar tools have gotten me past that initial blocker many times. Finding a way to unblock 0 to 1 productivity is perhaps as important as (or maybe even more important than) enabling 1 to 10 or 1 to 100.

by iteria 352 days ago

You don't even need such fancy examples. There are plenty of codebases where people are working with code that is over a decade old and has several paradigms all intermixed, with a lot of tribal knowledge that isn't documented in code or a wiki. That is where AI sucks. It will not be able to make meaningful changes in that environment.

There is also the frontend, and those codebases don't need to be very old at all before AI falls down. With NPM packages and clashing styles in a codebase, AI has not been very helpful to me at all.

Generally speaking, while AI is a fine enhancement to autocomplete, I haven't seen it be able to do anything more serious in a mature codebase. The moment business rules and tech debt sneak in in any capacity, AI becomes so unreliable that it's faster to just write it yourself. If I can't trust the AI to automatically generate a list of exports in an index.ts file, what can I trust it for?

by simonw 352 days ago

When is the last time you tried using LLMs against a large, old, crufty undocumented codebase?

Things have changed a lot in the past six weeks.

Gemini 2.5 Pro accepts a million tokens and can "reason" with them, which means you can feed it hundreds of thousands of lines of code and it has a surprisingly good chance of figuring things out.

OpenAI released their first million-token models with the GPT-4.1 series.

OpenAI o3 and o4-mini are both very strong reasoning code models with 200,000 token input limits.

These models are all new within the last six weeks. They're very, very good at working with large amounts of crufty undocumented code.
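A back-of-the-envelope way to check whether a codebase fits in the context windows mentioned above (1,000,000 and 200,000 tokens) is to estimate tokens from character counts. This is only a sketch: the 4-characters-per-token ratio is a rough heuristic rather than a real tokenizer, and the function names, file suffixes, and 50,000-token reserve are illustrative assumptions, not anything from the models' documentation.

```python
from pathlib import Path

# Assumption: ~4 characters per token. Real tokenizers vary by model and language.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (heuristic only)."""
    return len(text) // CHARS_PER_TOKEN


def estimate_repo_tokens(root: str, suffixes=(".py", ".ts", ".js")) -> int:
    """Sum the heuristic estimate over source files under `root`."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            total += estimate_tokens(path.read_text(errors="ignore"))
    return total


def fits(total_tokens: int, context_limit: int, reserve: int = 50_000) -> bool:
    """True if the estimate leaves `reserve` tokens for the prompt and the answer."""
    return total_tokens + reserve <= context_limit
```

For example, `fits(estimate_repo_tokens("."), 1_000_000)` gives a quick yes/no for a million-token model. Treat the result as a hint, since the actual count depends on the model's tokenizer.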

by grugagag 352 days ago

Ultimately LLMs don’t really understand what the code does at runtime. Sure, just parsing out the codebase can help make a good guess but in some cases it’s hard to trust LLMs with changes because the consequences are unknown in complex codebases that have weird warts nobody documented.

Maybe in a generation or two codebases will become more uniform and predictable if fewer humans write them by hand. The same goes for self-driving cars: if there were no human drivers out there, the problem would become trivial to conquer.

by simonw 352 days ago

That's a lot less true today than it was six weeks ago. The "reasoning" models are spookily good at answering questions about how code runs, and identifying the source of bugs.

They still make mistakes, and yeah, they're still (mostly) next-token-predicting machines under the hood, but if your mental model is "they can't actually predict how some code will execute" you may need to update that.

[deleted comment, 351 days ago]

by LunaSea 351 days ago

Gemini 2.5 Pro crashes with a 500 status code every 5 requests. Not great for a model you're supposed to rely on.

by simonw 351 days ago

Yeah, there's a reason it still has "preview" and "experimental" in the model names.

by kaycebasques 352 days ago

> hearing the LLM give good answers about its purpose and implementation from a completely cold read

Cold read ability for this particular tool is still an open question. As others have mentioned, a lot of the example tutorials are for very popular codebases that are probably well-represented in the language model's training data. I'm personally going to test it on my private, undocumented repos.

by tossandthrow 353 days ago

> Even if the LLM never writes a line of code - this is still valuable, because helping humans understand software faster means you can help humans write software faster.

IMHO, AI text additions are generally not valuable, and I assume, until proven wrong, that AI text provides little to no value.

I have seen so many startups fold after they made some AI product that appeared impressive on the surface but provided no substantial value.

Now, I will be impressed by the AI that can remove code without affecting the product.

by jonahx 353 days ago

> Now, I will be impressed by the ai that can remove code without affecting the product.

Current AIs can already do this decently, with the usual caveats about possible mistakes and oversights.

by otabdeveloper4 351 days ago

Summarization is one thing LLMs can do well, yes. (That's not what this current hype cycle is selling, though.)

by panny 353 days ago

> Answers like this are sort of what makes me wonder what most engineers are smoking when they think AI isn’t valuable.

I'll just wait for a winner to shake out and learn that one. I've gotten tired of trying AIs only to get slop.

by CodeMage 353 days ago

> Answers like this are sort of what makes me wonder what most engineers are smoking when they think AI isn’t valuable.

Honestly, I wonder if I'm living in some parallel universe, because my experience is that "most engineers" are far from that position. The reactions I'm seeing are either "AI is the future" or "I have serious objections to and/or problems with AI".

If you're calling the latter group "the outright dismissal of AI", I would disagree. If I had to call it the outright dismissal of anything, it would be of AI hype.

> I also suspect people who level these criticisms have never really used a frontier LLM.

It's possible. At my workplace, we did a trial of an LLM-based bot that would generate summaries for our GitHub PRs. I have no idea whether it's a "frontier" LLM or not, but I came out of that trial equally impressed, disappointed, and terrified.

Impressed, because its summaries got so many details right. I could immediately see the use for a tool like that: even when the PR author provides a summary of the PR, it's often hard to figure out where to start looking at the PR and in which order to go through changes. The bulleted list of changes from the bot's summary was incredibly useful, especially because it was almost always correct.

Disappointed, because it would often get the most important thing wrong. For the very first PR that I made, it got the whole list of changes right, but the explanation of what the PR did was the opposite of the truth. I made a change to make certain behavior disabled by default and added an option to enable it for testing purposes, and the bot claimed that the behavior was impossible before this change and the PR made it possible if you used this option.

Terrified, because I can see how alluring it is for people to think that they can replace critical thinking with AI. Maybe it's my borderline burnout speaking, but I can easily imagine the future where the pressure from above to be more "efficient" and to reduce costs brings us to the point where we start trusting faulty AI and the small mistakes start accumulating to the point where great damage is done to millions of people.

> Even if the LLM never writes a line of code - this is still valuable, because helping humans understand software faster means you can help humans write software faster.

I have my doubts about this. Yes, if we get an AI that is reliable and doesn't make these mistakes, it can help us understand software faster, as long as we're willing to make the effort to actually understand it, rather than delegating to the AI's understanding.

What I mean by that is that there are different levels of understanding. How deep do you dive before you decide it's "deep enough" and trust what the AI said? This is even more important if you start also using the AI to write the code and not just read it. Now you have even less motivation to understand the code, because you don't have to learn something that you will use to write your own code.

I'll keep learning how to use LLMs, because it's necessary, but I'm very worried about what we seem to want from them. I can't think of any previous technological advance that aimed to replace human critical thinking and creativity. Why are we even pursuing efficiency if it isn't to give us more time and freedom to be creative?

by doug_durham 353 days ago

The value is that it got the details correct, as you admit. That alone is worth the price of admission. Even if I need to rewrite or edit parts, it has saved me time and has raised the quality of PRs being submitted across the board. The key point with these tools is *Accountability*. As an engineer, you are still accountable for your work. Using any tool doesn't take that away. If the PR tool gets it wrong and you still submit it, that's on the engineer. If you have a culture of accountability, then there is nothing to be terrified of. And by the way, the most recent tools are really, really good at PRs and commit messages.

by svieira 352 days ago

Are you accountable for CPU bugs in new machines added to your Kubernetes fleet? The trusting-trust problem only works if there is someone to trust.

by voidUpdate 352 days ago

Well, companies lock "frontier LLMs" behind paywalls, and I don't want to pay for something that still might not be of any use to me.

by GaggiX 352 days ago

Gemini 2.5 Pro Experimental (a frontier model) has 5 RPM and 25 RPD.

Gemini 2.5 Flash Preview 04-17, another powerful model, has 10 RPM and 500 RPD.

OpenAI also allows you to use their API for free if you agree to share the tokens.

by voidUpdate 352 days ago

What are "RPM" and "RPD"? I assume not Revolutions Per Minute?

by GaggiX 352 days ago

Requests