Of course this requires being fortunate enough to have one of those AI-positive employers where you can spend lots of money on clankers.
I don't review every move it makes; instead, I have a workflow where I first ask it questions about the code, and it looks around and explores various design choices. Then I nudge it towards the design choice I think is best, and so on. That asking around about the code also loads up the context in the appropriate manner, so the AI knows how to make the change well.
It's a me-in-the-loop workflow, but that prevents a lot of bugs, makes me aware of the design choices, and, thanks to fast mode, it is more pleasant and much faster than doing it manually myself.
On the one hand, reviewing and micromanaging everything it does is tedious and unrewarding. Unlike reviewing a colleague's code, you're never going to teach it anything; maybe you'll get some skills out of it if you find something that comes up often enough that it's worth writing a skill for. And this only gets you, at best, a slight speedup over writing it yourself, as you have to stay engaged and think about everything that's going on.
Or you can just let it grind away agentically and only test the final output. This allows you to get those huge gains at first, but it can easily just start accumulating more and more cruft and bad design decisions and hacks on top of hacks. And you increasingly don't know what it's doing or why, you're losing the skill of even being able to because you're not exercising it.
You're just building yourself a huge pile of technical debt. You might delete your prod database without realizing it. You might end up with an auth system that doesn't actually check the auth and so someone can just set a username of an admin in a cookie to log in. Or whatever; you have no idea, and even if the model gets it right 95% of the time, do you want to be periodically rolling a d20 and if you get a 1 you lose everything?
Maybe I’m just weird (actually that’s a given) but I don’t mind babysitting the clanker while it works.
The agent only has access to exactly what it needs, be it an implementation agent, analysis agent, or review agent.
Makes it very easy to stay in command without having to sit and approve tons of random things the agent wants to do.
I do not allow bash or any kind of shell. I don't want to have to figure out what some random python script it's made up is supposed to do all the time.
Both OpenCode and VS Code support this. I think in Claude Code you can do it with skills now.
The other benefit is the MCP tool can mediate e.g. noisy build tool output, and reduce token usage by only showing errors or test failures, nothing else, or simply an ok response with the build run or test count.
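Roughly the shape of such a tool, as a Python sketch (the MCP wiring is omitted, and the build command and error pattern are placeholders, not anyone's actual setup):

# Run the build, swallow the noisy output, and hand the agent only
# errors/test failures, or a one-line OK. Build command and error regex
# are illustrative assumptions.
import re
import subprocess

def run_build(cmd=("./gradlew", "build")) -> str:
    proc = subprocess.run(list(cmd), capture_output=True, text=True)
    if proc.returncode == 0:
        return "ok: build passed"
    noise = proc.stdout + proc.stderr
    # Keep only the lines that look like compile errors or test failures.
    errors = [line for line in noise.splitlines()
              if re.search(r"error|FAILED", line, re.IGNORECASE)]
    return "\n".join(errors) or f"build failed (exit code {proc.returncode})"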
So far, I have not needed to give them access to more than build tools, git, and a project/knowledge system (e.g. Obsidian) for the work I have them doing. Well and file read/write and web search.
I use Cursor but it's getting expensive lately, so I'm trying to reduce context size and move to OpenCode or something like that which I can use with some cheaper provider and Kimi 2.5 or whatever.
BTW, one tip is to look at the size of the codebase. When you see 100KLOC for a first draft of a C compiler, you know something has gone horribly wrong. I would suggest that you at least compare the number of lines the agent produced to what you think the project should take. If it's more than double, the code is in serious, serious trouble. If it's in the <1.5x range, there's a chance it could be saved.
Asking the agent questions is good - as an aid to a review, not as a substitute. The agents lie with a high enough frequency to be a serious problem.
The models don't yet write code anywhere near human quality, so they require much closer supervision than a human programmer.
You could have it build something that takes fewer lines of code, but you aren't going to find much with that level of specification and guardrails.
It has about doubled my development pace. An absolutely incredible gain in a vacuum, though tiny compared to what people seem to manage without these self-constraints. But in exchange, my understanding of the code is as comprehensive as if I had paired on it, or merged a direct report's branch into a project I was responsible for. A reasonable enough tradeoff, for me.
anonu has explicitly said that they've wiped a database twice as a result of agents doing stuff. What sort of diff would help against an agent running commands, without your approval?
$ main-app git:(main) kubectl get pods | grep agent | head -n 1 | sed -E 's/[a-z]+-agent(.*)/app-agent\1/'
app-agent-656c6ff85d-p86t8 1/1 Running 0 13d
The agent is fully capable of making PRs etc. if you provide appropriate tooling. It wipes the DB, but the DB is just a separate ephemeral pod. One day perhaps it will find a 0-day and break out, but so far it has not done it. The diff: +8000 -4000
I also don't find the permissions it prompts for very meaningful. Permission to use a file search tool? Permission to make a web request? It's a clumsy way to slow it down enough for me to catch up.
Day 1: Carefully handles the creds, gives me a lecture (without asking) about why .env should be in .gitignore and why I should edit .env and not hand over the creds to it.
Day 2: I ask for a repeat; it has lost track of that skill or setting, frantically searches my entire disk, reads .env along with many other files, figures out that it is holding a token, manually creates curl commands to test the token, and then comes back with some result.
It is like it is a security expert on Day 1 and an absolutely mediocre intern on Day 2.
( This was low-stakes test creds anyway which I was testing with thankfully. )
I never pass creds via env or anything else it can access now.
My approach now is to get it to write me LINQPad scripts, which have a utility function to get creds out of a user-encrypted share, or prompt if they're not in the store.
This works well, but requires me to run the scripts and guide it.
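Roughly the pattern, sketched in Python rather than LINQPad (the store path and helper name are hypothetical, and the encryption layer on the store is elided here): creds come from a store the user controls, or from a prompt, never from anything the agent can read.

import getpass
import json
from pathlib import Path

STORE = Path.home() / ".secret-store" / "creds.json"   # illustrative location

def get_credential(name: str) -> str:
    # Look in the user-controlled store first (decryption omitted in this sketch).
    if STORE.exists():
        creds = json.loads(STORE.read_text())
        if name in creds:
            return creds[name]
    # Not in the store: fall back to prompting the human at the terminal.
    return getpass.getpass(f"Enter credential '{name}': ")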
Ultimately, fully autonomous isn't compatible with secrets. Otherwise, if it really wanted to inspect them, it could just redirect the request to an echo service.
The only real way is to deal with it the same way we deal with insider threat.
A proxy layer / secondary auth, which injects the real credentials. Then give Claude its own user within that auth system, so it owns those creds. Now responsibility can be delegated to it without exposing the original credentials.
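A minimal sketch of that idea in Python (the token names, upstream URL, and port are all placeholders): the agent authenticates to a local proxy with its own low-value token, and only the proxy process ever holds the real key.

import os
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

AGENT_TOKEN = os.environ["AGENT_TOKEN"]      # credential issued to the agent's own "user"
REAL_API_KEY = os.environ["REAL_API_KEY"]    # real secret; lives only in the proxy process
UPSTREAM = "https://api.example.com"         # hypothetical upstream API

class InjectingProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # Authenticate the agent with its own token, never the real credential.
        if self.headers.get("Authorization") != f"Bearer {AGENT_TOKEN}":
            self.send_error(403, "unknown agent")
            return
        # Forward the request upstream with the real credential injected.
        req = urllib.request.Request(UPSTREAM + self.path)
        req.add_header("Authorization", f"Bearer {REAL_API_KEY}")
        with urllib.request.urlopen(req) as resp:   # error handling elided
            body = resp.read()
        self.send_response(resp.status)
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), InjectingProxy).serve_forever()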
That's a lot of work when you're just exploring an API or DB or similar.
1. Everything is specified, written and tested by me, then cleaned up by AI. This is for the core of the application.
2. AI writes the functions, then sets up stub tests for me to write. Here I’ll often rewrite the functions as they often don’t do what I want, or do too much. I just find it gets rid of a lot of boilerplate to do things this way.
3. AI does everything. This is for experiments or parts of an application that I am perfectly willing to delete. About 70% of the time I do end up deleting these parts. I don’t allow it to touch 1 or 2.
Of course this requires that the architecture is set up in a way where this is possible. But I find it pretty nice.
[1] except perhaps read-only credentials to help diagnose problems, but even then I would only issue it an extremely short-lived token in case it leaks it somehow
Only helps if we listen to it :) which is fun b/c it means staying sharp which is inherently rewarding
Don’t give your agent access to content it should not edit, don’t give keys it shouldn’t use.
> python <<'EOF'
> ${code the agent wrote on the spot}
> EOF
I mean, yeah, in theory it's just as dangerous as running arbitrary shell commands, which the agent is already doing anyway, but still...
By default these shell commands don't have network access or write access outside the project directory which is good, but nowhere near customizable enough. Once you approve a command because it needs network access, its other restrictions are lifted too. It's all or nothing.
I'm not trying to be offensive, so with all due respect... this sounds like a "you" problem. (And I've been there, too.)
You can ask the LLMs: how do I run this, how do I know this is working, etc etc.
Sure... if you really know nothing, or you put close to zero effort into critically thinking about what they give you, you can be fooled by their answers and mistake complete irrelevance or bullshit for evidence that something works, is suitably tested, etc.
You can ask 2 or 3 other LLMs: check their work, is this conclusive, can you find any bugs, etc etc.
But you don't sound like you know nothing. You sound like you're rushing to get things done, cutting corners, and you're getting rushed results.
What do you expect?
Their work is cheap. They can pump out $50k+ worth of features in a $200/mo subscription with minimal baby-sitting. Be EAGER to reject their work. Send it back to them over and over again to do it right, for architectural reviews, to check for correctness, performance, etc.
They are not expensive people with feelings you need to consider in review, that might quit and be hard to replace. Don't let them cut corners. For whatever reason, they are EAGER to cut corners no matter how much you tell them not to.
I'm only 5 years into this career, and I'm going to work manually and absorb as much knowledge as possible while I'm still able to do it. Yes, that means manually doing shit-kicker work. If AI does get so good that I need to use it, as you say, then I'll be running it locally on a version I can master and build tooling for.
https://vivekhaldar.com/articles/when-compilers-were-the--ai...
We are completely comfortable now letting the compilers do their thing, and never seem to worry that we "don't know what is actually happening under the hood".
I am not saying these situations are exactly analogous, but I am saying that I don't think we can know yet if this will be one of those things that we stop worrying about or it will be a serious concern for a while.
> Many assembly programmers were accustomed to having intimate control over memory and CPU instructions. Surrendering this control to a compiler felt risky. There was a sentiment of, if I don’t code it down to the metal, how can I trust what’s happening? In some cases, this was about efficiency. In other cases, it was about debuggability and understanding programming behavior. However, as compilers matured, they began providing diagnostic output and listings that actually improved understanding.
I would 100% use LLMs more and more aggressively if they were more transparent. All my reservations come from times when I prompt “change this one thing” and it rewrites my db schema for some reason, or adds a comment that is actively wrong in several ways. I also think I have a decent working understanding of the assembly my code compiles to, and do occasionally use https://godbolt.org/. Of course, I didn’t start out that way, but I also don’t really have any objections to teenagers vibe-coding games, I just think at some point you have to look under the hood if you’re serious.
Isn't that what git is for, though? Just have your LLM work in a branch, and then you will have a clear record of all the changes it made when you review the pull request.
LLMs are nothing like that
It is just the scope that makes it appear non-deterministic to a human looking at it, and it is large enough to be impossible for a human to follow the entire deterministic chain, but that doesn't mean it isn't in the end a function that translates input data into output data in a deterministic way.
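A toy illustration of that point (nothing like a real model, just a frozen weight matrix and greedy decoding): once the injected randomness is removed, the same input maps to the same output every time.

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256))   # frozen stand-in "weights"

def generate(prompt_tokens, steps=5):
    tokens = list(prompt_tokens)
    for _ in range(steps):
        logits = W[tokens[-1] % 256]           # next-token scores from the last token
        tokens.append(int(np.argmax(logits)))  # greedy pick: no sampling, no randomness
    return tokens

assert generate([1, 2, 3]) == generate([1, 2, 3])   # same input, same output, every run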
There is a world of difference between translation and generation. It's even in the name: generative AI. I didn't say anything about magic.
edit: there might be a future where we develop robopsychology enough to understand LLMs as more than black boxes, but we are not there yet.
[1] Aside from injected randomness and parallel scheduling artifacts.
Care to point to any that are set up to be deterministic?
Did you ever stop to think about why no one can get any use out of a model with temp set to zero?
I get why that is in practice different from the manner in which compilers are deterministic, but my point is the difference isn't because of determinism.
Create a program that reads from /dev/random (not urandom). It's not deterministic.
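For instance, something as small as this (Python, reading eight bytes of kernel entropy) prints different output on every run despite identical "input":

# Each run pulls fresh entropy from the kernel, so identical invocations differ.
with open("/dev/random", "rb") as f:
    print(f.read(8).hex())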
In other words, it isn't the random number part of LLMs that make them seem like a black box and unpredictable, but rather the complexity of the underlying model. Even if you ran it in a deterministic way, I don't think people would suddenly feel more confident about the outputted code.
A non deterministic compiler is probably defective and in any case much less useful
Although, while the compiler devs might know what was going on in the compiler, they wouldn't know what the compiler was doing with that particular bit of code that the FORTRAN developer was writing. They couldn't possibly foresee every possible code path that a developer might traverse with the code they wrote. In some ways, you could say LLMs are like that, too; the LLM developers know how the LLM code works, but they don't know the end result with all the training data and what it will do based on that.
In addition, to the end developer writing FORTRAN it was a black box either way. Sure, someone else knows how the compiler works, but not the developer.
There's plenty of resources online to rectify that, though.
Also, compilers usually compose well: you can test snippets of code in isolation, and the generated code will have at least some relation to whatever asm would be generated when the snippet is embedded in a larger code base (even under inter-procedural optimizations or LTO, you can predict and often control how it will affect the generated code).
Demonstrably incorrect. This is because the model selection, among other data, is not fixed for (I would say most) LLMs. They are constantly changing. I think you meant something more like an LLM with a fixed configuration. Maybe additional constraints, depending on the specific implementation.