My Weird Hill is that we should be building things with GPT-4.

I can say unironically that we haven't even tapped the full potential of GPT-4. The original one, from 2023. With no reasoning, no RL, no tool calling, no structured outputs, etc. (No MCP, ye gods!) Yes, it's possible to build coding agents with it!

I say this because I did!

Forcing yourself to make things work with older models forces you to keep things simple. You don't need 50KB of prompts. You can make a coding agent with GPT-4 and half a page of prompt.

Now, why would we do this? Well, these constraints force you to think differently about the problem. Context management becomes non-optional. Semantic compression (for Python it's as simple as `grep -r def .`) becomes non-optional. Bloating the prompt with infinite detail and noise... you couldn't if you wanted to!
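
To make that concrete, the whole loop is roughly this shape (a minimal sketch, not the original code; it assumes the current openai Python client with OPENAI_API_KEY set, and the `<sh>` tag convention, prompt, and step cap are all illustrative):

```python
# A GPT-4-era coding agent: no tool calling, no structured outputs, just
# plain text in, plain text out, and a parser for one command per turn.
import re
import subprocess

from openai import OpenAI

client = OpenAI()
SYSTEM = (
    "You are a coding agent. To act, reply with exactly one shell command "
    "wrapped in <sh></sh> tags. Reply DONE when the task is finished."
)

def run_agent(task: str, codebase_map: str, max_steps: int = 20) -> None:
    messages = [
        {"role": "system", "content": SYSTEM},
        # Seed the context with the `grep -r def .` map so the model never
        # has to burn turns poking around the tree.
        {"role": "user", "content": f"Codebase map:\n{codebase_map}\n\nTask: {task}"},
    ]
    for _ in range(max_steps):
        reply = client.chat.completions.create(model="gpt-4", messages=messages)
        text = reply.choices[0].message.content
        messages.append({"role": "assistant", "content": text})
        if "DONE" in text:
            return
        cmd = re.search(r"<sh>(.*?)</sh>", text, re.DOTALL)
        if cmd is None:
            messages.append({"role": "user", "content": "No <sh> command found."})
            continue
        result = subprocess.run(cmd.group(1), shell=True, capture_output=True,
                                text=True, timeout=60)
        # The command's output is the entire feedback loop.
        messages.append({"role": "user", "content": result.stdout + result.stderr})
```

Everything beyond this is context management, which is the actual point.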

Well, surely none of this is relevant today? Well, it turns out all of it still is! e.g. small fix, the "grep def" (or your language's equivalent) can be trivially added as a startup hook to Claude Code, and suddenly it doesn't have to spend half your token budget poking around the codebase, because -- get this -- it can just see where everything is... (What a concept, right?)
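
If you want to try that, something like this should do it for a Python repo (a sketch; I'm going from memory of the Claude Code hooks docs, where a SessionStart hook's stdout gets added to the session context, so double-check the schema before trusting it):

```python
# Register a "grep def" SessionStart hook by writing .claude/settings.json.
# NOTE: this overwrites an existing settings.json; merge by hand if you have one.
import json
import pathlib

settings = {
    "hooks": {
        "SessionStart": [
            {
                "hooks": [
                    {
                        "type": "command",
                        # One line per top-level def/class, with file and line number.
                        "command": "grep -rn --include='*.py' -e '^def ' -e '^class ' .",
                    }
                ]
            }
        ]
    }
}

path = pathlib.Path(".claude/settings.json")
path.parent.mkdir(exist_ok=True)
path.write_text(json.dumps(settings, indent=2) + "\n")
```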

-- We can also get into "If you let the LLM design the API then you don't need a prompt because it already knows how it should work", but... we can talk about that later ;)

reply
The problem with these exercises is always: I have limited time and capacity to do things, and a fairly unlimited number of problems that I can think of to solve. Coding is not a problem I want to solve. Prompt engineering is not a problem I want to solve.

If I do things for the love of it, the rules are different, of course. But otherwise I simply accept that there are many things improving around me that I have no intimate knowledge of and probably never will; I let other people work them out, and I happily lean on their work to do the next thing I care about that isn't already solved.

reply
> Well, surely none of this is relevant today? Well, it turns out all of it still is! e.g. small fix, the "grep def" (or your language's equivalent) can be trivially added as a startup hook to Claude Code, and suddenly it doesn't have to spend half your token budget poking around the codebase, because -- get this -- it can just see where everything is... (What a concept, right?)

Hahaha yeah. This is very true. I find myself making ad hoc versions of this in static markdown files to get around it. Just another example of the kind of low hanging fruit harnesses are leaving on the table. A version of this that uses tree sitter grammars to map a codebase, and does it on every startup of an agent, would be awesome.
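
For the Python case, it's maybe a couple dozen lines. A sketch of the shape of it (assuming `pip install tree-sitter tree-sitter-python` and the current py-tree-sitter API; other languages just mean loading their grammars and renaming the node types):

```python
# Map every def/class in a repo to "path:line: name", one line each --
# the kind of output you'd pipe into an agent's startup context.
import pathlib

import tree_sitter_python as tspython
from tree_sitter import Language, Parser

parser = Parser(Language(tspython.language()))

def map_file(path: pathlib.Path) -> list[str]:
    tree = parser.parse(path.read_bytes())
    lines: list[str] = []

    def walk(node) -> None:
        if node.type in ("function_definition", "class_definition"):
            name = node.child_by_field_name("name")
            lines.append(f"{path}:{node.start_point[0] + 1}: {name.text.decode()}")
        for child in node.children:
            walk(child)

    walk(tree.root_node)
    return lines

if __name__ == "__main__":
    for py in sorted(pathlib.Path(".").rglob("*.py")):
        for line in map_file(py):
            print(line)
```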

> My Weird Hill is that we should be building things with GPT-4.

I disagree; IMO using the best models we have is a good way to avoid wasting time. But that doesn't mean we shouldn't also be frugal and clever with our harnesses!

reply
To clarify, I didn't mean we should be using ancient models in production, I meant in R&D.

Anthropic says "do the simplest thing that works." If it works with the LLMs we had 3 years ago, doesn't that make it simpler?

The newer LLMs mostly seem to work around the poor system design. (Like spawning 50 subagents on a grep-spree because you forgot to tell it where anything is...) But then you get poor design in prod!

reply
I've been working on Peen, a CLI that lets local Ollama models call tools effectively. It's quite amateur, but I've been surprised by how much a few hours spent on prompting, and on code to handle responses, can improve the outputs of small local models.

https://github.com/codazoda/peen

reply
If I remember correctly, both the Claude Code and OpenAI Codex "harnesses" are now improving themselves.

OpenAI used early versions of GPT-5.3-Codex to debug its own training process, manage its deployment and scaling, and diagnose test results and evaluation data.

The Claude Code team shipped 22 PRs in a single day and 27 the day before, with 100% of the code in each PR generated entirely by Claude Code.

reply
Also, yes, I'm aware that I use a lot of "it's not just X, it's Y." I promise you this comment is entirely human-written. I'm just really tired and tend to rely on rote rhetorical tropes when I am. Believe me, I wrote like this long before LLMs were a thing.
reply
It didn’t read as AI to me :)
reply
That's what all the AIs have been trained to say.
reply
No one here will accuse you of being an AI unless they're trying to dehumanize you for expressing anti-AI sentiment.
reply
I'm sorry, but that's empirically false. E.g., a substantial proportion of the highly upvoted comments on https://news.ycombinator.com/item?id=46953491, which was one of the best articles on software engineering I've read in a long time, are accusing it of being AI for no reason.
reply
why the long -'s
reply
Because I like them?
reply
reminds me of that one guy complaining that everyone kept calling them an AI when the AI was trained on their grammar style.
reply
This happened to a speaker with her voice, which I find terrifying: https://www.youtube.com/watch?v=qO0WvudbO04
reply
how do you make them?
reply
On macOS, Option+Shift+- and Option+- insert an em dash (—) and en dash (–), respectively. On Linux, you can hit the Compose Key and type --- (three hyphens) to get an em dash, or --. (hyphen hyphen period) for an en dash. Windows has some dumb incantation that you'll never remember.
reply
For Windows it's just easier to make a custom keyboard layout and go to town with that: https://www.microsoft.com/en-us/download/details.aspx?id=102...
reply
Alt+0151 or WIN+SHIFT+-, but I can't seem to make the WIN+SHIFT+- combo work in a browser, only in a text editor.
reply
I was just looking at the SWE-bench docs, and it seems like they use a fairly arbitrary form of context engineering (loading in some arbitrary number of files to saturate the context). So in a way, the bench suites test how good a model is with little to no context engineering (I know ... it doesn't need to be said). We may not actually know which models are sensitive to good context engineering; we're simply assuming all models are. I absolutely agree with you on one thing: there is definitely a ton of low-hanging fruit.
reply
2026 is the year of the harness.
reply
Already made a harness for Claude to make R/W plans, not write-once like they're usually implemented: Claude can modify the plans as it works through the task at hand. It also relies on a collection of patterns for writing coding-task plans, which evolves by reflection. Everything is designed so I can run Claude in yolo mode in a sandbox for long stretches of time.
reply
Link?
reply
As a VC in 2026 I'm going to be asking every company "but what's your harness strategy?"
reply
Given that you're likely in San Francisco, make sure you say "AI Harness".
reply
2027 is the year of the "maybe indeterminism isn't as valuable as we thought"
reply
But will the harness build desktop Linux for us?
reply
My harness is improving my Linux desktop...
reply
Only if you put bells on it and sing Jingle Bells while it em dashes through the snow.
reply
Once you begin to see the "model" as only part of the stack, you begin to realize that you can draw the system's boundary to include the user as well.

That’s when the future really starts hitting you.

reply
yeah this clicked for me when i stopped obsessing over which model to use and focused on how i structure the context and feedback loops around it. for my project the same model went from "barely usable" to "legitimately helpful" just by changing how i fed it context and how i validated its output.

the user inclusion part is real too. the best results i get aren't from fully autonomous agents, they're from tight human-in-the-loop cycles where i'm steering in real time. the model does the heavy lifting, i do the architectural decisions and error correction. feels more like pair programming than automation.

reply
> the user inclusion part is real too. the best results i get aren't from fully autonomous agents, they're from tight human-in-the-loop cycles where i'm steering in real time. the model does the heavy lifting, i do the architectural decisions and error correction. feels more like pair programming than automation.

Precisely. This is why I use Zed and the Zed Agent. It's near-unparalleled for live, mind-meld pair programming with an agent, thanks to CRDTs, DeltaDB, etc. I can elaborate if anyone is interested.

reply
I am interested.
reply
plz do
reply
The special (or at least new-to-me) things about Zed (when you use it with the built-in agent, instead of one of the ones available through ACP) basically boil down to the fact that it's a hyper-advanced CRDT-based collaborative editor, meant for live pair programming in the same file, so it can just treat agents like another collaborator.

1. the diffs from the agent just show up in the regular file you were editing; you're not forced to use a special completion model, or to view the changes in a special temporary staging mode or a different window.

2. you can continue to edit the exact same source code without accepting or rejecting the changes, even in the same places, and nothing breaks — the diffs still look right, and doing an accept or reject Just Works afterwards.

3. you can accept or reject changes piecemeal, and the model doesn't get confused by this at all and have to go "oh wait, the file was/wasn't changed, let me re-read..." or whatever.

4. Even though you haven't accepted the changes, the model can continue to make new ones, since they're stored as branches in the CRDT, so you can have it iterate on its suggestions before you accept them, without forcing it to start completely over either (it sees the file as if its changes were accepted).

5. Moreover, the actual files on disk are in the state it suggests, meaning you can compile, fuzz, test, run, etc. to see what its proposed changes do before accepting them.

6. you can click a follow button and see which files it has open, where it's looking in them, and watch as it edits the text, like you're following a dude in Dwarf Fortress. This means you can very quickly know what it's working on and when, correct it, or hop in to work on the same file it is.

7. It can actually go back and edit the same place multiple times as part of a thinking chain, or even as part of the same edit, which has some pretty cool implications for final code quality, both because it can iterate on its suggestion before you accept it and because of point (9) below.

8. It streams its code diffs, instead of hanging and then producing them as a single gigantic tool call. Seeing it edit the text live, instead of waiting for a final complete diff that you either accept or reject, is a huge boon for iteration time compared to e.g. Claude Code, because you can stop and correct it midway, and you can read along as it goes, so you're more in lockstep with what's happening.

9. Crucially, because the text it's suggesting is actually in the buffer at all times, you can see LSP, tree-sitter, and linter feedback, all inline and live as it writes code; and as soon as it's done an edit, it can see those diagnostics too — so it can actually iterate on what it's doing with feedback before you accept anything, while it is in the process of doing a series of changes, instead of you having to accept the whole diff to see what the LSP says

reply
Aha! A true cybernetics enthusiast. I didn't say that because I didn't want to scare people off ;)
reply
That's next-year's problem.
reply
So deep, your comment. Asking for a friend: how did you manage to get the em dash — on your keyboard?
reply
Does your friend have an iPhone? The default iOS keyboard has automatically converted double hyphens into an em dash for at least seven years now.
reply
I think Google Docs does this too, which drives me up the wall when I'm trying to write `command --foo=bar` and it turns it into an em dash, which obviously doesn't work.
reply
Em dashes are used often by LLMs because humans use them often, and on Mac keyboards they're easily typed. I know this is oversimplifying the situation, but I don't see the usefulness of the constant witch-hunting for allegedly LLM-generated text. For text, we are long past the point where we could differentiate between human-generated and machine-generated; we're even at the point where it's getting somewhat hard to identify machine-generated audio and visuals.
reply
I might not be able to spot ALL AI generated text, but I can definitely spot some. It's still kind of quirky.
reply
Yeah, I agree with you. I'm so tired of people complaining about AI-generated text without focusing on the content. Just don't read it if you don't like it. It's the next level of people complaining that a website is not readable for them, or that some CSS rendering is wrong, or whatever. How does that add to the discussion?
reply
On a Mac, it's alt-dash, in case you weren't being facetious.
reply
Extra pedantic: that's the en dash; the em dash is option-shift-hyphen.
reply
Technically option-shift-dash; option-dash is an en dash.
reply
On Windows it is Alt+0151. Harder to use than on a Mac, but definitely possible; I use it frequently.

On recent versions, Shift+Win+- also works, and Win+- produces an en dash.

reply
I just type -- and Jira fixes it.
reply
I really despise that people like you ruined em dashes for the rest of us who have enjoyed using them.
reply
I use Compose - - - on Linux and on my cellphone (Unexpected Keyboard). On Mac it's Alt-_.
reply