I had an interesting experience recently where I ran Opus 4.6 against a problem that o4-mini had previously convinced me wasn't tractable... and Opus 4.6 found me a great solution. https://github.com/simonw/sqlite-chronicle/issues/20

This inspired me to point the latest models at a bunch of my older projects, resulting in a flurry of fixes and unblocks.

reply
I have a codebase (a personal project), and every time there is a new Claude Opus model I get it to do a full code review. Never had any breakages in the last couple of model updates. Worried that one day it'll just generate a binary and delete all the code.
reply
No version control?
reply
I was being facetious. I mean that one day models might skip the middleman of code and compilation and take your specs straight to an ultra-efficient binary.
reply
Musk was saying that recently, but I don't see it being efficient or worthwhile. I could be proven brutally wrong, but code is language; executables aren't. There's also no real reason to bother with this when we have quick-compiling languages.

More realistically, I could see particular languages and frameworks proving to be better designed and more apt for AI code creation. For instance, I was always too lazy to use a strongly typed language, preferring Ruby for the joy of writing in it (obsessing over types is for a particular kind of nerd I've never wanted to be). But now, with AI, strong types in the loop make everything better: reasoning about the code is arguably easier, and the compiler provides stronger guarantees about what's happening. Similarly, we could see other language constructs come to the forefront because of what they allow once the cost of implementation drops to zero.

reply
You can map tokens to CPU instructions and train a model on that; I think that's what they do for input images.

I think the main limitation of current models isn't that CPU instructions can't be tokens (they can be, via assembly); it's that the models are causal: they'd have to generate the binary from start to finish, sequentially.

If we've learned anything over the last 50 years of programming, it's that that's hard, and it's why we invented programming languages. Why would it be simpler to generate the machine code directly? Sure, maybe an LLM-to-application model can exist, but my money is on there being a whole toolchain in the middle, and it will probably be the same old toolchain we're using now: an OS, probably Linux.

Isn't it more common for things to build on existing infrastructure than for a revolution to throw out the previous tech stack? It's much easier to add on than to start from scratch.

reply
Those CPU instructions still need to call out to things, though. Hallucinated source code will reveal its flaws through linters, compiler errors, and test suites. A hallucinated binary will not reveal its flaws until it segfaults.
reply
Programs that pass linters, compilers, and test suites can still segfault. A good harness that tests the binary comprehensively can limit this. The model could also be trained on patterns of efficient assembly rather than source code.
reply
From the project description for your sqlite-chronicle project:

> Use triggers to track when rows in a SQLite table were updated or deleted

Just a note in case it's interesting to anyone: the SQLite-compatible Turso database has CDC, a changes table! https://turso.tech/blog/introducing-change-data-capture-in-t...
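
For anyone curious what the trigger approach looks like mechanically, here's a minimal sketch (a toy schema of my own, not sqlite-chronicle's actual one): log the old row's id and a timestamp whenever a tracked table sees an update or delete.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE changelog (
        item_id INTEGER,
        action TEXT,
        changed_at TEXT DEFAULT (datetime('now'))
    );
    CREATE TRIGGER items_update AFTER UPDATE ON items BEGIN
        INSERT INTO changelog (item_id, action) VALUES (old.id, 'update');
    END;
    CREATE TRIGGER items_delete AFTER DELETE ON items BEGIN
        INSERT INTO changelog (item_id, action) VALUES (old.id, 'delete');
    END;
    """)
    conn.execute("INSERT INTO items (name) VALUES ('a')")
    conn.execute("UPDATE items SET name = 'b' WHERE id = 1")
    conn.execute("DELETE FROM items WHERE id = 1")
    print(conn.execute("SELECT * FROM changelog").fetchall())
    # e.g. [(1, 'update', '...'), (1, 'delete', '...')]

The nice property is that the changelog gets written by SQLite itself, so it catches writes from any client, not just your own application code.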

reply
This may seem obvious, but many people overlook it. The effect is especially clear with AI music models. For example, in Suno AI you can remaster an older AI-generated track with a newer model. I do this with all my songs whenever a new model is released, and it makes it super easy to see the improvements made to the models over time.
reply
I continue to get great value out of having claude and codex bound together in a loop: https://github.com/pjlsergeant/moarcode
reply
They are one, the ring and the dark lord
reply
I keep giving the top Anthropic, Google and OpenAI models problems.

They come up with passable solutions and are good for getting juices flowing and giving you a start on a codebase, but they are far from building "entire software products" unless you really don't care about quality and attention to detail.

reply
Yeah, I keep maintaining a specific app I built with GPT 5.1 Codex Max using that exact model, because it continues to work for the requests I send it, and attempts with other models, even 5.2 or 5.3 Codex, seemed to have odd results. If I were superstitious I'd say it's almost like the model that wrote the code likes working on the code better. Perhaps there's something about the structure it created that it finds easier to understand, though…
reply
> It feels like we are now able to manage incredibly smart engineers for a month at the price of a good sushi dinner.

In my experience it’s more like idiot savant engineers. Still remarkable.

reply
It's like getting access to an amazing engineer, but you get a new individual engineer with each prompt, not one consistent mind.
reply
Sushi dinner? What are you building with AI, a calculator?
reply
I have long suspected that a large part of people's distaste for given models comes from their comfort with their daily driver.

Which I guess feeds back to prompting still being critical for getting the most out of a model (outside of subjective stylistic traits the models have in their outputs).

reply
"These models are so powerful."

Careful.

As of 3.0, Gemini simply isn't in the same class for work.

We'll see in a week or two if it really is any good.

Bravo to those who are willing to give up their time to test for Google to see if the model is really there.

(History says it won't be; Ant and OAI really are the only two in this race ATM.)

reply