Makes me wonder: as people grow to trust the AI more and more, not reading the code, barely skimming the implementation plans, and simply rerolling if something doesn't work, will the value of these chats erode? Thinking back 1-1.5 years, I was closely monitoring what these agents did and steering them quite aggressively. These days, not so much. Where will RL signals come from as it approaches human capabilities ever more closely? How well does self-play work for coding? What about multistep tasks where it isn't just about being good at a single task, but about evolving a codebase over time in the face of changing requirements?
reply
Over a large sample size, simply getting feedback of "Did this work for me, y/n" is valuable even if the specific details are missing and even if the overall tasks are complicated and multifaceted.
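
To put rough numbers on that, here is a minimal sketch assuming the only signal is a count of "worked" answers for two model variants. The function name and the sample figures are made up for illustration, and it uses a plain two-proportion z-score rather than anything a lab is known to use.

```python
import math


def z_score(successes_a: int, n_a: int, successes_b: int, n_b: int) -> float:
    """Two-proportion z-score for 'did this work, y/n' feedback.

    Assumes independent samples and success rates that are not exactly 0 or 1
    (otherwise the standard error below would be zero).
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Standard error of the difference between the two observed rates.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return (p_b - p_a) / se


# Hypothetical numbers: a 2-point gap in success rate at two sample sizes.
small = z_score(50, 100, 52, 100)
large = z_score(5000, 10000, 5200, 10000)
print(f"n=100: z={small:.2f}  n=10000: z={large:.2f}")
```

The same 2-point gap is lost in the noise at 100 sessions per variant and stands out clearly at 10,000, which is the "large sample size" part doing the work.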
reply
Not sure, but in my experience, instead of asking for code, I'm asking for solutions and providing a kubectl configured to reach my cluster and an az monitor command to read the logs and telemetry.

A typical session is the agent establishing a metrics and log baseline, creating the code, compiling, deploying, observing, fixing, redeploying, observing metrics, determining the outcome and committing.

I really, really don't look at the code anymore.

UPDATE:

so my point is: it won't have me stewarding the code anymore, but it will have the infrastructure (and ultimately the real world) providing feedback on the traces.
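
A rough sketch of that session loop, assuming a single "lower is better" metric such as error rate. `read_metric` and `build_and_deploy` are hypothetical callables standing in for whatever the agent really runs (log queries, compile, deploy); nothing here calls kubectl or az.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Outcome:
    attempts: int         # number of deploys that were made
    committed: bool       # True if a deploy met the baseline and was kept
    history: List[float]  # baseline first, then one reading per deploy


def run_session(
    read_metric: Callable[[], float],
    build_and_deploy: Callable[[int], None],
    max_attempts: int = 3,
    tolerance: float = 0.05,
) -> Outcome:
    """Baseline, deploy, observe, and retry until the metric is acceptable."""
    baseline = read_metric()  # establish the baseline before changing anything
    history = [baseline]
    for attempt in range(1, max_attempts + 1):
        build_and_deploy(attempt)  # create the code, compile, deploy
        observed = read_metric()   # observe the infrastructure afterwards
        history.append(observed)
        # Accept the change if the metric is no worse than baseline + tolerance.
        if observed <= baseline * (1 + tolerance):
            return Outcome(attempt, True, history)
    return Outcome(max_attempts, False, history)


# Simulated readings: baseline, a bad first deploy, a good second deploy.
readings = iter([0.10, 0.30, 0.09])
result = run_session(lambda: next(readings), lambda attempt: None)
print(result)
```

The point of the shape is that the accept/reject decision comes from the observed metric, not from anyone reading the diff.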

reply
The only reason I still read the output at my day job is because I still need to send it to another human for review, and I'd be embarrassed and ashamed if I let some slop through. For my hobby projects... there are definitely parts where I don't know how they work.

Maybe we need some form of long-term training. How long does the code that the AI wrote stick around before being rewritten?

I guess we can do this retroactively too: if we could somehow tag AI-written lines of code in the VCS, then in a couple of years we could check which parts lasted.
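
A minimal sketch of what that check could look like, assuming the tagging problem is already solved and each line has been exported as a record with an "AI-written" flag plus the dates it was added and removed. The `LineRecord` shape and the sample data are invented for illustration; extracting them from a real VCS history is the hard part and is not shown.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class LineRecord:
    ai_written: bool          # hypothetical tag recorded when the line was added
    added: date
    removed: Optional[date]   # None means the line still exists


def survival_rate(
    records: Iterable[LineRecord],
    ai_written: bool,
    horizon_days: int,
    today: date,
) -> Optional[float]:
    """Fraction of lines that lasted at least `horizon_days` after being added."""
    eligible = survived = 0
    for r in records:
        if r.ai_written != ai_written:
            continue
        # Skip lines too young to have reached the horizon yet.
        if (today - r.added).days < horizon_days:
            continue
        eligible += 1
        if r.removed is None or (r.removed - r.added).days >= horizon_days:
            survived += 1
    return survived / eligible if eligible else None


records = [
    LineRecord(True, date(2025, 1, 1), None),
    LineRecord(True, date(2025, 1, 1), date(2025, 3, 1)),
    LineRecord(True, date(2026, 12, 1), None),  # too recent to count
    LineRecord(False, date(2025, 1, 1), date(2026, 6, 1)),
]
today = date(2027, 1, 1)
print(survival_rate(records, True, 365, today))
print(survival_rate(records, False, 365, today))
```

Comparing the two rates at the same horizon is what would turn "which parts lasted" into a usable long-term signal.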

reply
> There’s nothing much new about the architecture. The real gains come from the usage traces.

sorry, how do you know? i am so curious about where exactly the gains are coming from, but it's so hard to get even a little bit of insight.

i wish govt would fund these labs and make it free and opensource. way better investment than stupid overseas wars.

reply
> i wish govt would fund these labs and make it free and opensource.

It would be impossible for the govt to allocate this much capital towards such a moonshot, and even if they could, they would do it in a way that would get 90% frittered away to fraud and waste

reply
I have excellent news for you. Lux @ ORNL and Equinox @ Argonne are to be completed by EOY, with Solstice (100k NVIDIA chips, currently spec'd to be Vera Rubins) in the next five years.

https://www.whitehouse.gov/presidential-actions/2025/11/laun...

reply
> Solstice (100k NVIDIA chips, currently spec'd to be Vera Rubins) in the next five years

Is this supposed to be impressive? Five years for the equivalent of, what, Colossus 1? What a joke

reply
It's certainly large enough for trillion-param frontier-tier training runs, which will likely result in capable open-weight models, the thing you just wished for.
reply
Lemme guess, Nick Shirley is your favorite journalist?
reply
What makes you so sure? There's been massively successful government funded and run projects before. Soviets beat the Americans to space, after all.
reply
The entire US lunar effort cost only $330B in current USD, commensurate with the amount AI companies have raised on private markets alone, and there was also a cold war
reply
I'm not sure I understand your point, sorry. What do you mean?
reply
> What makes you so sure?

Doctrine and propaganda can make someone that sure, and the thing they're sure about doesn't even have to be true.

> There's been massively successful government funded and run projects before. Soviets beat the Americans to space, after all.

Don't let facts get in the way of ideology!

Also the Americans subsequently beating the Soviets to the moon was the government literally allocating huge amounts of capital towards the literal trope-namer moonshot.

reply
> It would be impossible for the govt to allocate this much capital towards such a moonshot...

You have a false definition of "impossible." It would be true to say it could be challenging, given current political dysfunction, but it's not impossible.

> ...and even if they could, they would do it in a way that would get 90% frittered away to fraud and waste

Same with private business.

I'd prefer government funding, because there are a greater number of important goals than the two or three the market is capable of optimizing for.

reply
> impossible for the govt to allocate this much capital towards such a moonshot

Oh, how funny.

reply
I thought that these stupid captchas where you teach some AI to recognize fire hydrants without getting paid were rock bottom, but no, you can actually pay a lot of money to train AI. Business is amazing.
reply