Makes me wonder: as people grow to trust the AI more and more, not reading the code, barely skimming the implementation plans and simply rerolling if something doesn't work, will the value of these chats erode? Thinking back 1-1.5 years, I was closely monitoring what these agents did and steering them quite aggressively. These days, not so much. Where will RL signals come from as it gets ever closer to human capabilities? How well does self-play work for coding? What about multistep tasks where it isn't just about being good at a single task, but about evolving a codebase over time in the face of changing requirements?

by Schlagbohrer 5 days ago

Over a large sample size, simply getting feedback of "Did this work for me, y/n" is valuable even if the specific details are missing and even if the overall tasks are complicated and multifaceted.
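To illustrate why a bare y/n signal gets more useful with volume, here is a minimal sketch that turns a count of "worked for me" answers into a success rate with a Wilson score interval. The function name and the example counts are made up for illustration; nothing here describes how any lab actually processes feedback.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% interval for the true success rate behind y/n feedback."""
    if total == 0:
        # No feedback yet: the rate could be anything.
        return (0.0, 1.0)
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return (centre - half, centre + half)


# Hypothetical counts: same 70% "yes" rate, very different certainty.
small = wilson_interval(70, 100)
large = wilson_interval(7000, 10000)
print(f"100 sessions:    {small[0]:.3f} to {small[1]:.3f}")
print(f"10,000 sessions: {large[0]:.3f} to {large[1]:.3f}")
```

The interval shrinks roughly with the square root of the number of sessions, which is why a coarse signal can still separate a better model from a worse one once the sample is large.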

by motoboi 5 days ago

Not sure, but in my experience, instead of asking for code, I'm asking for solutions and providing a kubectl configured to reach my cluster and an az monitor command to read the logs and telemetry.

A typical session is the agent establishing a metrics and log baseline, creating the code, compiling, deploying, observing, fixing, redeploying, observing metrics, determining the outcome and committing.

I really, really, don't look at the code anymore.

UPDATE:

So my point is: it won't have me stewarding the code anymore, but it will have the infrastructure (and ultimately the real world) providing feedback on the traces.
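A rough sketch of the session loop described above, assuming a single "lower is better" metric. The three callables (`observe`, `deploy_change`, `commit`) are hypothetical stand-ins for whatever the agent really runs against the cluster; no real kubectl or az commands are shown.

```python
from typing import Callable


def run_session(
    observe: Callable[[], float],
    deploy_change: Callable[[int], None],
    commit: Callable[[], None],
    max_attempts: int = 3,
    min_improvement: float = 0.0,
) -> bool:
    """Return True if a deployed change beat the baseline and was committed."""
    baseline = observe()  # metric before any change (lower is better)
    for attempt in range(max_attempts):
        deploy_change(attempt)  # create, compile and deploy a candidate change
        current = observe()  # read the same metric after the deploy
        if baseline - current > min_improvement:
            commit()  # the infrastructure says it helped, so keep it
            return True
    return False  # nothing beat the baseline within the attempt budget


# Usage with fake readings: baseline 100, a bad attempt (120), a good one (80).
readings = iter([100.0, 120.0, 80.0])
print(run_session(lambda: next(readings), lambda n: None, lambda: None))
```

The point of the shape is that the accept/reject decision comes from the observed metric, not from a human reading the diff.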

by 8n4vidtmkvmk 5 days ago

The only reason I still read the output at my day job is because I still need to send it to another human for review, and I'd be embarrassed and ashamed if I let some slop through. For my hobby projects... there are definitely parts where I don't know how they work.

Maybe we need some form of long-term training. How long does the code that the AI wrote stick around before being rewritten?

I guess we can do this retroactively too if we could somehow tag AI-written lines of code in the VCS, then in a couple years we can check which parts lasted.
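A minimal sketch of that retroactive check, assuming the tagging has already produced one record per AI-written line with the date it was added and the date it was removed (if ever). The `TaggedLine` record and `survival_rate` function are hypothetical; extracting such records from a real VCS history is not shown.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TaggedLine:
    added: date
    removed: Optional[date]  # None means the line is still in the codebase


def survival_rate(lines: list[TaggedLine], horizon_days: int, today: date) -> Optional[float]:
    """Fraction of tagged lines that lasted at least `horizon_days`."""
    # Only lines old enough to have reached the horizon can be judged.
    eligible = [l for l in lines if (today - l.added).days >= horizon_days]
    if not eligible:
        return None
    survived = sum(
        1
        for l in eligible
        if l.removed is None or (l.removed - l.added).days >= horizon_days
    )
    return survived / len(eligible)


# Made-up history: three old lines (one rewritten within a month) and one recent line.
history = [
    TaggedLine(date(2023, 1, 1), None),
    TaggedLine(date(2023, 1, 1), date(2023, 2, 1)),
    TaggedLine(date(2023, 1, 1), date(2024, 6, 1)),
    TaggedLine(date(2024, 12, 1), None),
]
print(survival_rate(history, horizon_days=365, today=date(2025, 1, 1)))
```

Lines younger than the horizon are excluded rather than counted as survivors, otherwise recent code would inflate the rate.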

by dominotw 5 days ago

> There’s nothing much new about the architecture. The real gains come from the usage traces.

sorry, how do you know? i am so curious about where exactly the gains are coming from, but it's so hard to get even a little bit of insight.

i wish govt would fund these labs and make it free and opensource. way better investment than stupid overseas wars.

by wyager 5 days ago

> i wish govt would fund these labs and make it free and opensource.

It would be impossible for the govt to allocate this much capital towards such a moonshot, and even if they could, they would do it in a way that would get 90% frittered away to fraud and waste.

by kurisufag 5 days ago

I have excellent news for you. Lux @ ORNL and Equinox @ Argonne are to be completed by EOY, with Solstice (100k NVIDIA chips, currently spec'd to be Vera Rubins) in the next five years.

https://www.whitehouse.gov/presidential-actions/2025/11/laun...

by wyager 4 days ago

> Solstice (100k NVIDIA chips, currently spec'd to be Vera Rubins) in the next five years

Is this supposed to be impressive? Five years for the equivalent of, what, Colossus 1? What a joke

by kurisufag 4 days ago

It's certainly large enough for trillion-param frontier-tier trainings, which will likely result in capable open-weight models, the thing you just wished for.

by runtime_terror 4 days ago

Lemme guess, Nick Shirley is your favorite journalist?

by komali2 5 days ago

What makes you so sure? There's been massively successful government funded and run projects before. Soviets beat the Americans to space, after all.

by wyager 4 days ago

The entire US lunar effort cost only $330B in current USD, commensurate with the amount AI companies have raised on private markets alone, and there was also a cold war.

by komali2 4 days ago

I'm not sure I understand your point, sorry. What do you mean?

by palmotea 4 days ago

> What makes you so sure?

Doctrine and propaganda can make someone that sure, and the thing they're sure about doesn't even have to be true.

> There's been massively successful government funded and run projects before. Soviets beat the Americans to space, after all.

Don't let facts get in the way of ideology!

Also the Americans subsequently beating the Soviets to the moon was the government literally allocating huge amounts of capital towards the literal trope-namer moonshot.

by palmotea 4 days ago

> It would be impossible for the govt to allocate this much capital towards such a moonshot...

You have a false definition of "impossible." It would be true to say it could be challenging, given current political dysfunction, but it's not impossible.

> ...and even if they could, they would do it in a way that would get 90% frittered away to fraud and waste

Same with private business.

I'd prefer government funding, because there are a greater number of important goals than the two or three the market is capable of optimizing for.

by wolvesechoes 4 days ago

> impossible for the govt to allocate this much capital towards such a moonshot

Oh, how funny.

by illiac786 3 days ago

I thought that these stupid captchas where you teach some AI to recognize fire hydrants without getting paid were rock bottom, but no, you can actually pay a lot of money to train AI. Business is amazing.