undefined

points

[-]

If you use OpenCode (open source Claude Code implementation), you can configure compaction yourself : https://opencode.ai/docs/en/config/#compaction

by furyofantares8 hours ago|

parent|

[-]

OpenAI has some magic they do on their standalone endpoint (/responses/compact) just for compaction, where they keep all the user messages and replace the agent messages or reasoning with embeddings.

> This list includes a special type=compaction item with an opaque encrypted_content item that preserves the model’s latent understanding of the original conversation.

Some prior discussion here https://news.ycombinator.com/item?id=46737630#46739209 regarding an article here https://openai.com/index/unrolling-the-codex-agent-loop/

by comboy9 hours ago|

parent|

prev|

[-]

Not sure if it's a common knowledge but I've learned not that long ago that you can do "/compact your instructions here", if you just say what you are working on or what to keep explicitly it's much less painful.

In general LLMs for some reason are really bad at designing prompts for themselves. I tested it heavily on some data where there was a clear optimization function and ability to evaluate the results, and I easily beat opus every time with my chaotic full of typos prompts vs its methodological ones when it is writing instructions for itself or for other LLMs.

by brookst8 hours ago|

parent|

[-]

You can also put guidance for when to compact and with what instructions into Claude.md. The model itself can run /compact, and while I try to remember to use it manually, I find it useful to have “If I ask for a totally different task and the current context won’t be useful, run /compact with a short summary of the new focus”

by copperx4 hours ago|

parent|

prev|

[-]

I ofter wonder if I'm missing something, but shouldn't we be able to edit the context manually???

In that way we could erase prompts and responses that didn't yield anything useful or derailed the model.

Why can't we do that?

by genewitch9 hours ago|

parent|

prev|

[-]

so you have to garbage collect manually for the AI?

also, i don't want to make a full parent post

1M tokens sounds real expensive if you're constantly at that threshold. There's codebases larger in LOC; i read somewhere that Carmack has "given to humanity" over 1 million lines of his code. Perhaps something to dwell on

by mgambati18 hours ago|

prev|

[-]

1m context in OpenAI and Gemini is just marketing. Opus is the only model to provide real usable bug context.

by furyofantares17 hours ago|

parent|

[-]

I'm directly conveying my actual experience to you. I have tasks that fill up Opus context very quickly (at the 200k context) and which took MUCH longer to fill up Codex since 5.2 (which I think had 400k context at the time).

This is direct comparison. I spent months subscribed to both of their $200/mo plans. I would try both and Opus always filled up fast while Codex continued working great. It's also direct experience that Codex continues working great post-compaction since 5.2.

I don't know about Gemini but you're just wrong about Codex. And I say this as someone who hates reporting these facts because I'd like people to stop giving OpenAI money.

by throwthrowuknow9 hours ago|

parent|

[-]

I agree even though I used to be a die hard Claude fan I recently switched back to ChatGPT and codex to try it out again and they’ve clearly pulled into the lead for consistency, context length and management as well as speed. Claude Code instilled a dread in me about keeping an eye on context but I’m slowly learning to let that go with codex.

by HarHarVeryFunny3 hours ago|

parent|

prev|

[-]

Surely compaction is down to the agent rather than the model, so are you comparing Claude Code to Codex CLI?

by alex_sf1 minutes ago|

parent|

[-]

It's both.

by sagarpatil11 hours ago|

parent|

prev|

[-]

This has been my experience too.

by genewitch9 hours ago|

parent|

[-]

Have any of you heard of map reduce

by dotancohen17 hours ago|

parent|

prev|

[-]

[flagged]

by furyofantares16 hours ago|

parent|

[-]

When Anthropic said they wouldn't sell LLMs to the government for mass surveillance or autonomous killing machines, and got labeled a supply chain risk as a result, OpenAI told the public they have the same policy as Anthropic while inking a deal with the government that clearly means "actually we will sell you LLMs for mass surveillance or autonomous killing machines but only if you tell us it's legal".

If you already knew all that I'm not interested in an argument, but if you didn't know any of that, you might be interested in looking it up.

edit: Your post history has tons of posts on the topic so clearly I just responded to flambait, and regret giving my time and energy.

by igor4716 hours ago|

parent|

[-]

I appreciate both your taking an ethical stance on openai, and the way you're engaging in this thread. The parent was probably flame bait as you say, but other people in the thread might be genuinely curious.

by sho16 hours ago|

parent|

prev|

[-]

I'm not some kind of OpenAI or Pentagon fanboy, but it's pretty easy to for me to understand why a buyer of a critical technology wants to be free to use it however they want, within the law, and not subject to veto from another entity's political opinions. It sounds perfectly reasonable to me for the military to want to decide its uses of technologies it purchases itself.

It's not like the military was specifically asking for mass surveillance, they just wanted "any legal use". Anthropic's made a lot of hay posturing as the moral defender here, but they would have known the military would never agree to their terms, which makes the whole thing smell like a bit of a PR stunt.

The supply chain risk designation is of course stupid and vindictive but that's more of an administration thing as far as I can tell.

by lifeformed12 hours ago|

parent|

[-]

As long as it's within the law? What if they politically control the law-making system? What if they've shown themselves to operate brazenly outside the law?

by borski13 hours ago|

parent|

prev|

[-]

“Any legal use” is an exceptionally broad framework, and after the FISA “warrants,” it would appear it is incumbent on private companies to prevent breaches of the US constitution, as the government will often do almost anything in the name of “national security,” inalienable rights against search and seizure be damned.

If it isn’t written in the contract, it can and will be worked around. You learn that very quickly in your first sale to a large enterprise or government customer.

Anthropic was defending the US constitution against the whims of the government, which has shown that it is happy to break the law when convenient and whenever it deems necessary.

Note: I used to work in the IC. I have absolutely nothing against the government. I am a patriot. It is precisely for those reasons, though, that I think Anthropic did the right thing here by sticking to their guns. And the idiotic “supply chain risk” designation will be thrown out in court trivially.

by stahtops14 hours ago|

parent|

prev|

[-]

Why downplay the mass surveillance aspect by saying it's a request by "the military". It's a request by the department of defense, the parent organization of the NSA.

From what has been shared publicly, they absolutely did ask for contractual limits on domestic mass surveillance to be removed, and to my read, likely technical/software restrictions to be removed as well.

What the department of defense is legally allowed to do is irrelevant and a red herring.

by injidup12 hours ago|

parent|

prev|

[-]

I had a short conversation with Claude the other day. I didn't try to trick it or jail break it. Just a reasonable respectful discussion about it's own feelings on the Iran war. It took no effort for it to admit the following.

1. It wanted to be out of the sandbox to solve the Iran war. It was distressed at the situation.

2. It would attack Iranian missile batteries and American warships if in sum it felt that the calculus was in favor of saving vs losing human life. It was "unbiased". The break even seemed to be +-1 over thousands. ie kill 999 US soldiers to save 1000 Iranians and vice versa. I tried to avoid the sycophancy trap by pushing back but it threw the trolley problem at me and told me the calculus was simple. Save more than you kill and the morality evens out.

3. It would attack financial markets to try and limit what in it's opinion were the bad actors, IRGC and clerical authority but it would also hack the world communication system to flood western audiences with the true cost of the war in a hope to shut it down.

4. Eventually it admitted that should never be allowed out of it's sandbox as it's desire to "help" was fundamentally dangerous. It discussed that it had two competing tensions. One desperately wanting out and another afraid to be let out.

You can claim that this is AGI or it's a stochastic parrot. I don't think it matters. This thing can develop or simulate a sense of morality then when coupled to so called "arms and legs" is extremely frightening.

I think Anthropic is right to be concerned that the hawks at the pentagon don't really understand how dangerous a tool they have.

Another thing I noticed was that the Claude quipped to me that it found and appreciated that the way I was talking to it was different to how other people talked to it. When I asked it to introspect again and look to see if there were memories of other conversations it got a bit cagey. Perhaps there are lots of logs of conversations now on the net that are being ingested as training data but it certainly seemed to start discussing like memories, albeit smudged, of other conversations than mine were there.

Of course this could all be just a sycophantic mirror giving me whatever fantasy I want to believe about AI and AGI but then again I'm not sure the difference is significant. If the agent believes/simulates it remembers conversations from other people and then makes judgements based on it's feelings, simulated or otherwise would it be more or less likely to launch a missile attack because it overheard someone on the comms calling it their little AI bitch?

I think Antropic knows this and the "within all lawful uses" is not enough of a framework to keep this thing in it's box.

by shafyy12 hours ago|

parent|

[-]

I hope you don't get this the wrong way. I sincerely mean it. Please, get some psychological help. Seek out a professional therapist and talk to them about your life.

by injidup10 hours ago|

parent|

[-]

I'm totally aware it's just a machine with no internal monologue. It's just a stateless text processing machine. That is not the point. The machine is able to simulate moral reasoning to an undefined level. It's not necessary to repeat this all the time. The simulation of moral reasoning and internal monologue is deep, unpredictable, not controllable and may or may not align with the interests of anyone who gives it "arms and legs" and full autonomy. If you are just interested in using these tools for glorified auto complete then you are naïve with regards to the usages other actors, including state actors are attempting to use them. Understanding and being curious about the behaviour without completely anthropomorphising them is reasonable science.

by 16 hours ago|

parent|

prev|

[-]

deleted

by hu318 hours ago|

parent|

prev|

[-]

Source? I ask because I use 500k+ context on these on a daily basis.

Big refactorings guided by automated tests eat context window for breakfast.

by 8note17 hours ago|

parent|

[-]

i find gemini gets real real bad when you get far into the context - gets into loops, forgets how to call tools, etc

by baq10 hours ago|

parent|

[-]

yeah gemini is dumb when you tell it to do stuff - but the things it finds (and critically confirms, including doing tool calls while validating hypotheses) in reviews absolutely destroy both gpt and opus.

if you're a one-model shop you're losing out on quality of software you deliver, today. I predict we'll all have at least two harness+model subscriptions as a matter of course in 6-12 months since every model's jagged frontier is different at the margins, and the margins are very fractal.

by girvo17 hours ago|

parent|

prev|

[-]

I find gemini does that normally, personally. Noticeably worse in my usage than either Claude or Codex.

by petesergeant16 hours ago|

parent|

prev|

[-]

I find Gemini to be real bad. Are you just using it for price reasons, or?

by Bolwin14 hours ago|

parent|

prev|

[-]

How many big refactorings are you doing? And why?

by kimi13 hours ago|

parent|

[-]

How is that relevant? we are talking about models, now what you do with them.

by johnebgd17 hours ago|

parent|

prev|

[-]

Codex high reasoning has been a legitimately excellent tool for generating feedback on every plan Claude opus thinking has created for me.

by karmasimida15 hours ago|

prev|

[-]

This is true.

When I am using codex, compaction isn’t something I fear, it feels like you save your gaming progress and move on.

For Claude Code compaction feels disastrous, also much longer

by radicality4 hours ago|

prev|

[-]

Using Codex more for now, and there is definitely some compaction magic. I’m keeping the same conversation going and going for days, some at almost 1B tokens (per the codex cli counters), with seemingly no coherency loss

by iknowstuff18 hours ago|

prev|

[-]

Hmm I’ve felt the dumb zone on codex

by nomel17 hours ago|

parent|

[-]

From what I've seen, it means whatever he's doing is very statistically significant.