Data source:

https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...

(Search for “graphwalk”.)

If true, the SWE-bench performance looks like a major upgrade.

This seems to be similar to gpt-pro: they just have a very large attention window (which is why it's so expensive to run). The true attention window of most models is 8096 tokens.
Source on the 8096-token number? I'm vaguely aware that some previous models attended more to the beginning and end of conversations, which doesn't seem to fit a simple contiguous "attention window" within the greater context, but I'd love to know more.
What's the "attention window"? Are you alleging these frontier models use something like sliding-window attention (SWA)? That seems highly unlikely.
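
For reference, a minimal sketch of what a sliding-window attention (SWA) mask looks like, since that's the mechanism being alluded to. The NumPy implementation and the toy window size are illustrative assumptions, not a description of how any frontier model actually works:

    import numpy as np

    def sliding_window_causal_mask(seq_len: int, window: int) -> np.ndarray:
        """True where query position i may attend to key position j:
        causal (j <= i) and within the window (j > i - window)."""
        i = np.arange(seq_len)[:, None]  # query positions
        j = np.arange(seq_len)[None, :]  # key positions
        return (j <= i) & (j > i - window)

    # Tiny example: 6 tokens, window of 3.
    print(sliding_window_causal_mask(6, 3).astype(int))
    # The last row can only see positions 3, 4, 5: anything older than
    # `window` tokens is never attended to, which is the limitation the
    # 8096-token claim upthread would imply if taken literally.

Under such a mask, tokens beyond the window are hard-excluded, which is different from the softer "attends more to the beginning and end" behavior described above.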