
by cbg0 3 hours ago



by metadat 3 hours ago


Data source:

https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...

(Search for “graphwalk”.)

If true, the SWE bench performance looks like a major upgrade.

by himata4113 3 hours ago


This seems similar to gpt-pro: they just have a very large attention window (which is why it's so expensive to run). The true attention window of most models is 8096 tokens.

by appcustodian21 hours ago|


Source on the 8096 tokens number? I'm vaguely aware that some previous models attended more to the beginning and end of conversations, which doesn't seem to fit a simple contiguous "attention window" within the greater context, but I would love to know more.

by thegeomaster 2 hours ago


What's the "attention window"? Are you alleging these frontier models use something like SWA? Seems highly unlikely.

by frog437 3 hours ago


[flagged]