Source on the 8096 tokens number? I'm vaguely aware that some previous models attended more to the beginning and end of conversations, which doesn't seem to fit a simple contiguous "attention window" within the greater context, but I'd love to know more.
What's the "attention window" here? Are you suggesting these frontier models use something like SWA (sliding-window attention)? That seems highly unlikely.
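For anyone following along: SWA restricts each token to attending only over a fixed-size window of the most recent tokens, rather than the full context, which is different from the "attends more to the beginning and end" behavior described above. A minimal sketch of what such a causal sliding-window mask looks like (illustrative only; the window size and sequence length here are arbitrary, not claims about any particular model):

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask: entry [i, j] is True iff query position i
    may attend to key position j.

    Causal constraint: j <= i.
    Sliding-window constraint: j > i - window, i.e. each token
    sees at most the `window` most recent tokens (itself included).
    """
    i = np.arange(seq_len)[:, None]  # query positions, column vector
    j = np.arange(seq_len)[None, :]  # key positions, row vector
    return (j <= i) & (j > i - window)

# Tiny example: 8 tokens, window of 3.
mask = sliding_window_mask(8, 3)
print(mask.astype(int))
```

Under full causal attention, row i would have i + 1 ones; under SWA it saturates at `window`, which is why a model using it cannot directly attend to tokens far back in the context (information can still propagate indirectly across layers).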