A human may learn and improve to avoid being fired, while Claude is incapable of that.
>Re the rest, it's just not my experience that models become incapable of making good decisions in cases where input token count > the context window, but ymmv based on domain.
If they've been trained a lot on your domain (maths, coding), then they can make good decisions. But I've just started using Mythos, and even it makes some awful decisions in domains it's not trained on. Of course the majority of decisions are good, but it only takes a couple of bad ones to sink a project.