I tried Gemini 2.5 Pro with a lot of context: all the documentation for a system plus several papers of interest, then asked it to use the system to implement the solution proposed in the papers. The total context was over 500k words, so by the usual estimates probably over 700k tokens.

The answers started out okay, but fairly quickly it seemed to lose track of the material in the middle of the documentation, insisting on using one concept instead of another even when I explicitly told it not to. Full attention over a 1M context isn't really feasible (I don't believe Google actually materializes upwards of a terabyte of data just for my query), and there are various ways LLMs do selective attention. I'm not sure if Google has published anything on how they do it?
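For scale, here's a rough back-of-envelope in Python (with entirely made-up model dimensions, just to get orders of magnitude) of what materializing full attention at 1M tokens would mean, versus the KV cache, which is linear in sequence length and is the part that actually gets stored:

  # Back-of-envelope: full attention at 1M tokens.
  # Architecture numbers below are hypothetical, for scale only.
  n_tokens = 1_000_000
  bytes_per_value = 2  # fp16

  # One n x n attention matrix, for a single head in a single layer:
  attn_matrix_bytes = n_tokens ** 2 * bytes_per_value
  print(f"one attention matrix: {attn_matrix_bytes / 1e12:.0f} TB")  # ~2 TB

  # The KV cache grows linearly with n, so it stays manageable:
  n_layers, n_kv_heads, head_dim = 48, 8, 128  # made-up shape
  kv_cache_bytes = (2 * n_tokens * n_layers * n_kv_heads
                    * head_dim * bytes_per_value)  # 2 = keys + values
  print(f"KV cache: {kv_cache_bytes / 1e9:.0f} GB")  # ~200 GB

So even one full attention matrix per head per layer would be terabytes; in practice attention scores are computed blockwise and never stored (FlashAttention-style), and long-context models typically layer some form of sparse or selective attention on top of that.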
