Why couldn't a machine that identifies relations between tokens be AGI? You're imposing an arbitrary constraint. It is either generally intelligent or it's not; whether it uses tokens or something else is irrelevant.
Also, languages made up of tokens are still languages. In fact, most academics would argue that all languages are made up of tokens.
Anyway, it's not LLMs that achieve AGI; it's systems built around LLMs, and those achieved AGI quite some time ago.