>I dunno what you mean by "free".

Reality is free. You don't have to waste any resources to model it, you just need to capture it.

>The model is trained on text.

See in my previous reply:

>LLM/AI/AGI/whatever will be

LLMs don't even have a sense of time because they work differently to a human brain.

Vision and audio are already used in multimodal LLMs, so this is already possible.

Who said anything about vision and audio?