One observation is that the LLM is a next token predictor but if you train it on the internet/textbooks/etc you get a predictor of that--- but that isn't the behavior we actually want. None of these sources tend to contain "Solve this problem for me. OK, here is the solution:".
It wasn't physically impossible to start GNU the other way around, by bashing machine code into a system until you had a working operating system. But doing so would have been a lot less reasonable-- much more expensive, making progress much less quickly, etc.