undefined

points

[-]

I got your point because it seemed that you were precisely avoiding the anthropomorphizing and in fact seemed to be honing in on whats happening with the weights. The only way I can imagine these models are going to work with trick questions lies beyond word prediction or reinforcement training UNLESS reinforcement training is from a complete (as possible) world simulation including as much mechanics as possible and let these neural networks train on that.

Like for instance, think chess engines with AI, they can train themselves simply by playing many many games, the "world simulation" with those is the classic chess engine architecture but it uses the positional weights produced by the neural network, so says gemini anyways:

"ai chess engine architecture"

"Modern AI chess engines (e.g., Lc0, Stockfish) use a hybrid architecture combining deep neural networks for positional evaluation with advanced search algorithms like Monte-Carlo Tree Search (MCTS) or alpha-beta pruning. They feature three core components: a neural network (often CNN-based) that analyzes board patterns (matrices) to evaluate positions, a search engine that explores move possibilities, and a Universal Chess Interface (UCI) for communication."

So with no model of the world to play with, I'm thinking the chatbot llms can only go with probabilities or what matches the prompt best in the crazy dimensional thing that goes on inside the neural networks. If it had access to a simple world of cars and car washes, it could run a simulation and rank it appropriately, and also could possibly infer through either simulation or training from those simulations that if you are washing a car, the operation will fail if the car is not present. I really like this car wash trick question lol