Chain of thought was kind of an obvious solution that everybody knew was necessary by the time ChatGPT / GPT-4 came out. It was just a matter of time before frontier labs actually shipped it.

MoE was also pretty straightforward. It was just a bit surprising how well it worked (that you can get away with only 1/32 of the parameters being active), but most researchers would probably have come up with it on their own.

The truly groundbreaking papers are the first two you mentioned (Transformers and GPT-2). InstructGPT was also very surprising in how well it worked.
