Does this result suggest that if we had N clever humans manually building an LLM, they might come up with something as smart as a frontier model, but potentially 45 times smaller? (1644 / 36 ~= 45, N = very large, time not specified)
True, but with even smarter humans, you could exploit the interactions for additional calculations.
While it sounds a bit silly, it is one of the hypotheses behind a fast takeoff. An AI that is sufficiently smart could design a network better than a trained one and could make something much smarter than itself on the same hardware. The question then becomes if that new smarter one can do an even better job. I suspect diminishing returns, but then again I am insufficiently smart.
(I see the Trained Weights results now, thanks.)