undefined

points

[-]

It's also a great base for experimentation. If you have an idea for an architecture improvement you can try it for $36 on the 20 layer nanocode setting, then for another $200 see how it holds up on the "full scale" nanocode

Kaparthy's notes on improving nanochat [1] are one of my favorite blog-like things to read. Really neat to see which features have how much influence, and how the scaling laws evolve as you improve the architecture

There's also modded-nanogpt which turns the same kind of experimentation into a training speedrun (and maybe loses some rigor on the way) [2]

1 https://github.com/karpathy/nanochat/blob/master/dev/LOG.md

2 https://github.com/kellerjordan/modded-nanogpt