It should give you an idea of how hard it is to train a SOTA model from scratch!
If you relax the SOTA aspect, Karpathy's nanochat has you covered: https://github.com/karpathy/nanochat
https://huggingface.co/spaces/HuggingFaceTB/smol-training-pl...
A rare, detailed insight into the entire process.